* f7762b7143 sandbox: Preserve net caps across user namespace before unsharing net
* 582eadee34 Revert "Put build history into the output directory"
* 5ef262bc53 action: don't fail if apk cannot be downloaded
* bdd341ff9b Lock the package cache during package manager invocations
* da49fe976c Put build history into the output directory
* 1c392f1918 tests: Use unique machine names
* e4f4026e30 tests: Reduce VM RAM size
* de41a5e03e Don't leak gpg-agent when signing with gpg
* 1bc5d61e1d ci: Pin openSUSE to second-to-last Tumbleweed snapshot
* c4d565a009 test: Use the main build's snapshot for extension builds
* 718b06c866 tests: ignore masked units in check-and-shutdown
* 0dc5ecbc02 ci: enable postmarketOS in integration testing
* d4c6761ad3 action: install apk to /usr/bin
* 9980f31309 mkosi-vm: add systemd-efistub to postmarketOS config
* 5640ace38f mkosi.conf: add grub to postmarketOS
* 6741b440c0 mkosi-initrd: add sulogin, device-mapper to postmarketOS initrd
* c3575c035c mkosi-tools: add missing packages to postmarketOS tools tree
* 0774bc2498 mkosi-tools: add apk-tools to tools trees for Arch and OpenSuSE
| * bb87e48401 curl: Retry on failures
|/
* 41fea1dd8d dnf: Work around librepo rejecting valid repomd signatures cross-distro
* 647e3b610b dnf: Proper repository metadata signature requirement
* 46d907cce2 dnf: Don't skip unavailable repositories during makecache
* a91e89c3b7 run_locale_gen: noop if output_format is confext
* 30329e401b tests: Make integration tests runnable locally
* be549f04db config: Don't propagate $MKOSI_DNF when using a tools tree
* 42ed648981 build(deps): bump actions/upload-artifact from 7.0.0 to 7.0.1
* fd5eedd62b build(deps): bump aws-actions/configure-aws-credentials
* 86733c703d tree: check for root when copying SELinux attributes as well
* de2256f8fe Skip security.ima xattrs when copying tree as non-root
| * 08ebf6d678 vmspawn: Exclude secure-boot unless requested
|/
* 1d3c51e36d obs workflow: do not build aarch64/i586
When 'homectl deactivate' is called immediately after a preceding
operation, the umount inside systemd-homework can fail with EBUSY
because something briefly holds a reference to the home mount (e.g. a
concurrent inspect). systemd-homed already handles this gracefully
by moving the home into the 'lingering' state and retrying deactivation
after 15 seconds, but the bus reply for the original DeactivateHome
call returns the org.freedesktop.home1.HomeBusy error immediately,
which makes TEST-46-HOMED flaky.
Fix homectl to follow homed and retry for up to 30 seconds on HomeBusy
and add a test case trying to make the issue more reproducible.
Let's do the standard thing. The 'static const' variable takes up space
and leads to less efficient code (a load from memory instead of an
inlined constant). This doesn't matter much, but let's follow the
standard pattern.
Follow-up for 93e9c2c974.
setup_swtpm() decided whether a software TPM had already been
manufactured by checking whether the state directory was empty. But
manufacture_swtpm() writes swtpm's config files before forking
swtpm_setup, so an interrupted manufacture leaves the directory
non-empty yet without a usable TPM. The next boot then mistook it for a
complete TPM and started swtpm against a broken state directory.
Keying off a swtpm state file like tpm2-00.permall is no better, as
swtpm_setup gives no guarantee any single one is written atomically or
last. Instead, have manufacture_swtpm() write a marker (.manufactured)
as its very last step, once swtpm_setup has exited successfully, and
gate on it: re-manufacture when it is missing in the initrd, and refuse
rather than start a broken TPM outside it.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Open the swtpm state directory once and write the three config files
relative to that fd with WRITE_STRING_FILE_ATOMIC, rather than by path
with a plain truncating write. Writing atomically ensures a crash or a
concurrent reader never observes a half-written config file, and
operating through a single directory fd lets later steps reuse it.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
The new polkit will return a new detail regarding a successful
authentication: the actual result type, which we can use to
see whether the user authenticated as admin. This can be used
to grant additional privileges.
Apply sandboxing. The plain backend needs a writable StateDirectory and
/dev/urandom for key generation. The service must stay root (the
private key is root-only), but everything else is locked down.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
client_context_read_extra_fields() reads a 64-bit field length v from
the per-unit log-extra-fields file. n = sizeof(uint64_t) + v overflows
when v is near UINT64_MAX, so the "left < n" check is bypassed and the
following memchr() scans v bytes past the buffer. Bound v against the
remaining bytes instead, which cannot overflow.
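A minimal sketch of the bounded check described above. The helper names
field_fits() and field_fits_naive() are hypothetical and only illustrate the
arithmetic; they are not the names used in journald.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a record made of a uint64_t length
 * header followed by v payload bytes fits into the 'left' remaining bytes. */
static bool field_fits(size_t left, uint64_t v) {
        if (left < sizeof(uint64_t))
                return false;

        /* Compare v against what remains after the header. Nothing is added
         * to v, so nothing can wrap around. */
        return v <= left - sizeof(uint64_t);
}

/* The overflow-prone form, for comparison only: the sum wraps for huge v,
 * so the check passes although the payload cannot possibly fit. */
static bool field_fits_naive(size_t left, uint64_t v) {
        uint64_t n = sizeof(uint64_t) + v;

        return left >= n;
}
```

With 16 bytes left and v == UINT64_MAX, the naive form accepts the record
because the sum wraps to 7, while the bounded form rejects it.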
uid_range_partition() filled the grown entries[] buffer backwards in
place. The backward-fill invariant (the write cursor stays above the
read index) only holds when every source entry contributes at least
one partition; an entry with nr < size contributes zero, so the cursor
stalls while the read index keeps descending. A later multi-part
entry's writes then overwrite the still-live zero-part slot, the
corrupted slot is re-read as a one-part entry, and the next
range->entries[--t] underflows.
Add a forward compaction first pass that drops the zero-part entries
before the backward fill.
Follow-up for 025439faaa
Co-Authored-by: Paul Meyer <katexochen0@gmail.com>
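The two-pass idea can be sketched as follows. This is an illustration under
stated assumptions, not the actual uid_range_partition() code: the Entry type
and partition_in_place() are hypothetical, and the caller is assumed to have
already grown the array so that it can hold all resulting partitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a range entry. */
typedef struct {
        uint32_t start;
        uint32_t nr;
} Entry;

/* Splits each of the n entries into nr / size partitions of 'size' IDs each,
 * in place. Assumption: e[] has room for the total number of partitions.
 * Returns the number of entries after partitioning. */
static size_t partition_in_place(Entry *e, size_t n, uint32_t size) {
        size_t m = 0, total = 0;

        assert(e);
        assert(size > 0);

        /* Pass 1: forward compaction, dropping entries that yield zero parts. */
        for (size_t i = 0; i < n; i++) {
                uint32_t parts = e[i].nr / size;

                if (parts == 0)
                        continue;

                e[m++] = e[i];
                total += parts;
        }

        /* Pass 2: backward fill. Every remaining entry yields at least one
         * part, so the write cursor t never drops below the read index i. */
        size_t t = total;
        for (size_t i = m; i-- > 0;) {
                Entry src = e[i]; /* copy before the slot may be overwritten */
                uint32_t parts = src.nr / size;

                for (uint32_t j = parts; j-- > 0;)
                        e[--t] = (Entry) { .start = src.start + j * size, .nr = size };
        }

        assert(t == 0);
        return total;
}
```

For example, the entries (0, 10), (100, 2), (200, 9) with size 4 become
(0, 4), (4, 4), (200, 4), (204, 4); the middle entry contributes nothing and
is dropped in the first pass.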
dhcp6_option_parse_ia_pdprefix() validates the lifetimes but never the
prefixlen byte, so a delegated prefix with prefixlen == 0 or > 128 is
stored in the lease and handed over.
RFC 8415 defines the prefix length as 1 to 128, and the send-side
option_append_pd_prefix() already rejects 0, so reject the out-of-range
values on the receive path too.
Follow-up for f8ad4dd45d
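As a sketch, the receive-side check boils down to a range test on the
prefixlen byte. The helper name pd_prefixlen_is_valid() is hypothetical and
not taken from the DHCPv6 parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a delegated prefix length is only accepted if it is
 * within 1..128, the range the commit message cites from RFC 8415. */
static bool pd_prefixlen_is_valid(uint8_t prefixlen) {
        return prefixlen >= 1 && prefixlen <= 128;
}
```

A parser would call this right after reading the prefixlen byte and drop the
option if it returns false.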
If the user callback set via sd_lldp_rx_set_callback() drops the last
reference to the sd_lldp_rx object, the object is freed while we are
still using it. Take a ref to keep the objects alive as long as they
are needed.
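The pattern can be sketched with a generic reference-counted object. All
names here (Obj, obj_dispatch(), drop_ref_callback(), n_freed) are
hypothetical and only model the situation; this is not the sd_lldp_rx code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Obj Obj;
typedef void (*obj_callback_t)(Obj *o, void *userdata);

struct Obj {
        unsigned n_ref;
        obj_callback_t callback;
        void *userdata;
        unsigned events_seen;
};

static unsigned n_freed = 0; /* only here so the sketch can observe frees */

static Obj *obj_new(obj_callback_t callback, void *userdata) {
        Obj *o = calloc(1, sizeof(Obj));
        if (!o)
                return NULL;

        o->n_ref = 1;
        o->callback = callback;
        o->userdata = userdata;
        return o;
}

static Obj *obj_ref(Obj *o) {
        assert(o);
        assert(o->n_ref > 0);
        o->n_ref++;
        return o;
}

static Obj *obj_unref(Obj *o) {
        if (!o)
                return NULL;

        assert(o->n_ref > 0);
        if (--o->n_ref == 0) {
                free(o);
                n_freed++;
        }
        return NULL;
}

static void obj_dispatch(Obj *o) {
        assert(o);

        obj_ref(o); /* pin the object for the duration of the dispatch */

        if (o->callback)
                o->callback(o, o->userdata);

        o->events_seen++; /* safe even if the callback dropped the caller's ref */

        obj_unref(o); /* may free the object now, after the last access */
}

/* A callback that drops the reference held by the caller. */
static void drop_ref_callback(Obj *o, void *userdata) {
        (void) userdata;
        obj_unref(o);
}
```

Without the obj_ref()/obj_unref() pair in obj_dispatch(), the access after
the callback would touch freed memory whenever the callback releases the
last reference.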
When sd_bus_call() fails, the continue was inside the
'if (ret == EXIT_SUCCESS)' guard, so only the first failure skipped
adding the unit to the job waiter. On the second and subsequent
failures, the unit was still passed to bus_wait_for_units_add_unit()
despite no job being started, causing bus_wait_for_units_run() to
hang indefinitely.
Move continue outside the guard so any failure skips the waiter
registration. The guard still prevents ret from being overwritten by
a later error code.
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
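The corrected control flow, sketched with hypothetical names: the array
call_results stands in for the per-unit sd_bus_call() results, and a counter
stands in for bus_wait_for_units_add_unit().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the loop: returns the first error seen (or 0) and
 * reports how many units were registered with the waiter. */
static int start_units(const int *call_results, size_t n, size_t *ret_n_waited) {
        size_t n_waited = 0;
        int ret = 0;

        assert(call_results || n == 0);
        assert(ret_n_waited);

        for (size_t i = 0; i < n; i++) {
                int r = call_results[i];

                if (r < 0) {
                        if (ret == 0)
                                ret = r; /* the guard keeps only the first error */

                        continue; /* outside the guard: every failure skips the waiter */
                }

                n_waited++; /* stands in for registering the unit with the waiter */
        }

        *ret_n_waited = n_waited;
        return ret;
}
```

With results { 0, -5, -7, 0 } this returns -5 and registers two units. With
the continue inside the guard, the third unit would have been registered as
well, although no job was started for it.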
Lennart reminded me in [1] that we need to add assert() in functions
that do pointer access. For the simple `*p` pointer dereferences
we even have an automatic coccinelle script that ensures that as
part of the automatic code checks.
However, for dereferences in the `p->` style this is not supported right
now, and adding it to coccinelle is hard because it's too slow for
this kind of check. So I created a (slightly messy) tree-sitter
Python script to see how many asserts we are currently missing.
This commit is the result of running it over the `src/basic`
dir and fixing the flagged issues. I plan to tidy it up and
add it to the checks too but this is orthogonal to this commit.
[1] https://github.com/systemd/systemd/pull/42360#discussion_r3426964562
The userns/netns/ipcns fdpairs were declared as plain int arrays without
_cleanup_close_pair_. If exec_shared_runtime_add() fails (e.g. OOM on
hashmap_ensure_put), the already-opened fds are leaked.
Since exec_shared_runtime_add() uses TAKE_FD on success, the array
entries are reset to -1 after ownership transfer, so adding
_cleanup_close_pair_ is safe and closes the fds only when they were
never consumed.
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
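A sketch of the cleanup pattern, using the GCC/Clang cleanup attribute
directly. CLEANUP_CLOSE_PAIR, close_pair_cleanup(), take_fd() and make_pair()
are hypothetical stand-ins for _cleanup_close_pair_, TAKE_FD and the real
caller; pipe() merely provides a pair of fds to work with.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Closes whichever fds of the pair are still owned (>= 0). */
static void close_pair_cleanup(int (*p)[2]) {
        for (int i = 0; i < 2; i++)
                if ((*p)[i] >= 0) {
                        close((*p)[i]);
                        (*p)[i] = -1;
                }
}

#define CLEANUP_CLOSE_PAIR __attribute__((cleanup(close_pair_cleanup)))

/* Transfers ownership: returns the fd and marks the source slot as empty. */
static int take_fd(int *fd) {
        int r = *fd;

        *fd = -1;
        return r;
}

/* 'opened' only records the fd numbers so a caller can check for leaks. */
static int make_pair(bool simulate_failure, int ret[2], int opened[2]) {
        CLEANUP_CLOSE_PAIR int pair[2] = { -1, -1 };

        if (pipe(pair) < 0)
                return -errno;

        opened[0] = pair[0];
        opened[1] = pair[1];

        if (simulate_failure)
                return -ENOMEM; /* cleanup handler closes both fds */

        ret[0] = take_fd(&pair[0]); /* slots become -1, so cleanup skips them */
        ret[1] = take_fd(&pair[1]);
        return 0;
}
```

On the failure path both fds are closed automatically; on the success path
the slots were reset to -1 by the ownership transfer, so the cleanup handler
has nothing left to close.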
journal_file_setup_data_hash_table() allocates s * sizeof(HashItem)
bytes for the hash table but then only zeroes s bytes, leaving 15/16 of
the entries uninitialized. This corrupts the hash chain in any newly
created journal file.
The adjacent journal_file_setup_field_hash_table() already uses the
correct size.
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
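The size mix-up is easy to reproduce in isolation. In this sketch Item is a
hypothetical 16-byte stand-in for HashItem, and the function names are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte item, standing in for HashItem. */
typedef struct {
        uint64_t head;
        uint64_t tail;
} Item;

/* Correct: zero n items, i.e. n * sizeof(Item) bytes. */
static void table_clear(Item *t, size_t n) {
        assert(t);
        memset(t, 0, n * sizeof(Item));
}

/* The bug pattern: zeroes only n bytes, i.e. 1/16 of the table. */
static void table_clear_short(Item *t, size_t n) {
        assert(t);
        memset(t, 0, n);
}

/* Counts the zero bytes in a buffer, to make the difference visible. */
static size_t count_zero_bytes(const void *p, size_t size) {
        const uint8_t *b = p;
        size_t c = 0;

        for (size_t i = 0; i < size; i++)
                if (b[i] == 0)
                        c++;

        return c;
}
```

On a table of 8 items pre-filled with a non-zero pattern, the short variant
clears 8 of the 128 bytes, while the correct one clears all of them.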
Lock down the software TPM service: restrict the runtime directory (which
holds the AES key sealing swtpm's state) to 0700, and apply the usual
sandboxing (NoNewPrivileges, MemoryDenyWriteExecute, ProtectSystem-adjacent
Protect*/Restrict* knobs, PrivateNetwork, PrivateTmp, a @system-service
syscall filter, etc.).
A few common knobs can't be used here: the service must keep CAP_SYS_ADMIN
(needed for the ioctl that creates the vtpm proxy device on /dev/vtpmx),
and it needs runtime access to the ESP and its backing block device at a
path only known at runtime, which rules out PrivateDevices=, DevicePolicy=,
ProtectSystem= and User=/DynamicUser=.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
swtpm keeps its state on the ESP (--tpmstate=dir=) and thus holds it
busy for as long as it runs, but nothing ensured it was stopped before
the ESP was unmounted on shutdown, leaving boot.mount failing to
unmount.
Two things were missing:
- systemd-tpm2-swtpm.service has DefaultDependencies=no, which strips
the implicit shutdown.target membership, so it was torn down late
rather than stopped in an ordered manner. Add
Conflicts=/Before=shutdown.target, as the sibling
systemd-tpm2-setup{,-early}.service units already do.
- The generator only ordered the service
After=boot.automount/efi.automount. Ordering after the .automount
units is enough for start-up, but only an ordering against the actual
.mount units makes the service stop (releasing the ESP) before the
file system is unmounted. Add boot.mount/efi.mount to the After= line;
this is a no-op at start-up, as the mount has no job of its own there
(it is triggered on access via the automount).
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Boot a VM in EFI mode without a hardware/firmware TPM and with
systemd.tpm2_software_fallback=yes, so systemd-tpm2-generator manufactures a
software TPM on the ESP in the initrd and chainloads swtpm. Assert the service
starts, the vtpm-proxy device shows up, and a systemd-creds TPM2 seal/unseal
round-trip works. Then reboot and confirm the sealed secret still unseals,
i.e. the TPM state persisted on the ESP across the reboot.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
The strv_sort() call sat after a for (;;) loop whose only exits are
return statements inside the loop, so it never ran.
CID#1660125
Follow-up for 82b8615463
dhcp_option_type_from_code() returns _DHCP_OPTION_TYPE_INVALID (-EINVAL)
for the PAD and END option codes, and dump_dhcp_option_one() uses the
returned value directly as an index into the functions[] table. Those
codes are excluded by an assert() at the top of the function, but
assert() compiles down to __builtin_unreachable() under NDEBUG, so a
negative array index read is reachable there (and trips static
analyzers). Bail out explicitly on the error return.
CID#1660105
Follow-up for 149adb2fdc
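A sketch of the explicit bail-out. The enum, the table and the code-to-type
mapping are hypothetical; only the treatment of PAD (0) and END (255) as
invalid and the check before indexing mirror the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
        OPTION_TYPE_UINT8,
        OPTION_TYPE_STRING,
        OPTION_TYPE_MAX,
        OPTION_TYPE_INVALID = -EINVAL,
} OptionType;

static const char *const type_names[OPTION_TYPE_MAX] = {
        [OPTION_TYPE_UINT8]  = "uint8",
        [OPTION_TYPE_STRING] = "string",
};

/* Hypothetical mapping: PAD and END have no type, everything else is split
 * arbitrarily for the sake of the example. */
static OptionType option_type_from_code(uint8_t code) {
        if (code == 0 || code == 255)
                return OPTION_TYPE_INVALID;

        return code < 64 ? OPTION_TYPE_UINT8 : OPTION_TYPE_STRING;
}

static const char *option_type_name(uint8_t code) {
        OptionType t = option_type_from_code(code);

        /* Bail out on the error return instead of relying on an assert()
         * that may be compiled out: a negative value must never be used
         * as an array index. */
        if (t < 0)
                return NULL;

        return type_names[t];
}
```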
Building the result one char at a time via strextendn() is O(N^2)
because each call rescans and reallocs the buffer. With lines up to
LONG_LINE_MAX this caused a timeout in fuzz-hostname-setup. Use
GREEDY_REALLOC_APPEND to make it linear.
Fixes https://github.com/systemd/systemd/issues/42713
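For illustration, a minimal growable buffer with amortized constant-time
appends. This is not GREEDY_REALLOC_APPEND itself; Buf and buf_append_char()
are hypothetical, and the doubling strategy is an assumption chosen for the
sketch (overflow of the doubled size is not handled here).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
        char *data;
        size_t len;       /* bytes used, excluding the trailing NUL */
        size_t allocated; /* bytes allocated */
} Buf;

/* Appends one character and keeps the buffer NUL-terminated. The length is
 * tracked, so no rescan is needed, and the allocation doubles, so N appends
 * cost O(N) in total rather than O(N^2). */
static int buf_append_char(Buf *b, char c) {
        assert(b);

        if (b->len + 2 > b->allocated) {
                size_t n = b->allocated > 0 ? b->allocated * 2 : 64;
                char *p = realloc(b->data, n);
                if (!p)
                        return -ENOMEM;

                b->data = p;
                b->allocated = n;
        }

        b->data[b->len++] = c;
        b->data[b->len] = 0;
        return 0;
}
```

The caller starts from a zero-initialized Buf and frees b.data when done.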
dns_stub_listener_extra_free() frees the listener while DnsQuery and
DnsStream objects still keep pointers to it. On a reload the extra
listeners are freed before dns_stream_disconnect_all() and
dns_query_free() run, and dns_query_free() then dereferences those
pointers.
bus_method_register_service() inserted the DnssdRegisteredService into
m->dnssd_registered_services before assigning service->manager and
before the sd_bus_track_new()/sd_bus_track_add_sender() calls. If
either call failed, the destructor ran with service->manager still
NULL, its guarded hashmap_remove() was skipped, and the freed service
was left in the hashmap.
On Lenovo ThinkPad T14 Gen 1 AMD model 20UES5TQ00 with the Brazilian
keyboard, the physical slash/question key reports as KEY_RIGHTCTRL.
This keyboard layout has no physical Right Ctrl key in that position. The
key after Space is AltGr, then PrtSc, then the slash/question key. Map the
AT keyboard scancode 0x9d to KEY_RO, matching the ABNT slash/question key
used by Brazilian keyboard layouts.
Verified with evtest:
Event: type 4 (EV_MSC), code 4 (MSC_SCAN), value 9d
Event: type 1 (EV_KEY), code 97 (KEY_RIGHTCTRL), value 1
After applying the hwdb mapping, the key reports as KEY_RO.
DMI: svnLENOVO:pn20UES5TQ00:pvrThinkPadT14Gen1
AT keyboard scancode: 0x9d
ellipsize_mem() scans backwards for ANSI escape sequences and calls
previous_ansi_sequence(s, t - s, ...) as t walks down toward s. When
t reaches s + 1 the helper is invoked with length == 1 and computes
'length - 2', which wraps to SIZE_MAX.
Follow-up for cb558ab222
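A sketch of a backwards scan that guards the subtraction. The function
last_csi_offset() is hypothetical and much simpler than
previous_ansi_sequence(); it only looks for the two-byte "ESC [" introducer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the last "\x1b[" within s[0..length), or -1 if
 * there is none. */
static ptrdiff_t last_csi_offset(const char *s, size_t length) {
        assert(s);

        /* An introducer needs two bytes. Without this guard, length - 2
         * would wrap around for length 0 or 1. */
        if (length < 2)
                return -1;

        for (size_t i = length - 2;; i--) {
                if (s[i] == '\x1b' && s[i + 1] == '[')
                        return (ptrdiff_t) i;

                if (i == 0)
                        break;
        }

        return -1;
}
```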
This is now implemented: sysupdate calls out to the
/run/systemd/sysupdate/notify/ Varlink directory on completion, and bootctl
binds a socket there that links a UKI plus extras staged below
/var/lib/systemd/uki/ (with .v/ vpick support) via "bootctl link-auto".
Add a TEST-87 testcase exercising "bootctl link-auto" and the equivalent
io.systemd.BootControl.LinkAuto() Varlink method: a UKI plus extras are staged
below the search directories and we assert the kernel and sidecar resources
are linked into $BOOT. Covered: plain kernel.efi + extras.d/, versioned
kernel.efi.v/ and extras .v/ resolved via vpick, directory priority
(/etc wins over /run), the no-op case when nothing is staged, and the Varlink
method including its empty reply when there is nothing to link.
Extend TEST-72-SYSUPDATE with a check that, after a successful update,
systemd-sysupdate connects to every socket linked into
/run/systemd/sysupdate/notify/ and invokes
io.systemd.SysUpdate.Notify.OnCompletedUpdate(). A tiny recorder socket is
hooked into that directory; it captures the request and replies with success.
We assert the recorded call carries the expected method, version and resource
list, and that a subsequent no-op update emits no notification.
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
sysext Varlink server. systemd-sysext provides a single Varlink service
covering both the sysext and confext image classes, so one notification
refreshes both (equivalent to "systemd-sysext refresh" plus
"systemd-confext refresh"). Hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-sysext.socket,
enabled by default via the preset.
Add a "bootctl link-auto" verb and a matching io.systemd.BootControl.LinkAuto()
Varlink method that behave exactly like "bootctl link" / Link(), except that
the UKI and extra resources are discovered automatically instead of being
passed in. The following directories are searched, in decreasing priority:
/etc/systemd/uki/, /run/systemd/uki/, /var/lib/systemd/uki/ (where
systemd-sysupdate stages downloaded resources), /usr/local/lib/systemd/uki/
and /usr/lib/systemd/uki/.
- the UKI is taken from kernel.efi, or the best version in kernel.efi.v/
(resolved via vpick, without honouring boot-counting suffixes), from the
highest-priority directory that has one;
- extra resources are picked up from extras.d/, matching *.sysext.raw,
*.confext.raw and *.cred, each either as a plain file or as a versioned
*.v/ directory resolved via vpick, combined across all directories with
higher-priority directories winning on conflicts.
Everything is resolved relative to the pinned root directory fd. Files passed
via --extra= on the command line are linked in addition to the auto-discovered
ones.
Also bind io.systemd.SysUpdate.Notify.OnCompletedUpdate() in the boot control
Varlink server, which simply does the same as LinkAuto(), and hook a socket
into /run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-bootctl.socket
(enabled by default via the preset) so a freshly downloaded kernel is linked
into $BOOT automatically after a sysupdate run.
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
pcrlock Varlink server and hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-pcrlock.socket,
enabled by default via the preset. When sysupdate signals a completed
update, we unconditionally re-run make-policy, since the set of measured
components may have changed.
Define a new io.systemd.SysUpdate.Notify Varlink interface with a single
OnCompletedUpdate() method, and after sysupdate successfully installs an
update, invoke that method on every socket linked into
/run/systemd/sysupdate/notify/ via varlink_execute_directory(). This
gives other components a hook to react to applied updates (e.g. recompute
a TPM policy, link a freshly downloaded kernel, refresh extensions).
The notification carries the component name, the installed version and the
list of updated resources (transfer id + on-disk path). Subscribers are
free to ignore the parameters and just treat the call as a trigger.
Setting SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1 forces the notification to be sent
even when no update was applied (in which case no resource list is included),
so follow-up work can be triggered unconditionally.
Fixes: #35988
Mirror how chaseat() works these days: instead of a single toplevel_fd that
serves as both the root (chroot) boundary and the directory that resolution
starts from, path_pick() now takes a separate root_fd and dir_fd. This lets
callers resolve a path relative to a specific directory fd while confining
symlink and absolute-path resolution to a root directory fd.
All existing callers are updated to pass the same fd for both, preserving
their current behaviour.