The previous x2 increase was still not enough, and the test is still
often killed on slow GHA CI workers, e.g.:
https://github.com/systemd/systemd/actions/runs/28012425459/job/82908555094?pr=42705
This happens in test units with many commands, so reset the timer when
a command completes and the test advances. The number of Exec
instructions is bounded so this will terminate jobs that are really
stuck anyway.
Follow-up for 3b00327fe6
Linux 7.0 added the ability to mark socket inodes with xattrs. Let's use
that to clearly mark all our Varlink sockets as being varlink related.
This is then used to implement a very useful new command "varlinkctl
list-sockets" which lists all varlink entrypoint sockets marked this
way.
By marking not just the entrypoint inodes but also the connection
sockets properly, we can one day add an eBPF-based "varlinkctl trace"
command that watches varlink sockets for traffic. But that's material
for a later PR.
When the test suite is run in the "standalone" mode, the minimal
container might not contain the test-fdstore binary that's needed for a
couple of tests. Since installing systemd-tests into the minimal
container pulls in a lot of other dependencies, let's just skip the
affected tests instead to avoid this.
This also relaxes the inode access modes a bit, in case they were set to
0600: we now set the "r" bit too, i.e. use 0644. This is beneficial
since it permits unprivileged code to read the xattrs of the entrypoints
(which requires read access). Note that in order to be able to connect()
to a socket inode you need write access, hence this shouldn't compromise
security in any way.
tpm2-setup requires both libcrypto and the tpm2-tss libraries, but so
far it only directly dlopen'ed libcrypto, with a clear error on startup
if missing, and a dependency added via dlopen notes.
Do the same for the tpm2-tss dlopens, to get a clear error and the
required dependencies.
This commit exposes the last 10 high priority logs as metrics so that
the systemd-report reports them. The entries are reported as
`io.systemd.Journal.HighPriorityMessage` and include all fields that are
printable as strings.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
With EnqueueUnitJobMany(), one anchor can collapse to NOP (inactive
unit + try-restart) while another anchor pulls that same unit in as a
regular start/restart job, leaving a NOP and a regular job in one
unit's transaction list, hitting an assert:
#11 0x00007f3fd2a446dc in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
function=<optimized out>) at ./assert/assert.c:127
#12 0x00007f3fd326e872 in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at ../src/core/job.c:428
#13 0x00007f3fd32e5641 in job_type_merge_and_collapse (a=0x7ffc7dda2430, b=<optimized out>, u=0x557bb11434c0)
at ../src/core/job.c:523
#14 0x00007f3fd335e4b3 in transaction_ensure_mergeable (tr=tr@entry=0x557bb0f6d150,
matters_to_anchor=matters_to_anchor@entry=true, e=e@entry=0x7ffc7dda33e0) at ../src/core/transaction.c:241
#15 0x00007f3fd3360242 in transaction_merge_jobs (tr=0x557bb0f6d150, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:273
#16 transaction_activate (tr=0x557bb0f6d150, m=0x557bb0dd9c10, mode=JOB_REPLACE, affected_jobs=0x0, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:797
#17 0x00007f3fd33091ed in manager_add_jobs (m=<optimized out>, type=<optimized out>, names=<optimized out>,
reload_if_possible=false, mode=JOB_REPLACE, extra_flags=0, affected_jobs=0x0, reterr_error=0x7ffc7dda33e0,
ret_jobs=0x557bb0fe8790) at ../src/core/manager.c:2386
Follow-up for 7d3b32daef
growfs already skips gracefully when cryptsetup fails or is
missing, and it is only necessary when the device is
a LUKS device anyway. Downgrade from required to recommended.
Follow-up for b0ede9f9ee
This commit exposes the last 10 high priority logs as metrics
so that the systemd-report reports them. The entries are
reported as `io.systemd.Journal.HighPriorityMessage` and
include all fields as the new METRIC_FAMILY_TYPE_OBJECT.
Individual fields from a journal entry that are unprintable
(invalid utf-8) are skipped.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
This commit adds a new OUTPUT_SKIP_UNPRINTABLE to the OutputFlags
and adds code in `update_json_data` and `json_escape` to honor it.
When set, all JSON fields that contain unprintable data are skipped
and `null` is sent instead.
We need a way to send journal entries as metrics. Those are already
JSON objects, so Lennart suggested introducing a new type
METRIC_FAMILY_TYPE_OBJECT for this. This commit implements
his suggestion.
vmm.c carries the confidential-VM detection used by sd-boot/sd-stub.
Its detect_tdx() had the same dead guard as the userspace copy: it
gated the 0x21 read on CPUID_GET_HIGHEST_FUNCTION (0x80000000, the
extended max function), which is always >= 0x80000000, so the guard
never held.
Mirror the userspace fix: read leaf 0x21 directly and rely on the
IntelTDX signature, matching the kernel. An out-of-range CPUID leaf
returns the highest basic leaf's data (no fault), and 0x21 is a
synthetic TDX leaf whose presence need not be reflected in the max
basic function, so it must not be gated on it.
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
detect_tdx() guarded the read of the TDX enumeration leaf (0x21, a
standard leaf) with CPUID_GET_HIGHEST_FUNCTION (0x80000000), which
returns the highest *extended* function. eax is therefore always
>= 0x80000000, so the "eax < 0x21" guard never held and the leaf was
read unconditionally anyway.
Drop the guard rather than re-gate it on the basic max function
(leaf 0), and read 0x21 directly, relying on the IntelTDX signature
compare. This matches the kernel, which reads the leaf unconditionally
on purpose: an out-of-range CPUID leaf returns the highest basic leaf's
data (no fault, per the Intel SDM), and 0x21 is a synthetic TDX leaf
whose presence need not be reflected in the reported max basic function,
so gating the read on it risks missing a genuine TDX guest. With no
guard the Hyper-V isolation fallback (Azure TDX guests have 0x21
blocked) also stays reachable.
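The signature compare described above can be sketched as follows. This is
a minimal illustration, not the code from this commit:
`cpuid_0x21_is_tdx()` is a hypothetical helper that takes the register
values already read from leaf 0x21, and the register order (ebx, edx,
ecx) and the 12-byte signature string are assumed to match what the
kernel's tdx_early_init() compares against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed signature: "IntelTDX" padded with four spaces to 12 bytes. */
#define TDX_SIGNATURE "IntelTDX    "

static_assert(sizeof(TDX_SIGNATURE) - 1 == 3 * sizeof(uint32_t),
              "signature must fill exactly three 32-bit registers");

/* Hypothetical helper: decide whether the values read from CPUID leaf
 * 0x21 carry the TDX signature. There is deliberately no check against
 * the maximum basic leaf here; the signature compare is the only gate. */
static bool cpuid_0x21_is_tdx(uint32_t ebx, uint32_t ecx, uint32_t edx) {
        /* The signature is laid out in ebx, edx, ecx order. */
        uint32_t sig[3] = { ebx, edx, ecx };

        return memcmp(sig, TDX_SIGNATURE, sizeof(sig)) == 0;
}
```

If leaf 0x21 is out of range and the CPU returns another leaf's data
instead, the compare simply fails and detection falls through to the
remaining checks.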
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
msr() returned 0 on failure, indistinguishable from a real MSR value of
0. With /dev/cpu/0/msr unavailable (e.g. the msr module not loaded in the
initrd), detect_sev() read 0 and reported a genuine SEV/-ES/-SNP guest as
CONFIDENTIAL_VIRTUALIZATION_NONE.
That inverts the firmware-credential trust gate: import_credentials_*()
skip fw_cfg/SMBIOS credentials only when detect_confidential_virtualization()
is > 0 ("don't trust firmware in confidential VMs"). A false NONE makes a
confidential guest trust and import credentials injected by the untrusted
hypervisor.
msr() now returns a negative errno, and detect_sev() assumes plain SEV when
the MSR is unreadable but CPUID already advertised SEV under a hypervisor,
so the gate still trips.
The conservative branch only fires when CPUID already advertised SEV, i.e.
for a guest the hypervisor marked SEV-capable. QEMU gates that CPUID leaf on
the SEV launch object and does not expose it to ordinary guests even under
-cpu host, so it does not misfire for non-confidential guests. Were a
hypervisor to expose the bit anyway the outcome is fail-safe (we only
decline to trust firmware-supplied data); nothing in-tree branches on the
specific SEV tier.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
vl_method_extend() accepted an empty text/data value and measured an
empty word, bypassing the empty-word refusal the CLI path already
enforces. Measured words are joined with ":" in the record, so an empty
word is ambiguous. Reject it.
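To illustrate the refusal, here is a small sketch that joins words with
":" and rejects empty ones up front. `join_words()` is a hypothetical
helper written for this example and is not the code path touched by the
commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join n_words words with ':' into buf. Returns 0
 * on success, -EINVAL if any word is NULL or empty, and -ENOBUFS if the
 * result (including the trailing NUL) does not fit into size bytes. */
static int join_words(const char *const *words, size_t n_words, char *buf, size_t size) {
        size_t off = 0;

        assert(words || n_words == 0);
        assert(buf);

        for (size_t i = 0; i < n_words; i++) {
                size_t l;

                /* Refuse empty words before anything is written. */
                if (!words[i] || words[i][0] == '\0')
                        return -EINVAL;

                l = strlen(words[i]);

                /* Separator (if not first), the word and the NUL must fit. */
                if (off + (i > 0 ? 1 : 0) + l + 1 > size)
                        return -ENOBUFS;

                if (i > 0)
                        buf[off++] = ':';
                memcpy(buf + off, words[i], l);
                off += l;
        }

        if (off + 1 > size)
                return -ENOBUFS;

        buf[off] = '\0';
        return 0;
}
```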
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
tpm2_index_to_handle() returns 0 with a NULL handle when the NV index is
not present on the TPM. tpm2_nvpcr_extend_bytes() only checked for r < 0,
so a tombstoned NvPCR (anchor file present, NV slot cleared out from under
us) passed the NULL handle to tpm2_extend_nvpcr_nv_index() and aborted the
process via its assert(). Handle r == 0 explicitly, as the other
tpm2_index_to_handle() callers already do.
The newly introduced -ENODEV is mapped together with -ENOENT to the
io.systemd.PCRExtend.NoSuchNvPCR varlink error.
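The three-way return handling can be sketched like this. All names below
(`HandleSketch`, `lookup_absent()`, `lookup_present()`,
`extend_sketch()`) are hypothetical stand-ins, and the convention that a
successful lookup returns a positive value is an assumption made for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct HandleSketch {
        unsigned index;
} HandleSketch;

typedef int (*lookup_fn)(unsigned index, HandleSketch **ret);

/* Stand-in for a lookup that finds nothing: returns 0 and a NULL handle. */
static int lookup_absent(unsigned index, HandleSketch **ret) {
        (void) index;
        *ret = NULL;
        return 0;
}

/* Stand-in for a successful lookup: returns > 0 and a valid handle. */
static int lookup_present(unsigned index, HandleSketch **ret) {
        static HandleSketch h;

        h.index = index;
        *ret = &h;
        return 1;
}

/* Caller that distinguishes error (< 0), "not present" (== 0) and
 * success (> 0) before touching the handle. */
static int extend_sketch(lookup_fn lookup, unsigned index) {
        HandleSketch *h = NULL;
        int r;

        assert(lookup);

        r = lookup(index, &h);
        if (r < 0)
                return r;
        if (r == 0)
                return -ENODEV; /* index missing: report it, do not assert */

        assert(h); /* only reached with a valid handle */
        return 0;
}
```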
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.
A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.
Currently only a single job for a single unit can be enqueued
atomically, so there is no guarantee that, e.g., starting a unit and
its socket at the same time will happen in the same transaction. That
forces callers to 'know' the right order in which to start newly
installed units, or failures will occur. It also means that some
ordering constraints are ignored if the separate calls are issued in
the wrong order.
Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.
min_free is supposed to track the minimum free space across all home
directories to scale the next rebalance interval. However, it was
incorrectly assigned h->rebalance_size (the home's current total
allocation) instead of new_free (the remaining allocatable space).
This caused the rebalance interval to be computed from allocation sizes
rather than free space, so a nearly-full home would not trigger the
shorter intervals it should, delaying response to low-space conditions.
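A minimal sketch of the corrected bookkeeping, assuming a simplified
data model: `HomeSketch` and `minimum_free()` are hypothetical, and in
the real code new_free is computed during rebalancing rather than stored
in a struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct HomeSketch {
        uint64_t rebalance_size; /* current total allocation of the home */
        uint64_t new_free;       /* remaining allocatable space */
} HomeSketch;

/* Track the minimum *free* space across all homes. Taking the minimum
 * of rebalance_size instead would reproduce the bug described above. */
static uint64_t minimum_free(const HomeSketch *homes, size_t n_homes) {
        uint64_t min_free = UINT64_MAX;

        assert(homes || n_homes == 0);

        for (size_t i = 0; i < n_homes; i++)
                if (homes[i].new_free < min_free)
                        min_free = homes[i].new_free;

        return min_free;
}
```

For two homes with (rebalance_size, new_free) of (100, 5) and (10, 50),
this yields 5, whereas the buggy assignment would have yielded 10 and
hidden the nearly full home.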
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
Currently the partition list is ordered like this: First come the
partitions that exist as definition files (could be pre-existing
partitions or could be new ones), then come the pre-existing partitions
that aren't matched to a definition file.
This ordering is visible to the user when we print our partition table,
and it doesn't really make sense from a UX perspective: Partition tables
are usually either presented in order of the partition indices, or in
order of the partition offsets. Arguably the latter would be nicer here,
since the visualization below is already ordered by physical offsets.
So reorder the list after we have assigned the new partitions to their
respective free areas, according to the physical offset (or, for
partitions that are yet to be created, the order in which we will
allocate them). Another potential upside of this is that the code could
now rely more on the partition order, too.
To ensure it keeps working, also add a test in the integration tests for
it.
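The reordering step could look roughly like the following sketch. It
assumes every partition already has an offset assigned, which glosses
over the allocation-order case for new partitions; `PartitionSketch`
and `sort_partitions_by_offset()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct PartitionSketch {
        const char *label;
        uint64_t offset; /* physical offset on disk, in bytes */
} PartitionSketch;

/* qsort() comparator: ascending by offset, without integer overflow. */
static int partition_offset_compare(const void *a, const void *b) {
        const PartitionSketch *x = a, *y = b;

        return (x->offset > y->offset) - (x->offset < y->offset);
}

static void sort_partitions_by_offset(PartitionSketch *p, size_t n) {
        assert(p || n == 0);

        if (n > 1)
                qsort(p, n, sizeof(*p), partition_offset_compare);
}
```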
Screenshot before:
<img width="2853" height="686" alt="Screenshot From 2026-06-05 00-58-07"
src="https://github.com/user-attachments/assets/7f24b527-7d79-49c4-916b-52faa892d4eb"
/>
Screenshot after:
<img width="2853" height="686" alt="Screenshot From 2026-06-05 00-58-16"
src="https://github.com/user-attachments/assets/4505ec5e-cab4-4ac1-95f0-b5af3991509e"
/>
Convert the --help text of systemd-imds and systemd-imdsd to the common
help_cmdline()/help_abstract()/help_section()/help_man_page_reference()
helpers, for a uniform output style across tools.
When systemd-imds is invoked as a Varlink service (via the new
systemd-imds-metrics.socket), it now acts as an io.systemd.Metrics
provider for systemd-report. It connects to systemd-imdsd over the
existing io.systemd.InstanceMetadata interface to acquire the real
data and re-exposes the detected cloud vendor plus the well-known
hostname, region, zone and public IPv4/IPv6 fields as metrics in the
io.systemd.InstanceMetadata.* namespace.
The metrics logic lives entirely on the client side
(imds-tool-metrics.c); systemd-imdsd is unchanged. Each metric is
acquired on demand with a blocking call to the daemon, benefiting from
its local cache. Fields that are unset or unsupported by the vendor are
simply omitted.
The metrics socket is statically enabled into sockets.target.wants/.
- Pass colon+1 (port string) instead of s (hostname) to safe_atou16,
so host:port registries are no longer always rejected.
- Switch to safe_atou16_full() with base-10 and strict flags to reject
non-decimal port forms (hex, octal, leading whitespace, sign prefix)
that would produce malformed URL authorities.
- Reject empty host explicitly via isempty() guard (covers both NULL
and empty-string input), and guard colon == n to reject ':port' form,
since dns_name_is_valid('') == 1 (DNS root) would otherwise accept
empty host as valid.
- Wrap overlong line to fit 109-column limit.
- Add test coverage for oci_registry_is_valid().
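The checks listed above can be approximated with plain libc as follows.
This is a sketch, not the patched function: `parse_port_strict()` and
`registry_is_valid_sketch()` are hypothetical, the digits-only loop
stands in for the strict safe_atou16_full() flags, and a non-empty check
stands in for the real DNS name validation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal port. Only ASCII digits are accepted, which rules out
 * hex and octal prefixes, leading whitespace and sign characters. */
static int parse_port_strict(const char *s, uint16_t *ret) {
        unsigned long v;

        assert(ret);

        if (!s || s[0] == '\0')
                return -EINVAL;

        for (const char *p = s; *p; p++)
                if (*p < '0' || *p > '9')
                        return -EINVAL;

        errno = 0;
        v = strtoul(s, NULL, 10);
        if (errno != 0 || v > UINT16_MAX)
                return -ERANGE;

        *ret = (uint16_t) v;
        return 0;
}

static bool registry_is_valid_sketch(const char *s) {
        const char *colon;
        uint16_t port;

        /* Reject NULL and empty input explicitly. */
        if (!s || s[0] == '\0')
                return false;

        colon = strchr(s, ':');
        if (!colon)
                return true; /* host only; the real code validates the name */

        if (colon == s)
                return false; /* ":port" form with an empty host */

        /* Parse what follows the colon, not the start of the string. */
        return parse_port_strict(colon + 1, &port) >= 0;
}
```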
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
Currently `systemd-ssh-generator` supports
`systemd.ssh_listen=vsock::22` and aliases the "empty CID" to
`VMADDR_CID_ANY`. VMADDR_CID_ANY is -1, so it is confusing for users
that `systemd.ssh_listen=vsock:-1:22` isn't supported.
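One way a CID parser could accept both spellings is sketched below.
`parse_vsock_cid()` and `CID_ANY_SKETCH` are hypothetical; the constant
is assumed to have the same value as VMADDR_CID_ANY (-1 as an unsigned
32-bit integer), and this is not the generator's actual parsing code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to equal VMADDR_CID_ANY, i.e. -1 as an unsigned 32-bit value. */
#define CID_ANY_SKETCH UINT32_MAX

/* Parse the CID part of "vsock:CID:PORT". Both the empty string and the
 * literal "-1" map to the wildcard CID; everything else must be a plain
 * decimal number that fits into 32 bits. */
static int parse_vsock_cid(const char *s, uint32_t *ret) {
        unsigned long long v;

        assert(s);
        assert(ret);

        if (s[0] == '\0' || strcmp(s, "-1") == 0) {
                *ret = CID_ANY_SKETCH;
                return 0;
        }

        for (const char *p = s; *p; p++)
                if (*p < '0' || *p > '9')
                        return -EINVAL;

        errno = 0;
        v = strtoull(s, NULL, 10);
        if (errno != 0 || v > UINT32_MAX)
                return -ERANGE;

        *ret = (uint32_t) v;
        return 0;
}
```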
A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
verb_attach() falls back to unsigned activation (crypt_activate_by_volume_key)
when signed activation fails, but still passed the signature to
pcrextend_verity_now(). The signer is parsed out of the (unverified)
signature and folded into the dm_verity NvPCR measurement, making an
unsigned fallback indistinguishable from a genuinely signed activation to
an attester. Only measure the signature when signed activation succeeded.
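The rule in the last sentence can be expressed as a tiny selection
helper. `signature_to_measure()` is a hypothetical function for
illustration; the real change lives in verb_attach() and its call to
pcrextend_verity_now().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the signature that may be folded into the measurement: the
 * caller's signature if signed activation succeeded, NULL if the device
 * was activated through the unsigned fallback. */
static const char *signature_to_measure(bool activated_with_signature, const char *signature) {
        /* A signed activation without a signature would be a logic error. */
        assert(!activated_with_signature || signature);

        return activated_with_signature ? signature : NULL;
}
```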
Signed-off-by: Paul Meyer <katexochen0@gmail.com>