Extend the garbage-collection framework so a collectible resource can emit
forward references during graph traversal, in addition to the existing
back-reference mechanism.
A CollectionContext may now implement the optional collectionWithReferences
interface:
    References(ctx context.Context, node gc.Node, fn func(gc.Node))
When the GC visits a node whose resource type was registered by an external
collector, gcContext.references consults the per-type References
implementation after the built-in core resource types are handled.
This is the forward-reference analogue of collectionWithBackRefs. Whereas
ActiveWithBackRefs must enumerate every edge up front and the gcContext
holds all of them in its backRefs map for the entire collection, References
is invoked on demand for a single node. A collector whose resources fan
out to many other nodes can therefore emit those edges without retaining
them in memory for the gc context.
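A minimal sketch of the on-demand pattern described above, using stand-in types rather than the real gc package. Node, referencer, fanOutCollector, and visit are hypothetical names for illustration; only the shape of the References method follows the signature given in this message.

```go
package main

import (
	"context"
	"fmt"
)

// Node is a stand-in for gc.Node (assumption: the real type differs).
type Node struct {
	Type string
	Key  string
}

// referencer mirrors the shape of the optional interface described above.
// The name is hypothetical.
type referencer interface {
	References(ctx context.Context, node Node, fn func(Node))
}

// fanOutCollector is a hypothetical collector whose resources each point at
// many other nodes. Edges are looked up per node when asked for.
type fanOutCollector struct {
	edges map[string][]string
}

func (c fanOutCollector) References(ctx context.Context, node Node, fn func(Node)) {
	for _, key := range c.edges[node.Key] {
		fn(Node{Type: "content", Key: key})
	}
}

// visit emits the forward references of a single node, but only if the
// collector registered for the node's type implements referencer.
func visit(ctx context.Context, collectors map[string]any, node Node) []Node {
	var out []Node
	if r, ok := collectors[node.Type].(referencer); ok {
		r.References(ctx, node, func(n Node) { out = append(out, n) })
	}
	return out
}

// demo registers one collector and visits one of its nodes.
func demo() []Node {
	collectors := map[string]any{
		"custom": fanOutCollector{edges: map[string][]string{"a": {"x", "y"}}},
	}
	return visit(context.Background(), collectors, Node{Type: "custom", Key: "a"})
}

func main() {
	for _, n := range demo() {
		fmt.Println(n.Type, n.Key)
	}
}
```

The caller here holds edges only for the node currently being visited, which is the property the paragraph above contrasts with the backRefs map.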
This commit is intentionally a no-op: no plugin registers a collector that
uses collectionWithReferences yet. It is isolated here so that concurrent
development efforts that depend on this interface can be proposed and
reviewed upstream independently.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Force a 4K block size on all platforms rather than only on darwin.
An explicit caller-supplied -b is still respected.
Signed-off-by: Chris Crone <christopher.crone@docker.com>
The CRI checkpoint restore path unpacked checkpoint archive/OCI image content
directly into the container's persistent state directory and read files such as
container.log back from it with a symlink-following copy. Checkpoint content is
externally provided, so make restore more defensive about what it unpacks and
how it reads those files back.
Behavior changes:
- Only unpack regular files and directories from the checkpoint archive.
- Unpack checkpoint content into a dedicated <state>/ctrd-restore
  subdirectory created fresh rather than into the state dir itself, so
  checkpoint content cannot collide with containerd's own files (e.g.
  the "status" blob). Restore and cleanup operate on that subdir;
  cleanup is now a single RemoveAll of it.
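The two behavior changes above can be sketched roughly as follows. This is not the code from the change itself: unpackRegular and demo are hypothetical names, the filepath.IsLocal path check is an extra precaution assumed here, and the archive is built in memory purely for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// unpackRegular extracts only regular files and directories from a tar
// stream into dest. It returns the names of the entries it skipped.
func unpackRegular(r io.Reader, dest string) ([]string, error) {
	var skipped []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return skipped, nil
		}
		if err != nil {
			return skipped, err
		}
		// Assumption: also reject absolute paths and ".." escapes.
		if !filepath.IsLocal(hdr.Name) {
			skipped = append(skipped, hdr.Name)
			continue
		}
		target := filepath.Join(dest, hdr.Name)
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o700); err != nil {
				return skipped, err
			}
		case tar.TypeReg:
			// Sketch only: assumes parent directories appear earlier.
			data, err := io.ReadAll(tr)
			if err != nil {
				return skipped, err
			}
			if err := os.WriteFile(target, data, 0o600); err != nil {
				return skipped, err
			}
		default:
			// Symlinks, hard links, devices, and FIFOs are never created.
			skipped = append(skipped, hdr.Name)
		}
	}
}

// demo unpacks a small in-memory archive into a fresh ctrd-restore subdir.
func demo() ([]string, string, error) {
	body := []byte("hello\n")
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdrs := []*tar.Header{
		{Name: "container.log", Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(body))},
		{Name: "status", Typeflag: tar.TypeSymlink, Linkname: "/etc/passwd"},
	}
	for _, h := range hdrs {
		if err := tw.WriteHeader(h); err != nil {
			return nil, "", err
		}
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write(body); err != nil {
				return nil, "", err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, "", err
	}
	state, err := os.MkdirTemp("", "state")
	if err != nil {
		return nil, "", err
	}
	defer os.RemoveAll(state)
	restore := filepath.Join(state, "ctrd-restore")
	// Mkdir, unlike MkdirAll, fails if the directory already exists.
	if err := os.Mkdir(restore, 0o700); err != nil {
		return nil, "", err
	}
	skipped, err := unpackRegular(&buf, restore)
	if err != nil {
		return nil, "", err
	}
	data, err := os.ReadFile(filepath.Join(restore, "container.log"))
	return skipped, string(data), err
}

func main() {
	skipped, data, err := demo()
	fmt.Println(skipped, data, err)
}
```

Because no symlink is ever created inside the restore directory, reading container.log back from it cannot be redirected outside that directory by archive content.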
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Allow the last host to retry on transient network errors to increase the
likelihood of the operation succeeding and help reduce flaky tests.
Signed-off-by: Derek McGowan <derek@mcg.dev>
TestImagesCreateUpdateDelete asserts that an image's updatedat is
strictly after its createdat. Both timestamps are stamped via
time.Now().UTC(), which strips the monotonic reading, so the comparison
falls back to the wall clock. On platforms with coarse timer resolution
(e.g. Windows, which advances system time in ~15.6ms ticks), the
Create and Update calls can land in the same tick and produce identical
timestamps, making the strict After() check fail intermittently.
Wait for the wall clock to advance past the creation timestamp before
updating so the assertion stays meaningful without depending on clock
resolution. On fine-resolution clocks the loop runs zero iterations.
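The wait described above can be sketched as a small loop. waitForClockAfter and demo are hypothetical names, and the one-millisecond sleep is an assumed polling interval, not necessarily what the test uses.

```go
package main

import (
	"fmt"
	"time"
)

// waitForClockAfter blocks until the wall clock reads strictly later than t
// and returns how many sleep iterations were needed.
func waitForClockAfter(t time.Time) int {
	n := 0
	// UTC() strips the monotonic reading, so After compares wall-clock time.
	for !time.Now().UTC().After(t) {
		time.Sleep(time.Millisecond)
		n++
	}
	return n
}

// demo stamps a "created" time, waits, then stamps an "updated" time.
func demo() (created, updated time.Time) {
	created = time.Now().UTC()
	waitForClockAfter(created)
	updated = time.Now().UTC()
	return created, updated
}

func main() {
	created, updated := demo()
	fmt.Println(updated.After(created))
}
```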
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
Some proxy stream setup and receive paths still returned raw RPC
status errors while neighboring proxy methods normalized them with
errgrpc.ToNative. This made errdefs checks depend on which proxy API
surfaced the same remote failure.
Normalize event subscription setup and receive errors, and streaming
stream creation errors, while preserving io.EOF for completed receive
streams.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Most content proxy operations normalize remote RPC errors before
returning them, including stream receive errors from Walk and write
errors from the remote writer. remoteReaderAt.ReadAt was an outlier and
returned raw status errors from Read and Recv.
Callers that use content.ReadBlob through the proxy can then fail
errdefs checks, such as treating concurrent content deletion as
NotFound.
Convert non-EOF read stream errors with errgrpc.ToNative so ReaderAt
matches the rest of the content proxy while preserving io.EOF.
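A rough sketch of the normalization pattern, under stated assumptions: toNative below is a hypothetical stand-in for errgrpc.ToNative, and rpcError and errNotFound stand in for a raw status error and an errdefs sentinel so that the example builds without external modules.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errNotFound stands in for a native errdefs-style sentinel.
var errNotFound = errors.New("not found")

// rpcError stands in for a raw RPC status error carrying a code.
type rpcError struct{ code string }

func (e rpcError) Error() string { return "rpc error: code = " + e.code }

// toNative is a hypothetical stand-in for errgrpc.ToNative: it maps a raw
// status error onto a sentinel that errors.Is can match.
func toNative(err error) error {
	var re rpcError
	if errors.As(err, &re) && re.code == "NotFound" {
		return fmt.Errorf("%w: %s", errNotFound, re.Error())
	}
	return err
}

// normalizeRecv converts stream errors but leaves nil and io.EOF untouched,
// so callers can still detect a cleanly finished stream.
func normalizeRecv(err error) error {
	if err == nil || errors.Is(err, io.EOF) {
		return err
	}
	return toNative(err)
}

// demo reports whether io.EOF survives and whether a raw NotFound status
// becomes matchable with errors.Is.
func demo() (eofKept, notFound bool) {
	eofKept = normalizeRecv(io.EOF) == io.EOF
	notFound = errors.Is(normalizeRecv(rpcError{code: "NotFound"}), errNotFound)
	return eofKept, notFound
}

func main() {
	fmt.Println(demo())
}
```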
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
go1.26.4 includes security fixes to the crypto/x509, mime, and
net/textproto packages, as well as bug fixes to the compiler, the
runtime, the go fix command, and the crypto/fips140 package.
Signed-off-by: Akhil Mohan <akhilerm@gmail.com>
awaitPipeReady retried only when DialPipe returned os.IsNotExist or
context.DeadlineExceeded, but winio.DialPipe converts the per-attempt
deadline into winio.ErrTimeout before returning. A pipe in state 1
(ListenPipe called, Accept not yet called) causes DialPipe to block for
the full per-attempt timeout and return winio.ErrTimeout, which the old
check treated as a fatal error instead of retrying.
Also guard windows.ERROR_PIPE_BUSY explicitly to match the error checks
in containerd/nerdbox#218.
Add a regression test that forces the state-1 to state-2 transition
race by delaying Accept past the 1-second per-attempt timeout.
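The retry classification can be illustrated with a platform-neutral sketch. errDialTimeout and errPipeBusy are stand-ins for winio.ErrTimeout and windows.ERROR_PIPE_BUSY, and retryable, awaitReady, and demo are hypothetical names; the real function also applies per-attempt deadlines, which are omitted here. errors.Is with fs.ErrNotExist is used in place of os.IsNotExist.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io/fs"
)

// Stand-ins for winio.ErrTimeout and windows.ERROR_PIPE_BUSY so the sketch
// builds on any platform.
var (
	errDialTimeout = errors.New("dial timeout")
	errPipeBusy    = errors.New("pipe busy")
	errFatal       = errors.New("access denied")
)

// retryable reports whether a failed dial attempt should be repeated.
func retryable(err error) bool {
	return errors.Is(err, fs.ErrNotExist) ||
		errors.Is(err, context.DeadlineExceeded) ||
		errors.Is(err, errDialTimeout) ||
		errors.Is(err, errPipeBusy)
}

// awaitReady calls dial until it succeeds, returns a non-retryable error, or
// the attempt budget is used up. It returns the number of attempts made.
func awaitReady(dial func() error, attempts int) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = dial(); err == nil || !retryable(err) {
			return i, err
		}
	}
	return attempts, err
}

// demo simulates a listener that does not exist yet, then listens without
// accepting, then is busy, and finally accepts.
func demo() (int, error) {
	results := []error{fs.ErrNotExist, errDialTimeout, errPipeBusy, nil}
	i := 0
	return awaitReady(func() error {
		err := results[i]
		i++
		return err
	}, 10)
}

func main() {
	fmt.Println(demo())
}
```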
Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
Image config labels are copied onto the container by both the CRI
plugin (BuildLabels) and the client's WithImageConfigLabels option
used by `ctr run`. Labels in the containerd.io/* namespace are
interpreted by containerd itself and labels in the io.cri-containerd*
namespace are interpreted by the CRI plugin. An image config is not a
trusted source for labels in either namespace.
Skip labels in both reserved namespaces when copying labels from an
image config to a container, and warn about each label skipped: an
image that tries to set them may be attempting to alter containerd
behavior. Oversized image labels are already skipped this way by
the CRI plugin.
Labels set explicitly by clients, for example via `ctr run --label`
or in the CRI request, are unaffected.
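A minimal sketch of the filtering described above. The prefix strings are taken from the namespaces named in this message, but the exact matching rule is an assumption, and isReserved, filterImageLabels, and the example label keys are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reservedPrefixes lists the two namespaces named above. Treating them as
// plain string prefixes is an assumption of this sketch.
var reservedPrefixes = []string{"containerd.io/", "io.cri-containerd"}

// isReserved reports whether a label key falls in a reserved namespace.
func isReserved(key string) bool {
	for _, p := range reservedPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

// filterImageLabels returns the labels that may be copied from an image
// config, plus the sorted keys that were skipped.
func filterImageLabels(in map[string]string) (map[string]string, []string) {
	kept := make(map[string]string, len(in))
	var skipped []string
	for k, v := range in {
		if isReserved(k) {
			skipped = append(skipped, k)
			continue
		}
		kept[k] = v
	}
	sort.Strings(skipped)
	return kept, skipped
}

func main() {
	kept, skipped := filterImageLabels(map[string]string{
		"maintainer":                "someone",
		"containerd.io/example":     "x", // hypothetical key
		"io.cri-containerd.example": "y", // hypothetical key
	})
	for _, k := range skipped {
		fmt.Printf("skipping reserved label %q from image config\n", k)
	}
	fmt.Println(kept)
}
```

Labels supplied by the client would be merged after this step, so they bypass the filter, in line with the paragraph above.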
Verified with the CRI plugin and with `ctr run` against an image
whose config carries labels like these: the labels are no longer
present on the created container and a warning is logged for each.
Assisted-by: Claude Code
Signed-off-by: Ben Cressey <ben@cressey.org>
Signed-off-by: Samuel Karp <samuelkarp@google.com>
GHA runners occasionally hit I/O constraints while running root-test.
When concurrent tests rapidly allocate loopback devices, background udev
probing stalls. This quickly exhausts systemd-udevd's default worker pool
ceiling (20 children max), stalling netlink uevent processing so that
device-mapper device nodes are never created for the subsequent
dm-verity tests.
Logging cgroups v2 pids.peak telemetry confirmed that in-flight udev
workers peak at 325 during test execution. Raising the children-max
limit to 500 leaves comfortable headroom, so udevd can spawn worker
processes without locking up event processing or causing test timeouts.
Assisted-by: Antigravity
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
Filter out any annotation on the checkpointed container whose key starts
with `cdi.k8s.io/` or is exactly `cdi.k8s.io` during restore, to prevent
unauthorized device restoration. A warning is logged for each annotation
that is denied.
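The matching rule above (exact key or `/`-terminated prefix) can be sketched as follows. isCDIAnnotation and filterAnnotations are hypothetical names, and the example keys are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const cdiDomain = "cdi.k8s.io"

// isCDIAnnotation matches the bare domain or any key under "cdi.k8s.io/".
// A key such as "cdi.k8s.io.example/x" shares the text prefix but sits
// outside the boundary, so it is not matched.
func isCDIAnnotation(key string) bool {
	return key == cdiDomain || strings.HasPrefix(key, cdiDomain+"/")
}

// filterAnnotations returns the annotations to restore and the sorted keys
// that were denied.
func filterAnnotations(in map[string]string) (map[string]string, []string) {
	kept := make(map[string]string, len(in))
	var denied []string
	for k, v := range in {
		if isCDIAnnotation(k) {
			denied = append(denied, k)
			continue
		}
		kept[k] = v
	}
	sort.Strings(denied)
	return kept, denied
}

func main() {
	kept, denied := filterAnnotations(map[string]string{
		"cdi.k8s.io":            "a",
		"cdi.k8s.io/device":     "b",
		"cdi.k8s.io.example/x":  "c",
		"example.com/unrelated": "d",
	})
	for _, k := range denied {
		fmt.Printf("dropping annotation %q from checkpoint\n", k)
	}
	fmt.Println(len(kept))
}
```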
Tested by:
* Unit tests for exact matching, prefix boundaries, and metadata merging
* Complete CRI integration and checkpoint restore suite
Assisted-by: Antigravity
Signed-off-by: Samuel Karp <samuelkarp@google.com>