514 Commits

Author SHA1 Message Date
Maksym Pavlenko
49480db376 Merge pull request #13634 from dmcgowan/gc-forward-references
core/metadata: add forward References to the GC collection context
2026-06-22 23:20:07 +00:00
Fu Wei
3fcad510c2 Merge pull request #13588 from austinvazquez/fix-flaky-images-create-update-delete-test
test: fix flaky image timestamp check on coarse clocks
2026-06-22 14:07:11 +00:00
Derek McGowan
4be39f13f4 core/metadata: add forward References to the GC collection context
Extend the garbage-collection framework so a collectible resource can emit
forward references during graph traversal, in addition to the existing
back-reference mechanism.

A CollectionContext may now implement the optional collectionWithReferences
interface:

	References(ctx context.Context, node gc.Node, fn func(gc.Node))

When the GC visits a node whose resource type was registered by an external
collector, gcContext.references consults the per-type References
implementation after the built-in core resource types are handled.

This is the forward-reference analogue of collectionWithBackRefs.  Whereas
ActiveWithBackRefs must enumerate every edge up front and the gcContext
holds all of them in its backRefs map for the entire collection, References
is invoked on demand for a single node.  A collector whose resources fan
out to many other nodes can therefore emit those edges without retaining
them in memory for the gc context.

This commit is intentionally a no-op: no plugin registers a collector that
uses collectionWithReferences yet.  It is isolated here so that concurrent
development efforts that depend on this interface can be proposed and
reviewed upstream independently.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-06-19 12:49:30 -07:00
Derek McGowan
e96fd14b81 Merge pull request #13585 from vvoland/content-proxy-convert-grpc-errors
core/content/proxy: Convert reader errors to native errdefs
2026-06-19 00:01:53 +00:00
Akihiro Suda
a54728641e Merge pull request #13586 from vvoland/streaming-grpc-errors
core/proxy: Convert stream proxy errors to native errdefs
2026-06-15 18:47:19 +00:00
Akihiro Suda
06c38dcad5 Merge pull request #13323 from dmcgowan/resolver-transient-errors
resolver: retry on transient network errors
2026-06-13 18:15:13 +00:00
Derek McGowan
20af2e324a resolver: retry on transient network errors
Allow the last host to retry on transient network errors to increase the
likelihood of the operation succeeding and help reduce flaky tests.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-06-12 16:36:08 -07:00
Austin Vazquez
e5e2190886 test: fix flaky image timestamp check on coarse clocks
TestImagesCreateUpdateDelete asserts that an image's updatedat is
strictly after its createdat. Both timestamps are stamped via
time.Now().UTC(), which strips the monotonic reading, so the comparison
falls back to the wall clock. On platforms with coarse timer resolution
(e.g. Windows, which advances system time at the ~15.6ms tick), the
Create and Update calls can land in the same tick and produce identical
timestamps, making the strict After() check fail intermittently.

Wait for the wall clock to advance past the creation timestamp before
updating so the assertion stays meaningful without depending on clock
resolution. On fine-resolution clocks the loop runs zero iterations.

Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2026-06-12 14:36:13 -05:00
Paweł Gronowski
d3c143e8b4 core/proxy: Convert stream proxy errors to native errdefs
Some proxy stream setup and receive paths still returned raw RPC
status errors while neighboring proxy methods normalized them with
errgrpc.ToNative. This made errdefs checks depend on which proxy API
surfaced the same remote failure.

Normalize event subscription setup and receive errors, and streaming
stream creation errors, while preserving io.EOF for completed receive
streams.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2026-06-12 14:07:40 +02:00
Paweł Gronowski
d58c2c1aa4 core/content/proxy: Convert reader errors to native errdefs
Most content proxy operations normalize remote RPC errors before
returning them, including stream receive errors from Walk and write
errors from the remote writer. remoteReaderAt.ReadAt was an outlier and
returned raw status errors from Read and Recv.

Callers that use content.ReadBlob through the proxy can then fail
errdefs checks, such as treating concurrent content deletion as
NotFound.

Convert non-EOF read stream errors with errgrpc.ToNative so ReaderAt
matches the rest of the content proxy while preserving io.EOF.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2026-06-12 13:10:18 +02:00
Maksym Pavlenko
b4ab8c0537 Merge pull request #13520 from dmcgowan/add-snapshot-max-size-label
Add max size label for snapshots
2026-06-03 22:25:59 +00:00
Derek McGowan
f2b7791b23 Add max size label for snapshots
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-06-02 15:23:27 -07:00
Samuel Karp
a989093a9c remotes: close fetch reader immediately on EOF
The CRI progress reporter cancels an image pull if it sees no progress
for 5 seconds. It tracks this through active HTTP requests. During
remote fetches, the HTTP response reader is closed via a deferred
call after `content.Copy` completes.

Diagnosis:
`content.Copy` handles both downloading the stream and committing
the writer to the content store. Any delays during the database
commit phase (e.g. from database locks, slow disk syncs, or concurrent
pull deduplication blocks) keep the HTTP connection open. The progress
reporter sees the request is still active (`activeReqs = 1`) but no new
bytes are coming in, leading to a premature timeout cancellation.

Reproduction:
We reproduced this flakiness deterministically on a GCE VM under a
simulated 2 Mbps ingress bandwidth limit using Linux traffic control
ingress policing (`tc filter ... action police rate 2mbit`). Under this
slowness, the download took longer than the progress timeout during the
slow commit phase, triggering context cancellation and failing the
`TestCRIImagePullTimeout/HoldingContentOpenWriterWithLocalPull` test.

Solution:
To fix this, we wrap the HTTP reader in a `closeOnEOFReader` or
`closeOnEOFReadSeeker` before handing it to `content.Copy`. If the
underlying connection reader implements `io.Seeker`, it is dynamically
wrapped in `closeOnEOFReadSeeker` to forward `Seek` operations. This
ensures that O(1) Range seeks are fully preserved during network
resumes or retries. The wrappers automatically close the underlying
network stream as soon as `Read()` returns `io.EOF` (when the download
completes, before the database commit begins). This drops `activeReqs`
to `0` early, freeing the socket and preventing progress timeouts
during commits. A `sync.Once` ensures that subsequent deferred
`Close()` calls do not double-decrement the reporter.

How it was tested:
Verified the fix on a GCE VM under a simulated 2 Mbps ingress
bandwidth limit. Verified seeker safety via standalone logic audits
and trace proofs.

Assisted-by: Antigravity
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2026-06-02 14:53:33 -07:00
Austin Vazquez
88af11e081 core/runtime/v2: fix race on Windows deferredPipeConnection.c in Read
Read short-circuited on `if dpc.c == nil` before calling
`dpc.wg.Wait()` which races with the dialer goroutine spawned in
openShimLog. The dialer assigns `dpc.c = c` (and may set `dpc.conerr`)
outside any lock; the only synchronization is the WaitGroup, and Read
skipped it on the fast path.

Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2026-05-21 17:15:58 -05:00
Austin Vazquez
3bc019ea3d fix: close boltdb on metadata and mount plugin close
Co-authored-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2026-05-05 17:20:26 -05:00
Maksym Pavlenko
3e0ebf0f6d Deprecate shim.Command
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-29 10:51:55 -07:00
Samuel Karp
bc69a52680 Merge pull request #13167 from lauralorenz/10681-ctr-image-export-oci-ref-name
#10681 by-digest `ctr image export` of `org.opencontainers.image.ref.name`
2026-04-28 22:29:59 +00:00
Samuel Karp
f7150e2215 Merge pull request #13126 from dmcgowan/handle-mount-already-exists
Fix mount manager activation error when already exists
2026-04-24 16:16:40 +00:00
Fu Wei
1aef5484c5 Merge pull request #12667 from dmcgowan/transfer-extrarefs-gc
Update transfer service to support automatically garbage collecting extra references
2026-04-24 16:11:21 +00:00
Derek McGowan
a3f3103285 Add GC log when image is removed via GC
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-23 11:20:15 -07:00
Derek McGowan
ec140ec1da Add GC labels to images created as extra references
Setting the GC labels ensures that extra references may get garbage
collected when the original image using them is removed.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-23 11:20:15 -07:00
Davanum Srinivas
c30f23452c cri: use upstream Kubernetes modules
Switch the CRI integration layer from containerd's forked Kubernetes helpers
and clients to the upstream Kubernetes modules, and finalize the dependency
update to Kubernetes v0.36.0.

Replace the remaining internal helper copies with upstream packages:
- internal/cri/clock -> k8s.io/utils/clock
- internal/cri/executil -> upstream CRI exec helpers
- internal/cri/resourcequantity -> k8s.io/apimachinery/pkg/api/resource
- internal/cri/setutils -> k8s.io/apimachinery/pkg/util/sets
- internal/cri/types/labels.go -> internal/cri/labels
- integration/cri-api/pkg/apis/services.go -> k8s.io/cri-api/pkg/apis/services.go

Adopt the upstream CRI clients directly:
- add k8s.io/cri-client v0.36.0, k8s.io/cri-streaming v0.36.0, and
  k8s.io/streaming v0.36.0 as direct dependencies
- promote k8s.io/utils to a direct dependency and pull in
  k8s.io/component-base v0.36.0 indirectly
- keep integration/remote as a thin containerd adapter around cri-client,
  because the integration tests still need the stream-shaped
  GetContainerEvents RPC

Finalize the Kubernetes dependency update from v0.36.0-rc.0 to v0.36.0,
refresh vendor/, and drop the obsolete internal utility copies.

Also fix the protobuf MessageState mutex-copy vet failures exposed by the new
APIs and close the temporary integration CRI clients explicitly.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-04-23 12:59:58 +02:00
Fu Wei
84ac5de468 Merge pull request #13256 from chrishenzie/fix-volatile-mount-check
Support both styles of volatile mount option
2026-04-21 23:46:48 +00:00
Derek McGowan
eb62cdc169 Fix transfer server not setting prefix extra references
Extra references may have the prefix flag set along with the add digest
flag. Currently they are not being processed. The option that sets the
digest ref today sets both the prefix and add digest flags.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-20 21:49:47 -07:00
Derek McGowan
80ec03fe7e core/mount: Fix mount manager activation error when already exists
Correctly handle cases where the mount activation still exists:
- If the activation is fully active, just return already exists and
  allow the caller to return the error or call Info to continue.
- If the activation is stale or incomplete due to a crash during
  activation, overwrite the identifier and clean up the incomplete
  activation during activate.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-20 18:04:16 -07:00
Chris Henzie
93f7a62e50 Support both styles of volatile mount option
Kernel 6.12.80+ returns 'fsync=volatile' instead of just 'volatile'
in mount options, which breaks containerd's exact string matching
checks.

Fixes this issue by adding support for 'fsync=volatile' in addition
to the existing 'volatile' check in RemoveVolatileOption and
addVolatileOptionOnImageVolumeMount.
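
The matching change can be sketched as below. The helper names
isVolatileOption and removeVolatile are hypothetical and only illustrate
accepting both spellings; they are not the functions touched by this
commit.

```go
package main

import "fmt"

// isVolatileOption reports whether opt is either spelling of the
// volatile mount option described above.
func isVolatileOption(opt string) bool {
	return opt == "volatile" || opt == "fsync=volatile"
}

// removeVolatile returns opts without any volatile option, keeping the
// remaining options in their original order.
func removeVolatile(opts []string) []string {
	out := make([]string, 0, len(opts))
	for _, o := range opts {
		if !isVolatileOption(o) {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	opts := []string{"lowerdir=/a", "fsync=volatile", "volatile", "index=off"}
	fmt.Println(removeVolatile(opts)) // [lowerdir=/a index=off]
}
```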

Assisted-by: Antigravity
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
2026-04-20 11:50:57 -07:00
Mike Brown
5fa03e6bbb Merge pull request #13164 from Mujib-Ahasan/add-ResponseHeaderTimeout
Add: ResponseHeaderTimeout to image pull HTTP transport
2026-04-20 13:20:04 +00:00
Samuel Karp
c6cf634443 Merge pull request #12262 from doddi/fix-check-status-code-on-fetch
Add check for status code for GET requests
2026-04-15 23:36:52 +00:00
Laura Lorenz
75d32fda0f Check for digest only when setting org.opencontainers.image.ref.name
On export, if the image is by-digest without any tag,
set the org.opencontainers.image.ref.name as the full name.
This prevents setting this field with a leading non-alphanum,
which is incorrect OCI grammar. Fixes #10681.

Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2026-04-15 22:15:58 +00:00
Fu Wei
0fd46bef4e Merge pull request #12398 from dmcgowan/gc-conditional-references
Add support for conditional gc references in metadata
2026-04-15 17:48:59 +00:00
Akihiro Suda
341401c1d5 Merge pull request #12785 from dmcgowan/pass-socket-address
Make shim socket directory use configured directory
2026-04-15 10:24:25 +00:00
Derek McGowan
e07a1aa491 Add configuration for socket directory to the shim manager
Allow the socket directory to be directly configured by the shim manager
with reasonable defaults when not set. The default for root users will
still be the same directory under the default state directory. For
non-root users a temp directory will be used as default if the state
directory is not owned by the user.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-15 00:21:29 -07:00
Derek McGowan
d806373feb Make shim socket directory use configured state
Send the socket directory from containerd to the shim. The shim still
decides where the socket goes but can use the environment variable
passed from containerd to ensure the socket is placed in the configured
directory with proper permission.

This is needed for some rootless cases which do not have permission to
the default state directory as currently set. The directory being
hardcoded by the shim means it is currently not possible to change the
location the shim will listen at.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-15 00:21:18 -07:00
ChengyuZhu6
64a2e62b52 erofs: wire os.features into conversion and selection
Mark converted EROFS manifests with the erofs OS feature and cover
feature-aware manifest selection and unpack routing for erofs images.

Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
2026-04-15 10:28:54 +08:00
ChengyuZhu6
b320d3c855 ctr: add EROFS image conversion support
Add EROFS conversion support to ctr convert command with configurable
options for tar-index mode and mkfs parameters.

Usage:
  ctr image convert --erofs src:tag dst:tag
  ctr image convert --erofs --erofs-compression='lz4hc,12' src:tag dst:tag

Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
2026-04-15 10:28:47 +08:00
Derek McGowan
046421ab78 Breakout arguments to sendLabelRefs in gc
Make the code clearer to follow and understand the arguments.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-14 17:13:22 -07:00
Derek McGowan
bd02dc1d7b Add support for conditional gc references in metadata
Conditional gc references allow establishing a conditional reference,
which can be used to expire specific connections without needing
to update multiple objects.

For example, content can hold a temporary relationship to a snapshot
that can expire if the snapshot is unused after a specific time. This
allows just updating the snapshot label when it is used, without
needing to update other objects or create an expiring lease to hold the
connection.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-14 17:13:21 -07:00
Derek McGowan
83044a43a1 Merge pull request #13128 from thaJeztah/windows_system_pool
core/remotes/docker: use SystemCertPool on Windows
2026-04-12 16:58:22 +00:00
Esteban Ginez
01e5fa616f fix: address review feedback on awaitPipeReady
- Use time.NewTimer + Stop() instead of time.After to avoid timer leaks
- Treat context.DeadlineExceeded as retryable (pipe busy, not just missing)
- Wrap last dial error instead of os.ErrNotExist for better diagnostics
- Update makeConnection godoc to reflect current BootstrapResult type

Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
2026-04-09 15:15:32 -07:00
Esteban Ginez
1e98ebaf0e fix(windows): verify pipe readiness before returning shim address
The shim "start" helper returns the named pipe address before the
daemon process has created the pipe via winio.ListenPipe(). On busy
Windows systems, containerd may try to connect before the pipe exists.

Add awaitPipeReady() — the start helper now polls the pipe address
(up to 5s, 10ms intervals) before writing the bootstrap result to
stdout. This follows hcsshim's readiness pattern where the shim
verifies its endpoint is ready before signaling the parent.

As a safety net, also parameterize makeConnection() with a dialer so
binary.Start() uses AnonDialer (retry) for new shims while loadShim()
keeps AnonReconnectDialer (fail-fast) for reconnects per #3659.

On Unix, awaitPipeReady() is a no-op: domain sockets appear atomically.

Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
2026-04-09 15:07:06 -07:00
Derek McGowan
0b164554be Merge pull request #13186 from erofs/walking_differ
diff/walking: enable mount manager
2026-04-09 15:35:08 +00:00
Gao Xiang
47cfd1138b diff/walking: enable mount manager
The default walking applier performs a real temporary mount for
unpacking, but the mount manager failed to adapt to the walking
differ.

This fixes the EROFS snapshotter together with the default walking
differ, otherwise it reports:

```
ctr: apply layer error for "[]": failed to extract layer sha256:[]:
failed to mount /var/lib/containerd/tmpmounts/containerd-mount3992073457:
internal mount option "X-containerd.mkfs.fs=ext4" was not consumed by
the mount manager
```

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-04-09 14:17:35 +08:00
Maksym Pavlenko
3c0e8a55b6 Update comments wording about when to deprecate and remove the old path
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Maksym Pavlenko
9dc864fd0f Switch to proto instead of json
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Derek McGowan
243cab594e Deprecate old pkg/shim interfaces
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-08 08:29:28 -05:00
Maksym Pavlenko
d957b1bf53 Use log level instead of debug flag
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Maksym Pavlenko
fa02acee20 Generate shim CLI flags under Command
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Maksym Pavlenko
fc8062f379 Rename CommandConfig fields to better reflect their purpose
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Maksym Pavlenko
7f39b2d933 Update shim to support new bootstrap api
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-04-08 08:29:28 -05:00
Derek McGowan
17cdec2981 Merge pull request #12206 from wjordan/push-namespace
Add registry host namespace query parameter to mirror push requests
2026-04-07 22:21:22 +00:00