486 Commits

Author SHA1 Message Date
Maksym Pavlenko
33fb482a85 Merge pull request #13656 from erofs/cri-image-volume
cri: don't leak the new mount if mutateImageMount() fails
2026-06-23 17:43:39 +00:00
Gao Xiang
a88ce40fd1 cri: don't leak the new mount if mutateImageMount() fails
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-06-23 17:09:27 +08:00
Samuel Karp
5558f3aa0e Merge pull request #13626 from samuelkarp/june-18-combined-main
Patches
2026-06-18 16:02:14 -07:00
Chris Crone
773d3517dd erofs: align default mkfs block size across platforms
Force a 4K block size on all platforms rather than only on darwin.
An explicit caller-supplied -b is still respected.
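
The defaulting rule above can be sketched roughly as follows. This is a minimal illustration, not the snapshotter's actual code: withDefaultBlockSize and hasBlockSize are hypothetical helpers, and the argument spellings are only examples.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultBlockSizeArg is the 4K block size named in the commit message.
const defaultBlockSizeArg = "-b4096"

// hasBlockSize reports whether the caller already passed -b, either as
// "-b 4096" or as "-b4096". Hypothetical helper for this sketch.
func hasBlockSize(args []string) bool {
	for _, a := range args {
		if strings.HasPrefix(a, "-b") {
			return true
		}
	}
	return false
}

// withDefaultBlockSize (hypothetical) prepends the 4K default on every
// platform unless the caller supplied an explicit -b.
func withDefaultBlockSize(extra []string) []string {
	if hasBlockSize(extra) {
		return append([]string(nil), extra...)
	}
	return append([]string{defaultBlockSizeArg}, extra...)
}

func main() {
	fmt.Println(withDefaultBlockSize([]string{"-zlz4hc,12"}))
	fmt.Println(withDefaultBlockSize([]string{"-b", "16384"}))
}
```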

Signed-off-by: Chris Crone <christopher.crone@docker.com>
2026-06-18 14:25:26 -04:00
Chris Henzie
a0086cfcee Merge commit from fork
2026-06-15 21:26:29 -07:00
Chris Henzie
432a7af299 Merge commit from fork
2026-06-15 21:25:18 -07:00
Chris Henzie
3977106b53 Merge commit from fork
2026-06-15 21:25:18 -07:00
Chris Henzie
a834385de9 Merge commit from fork
2026-06-15 21:25:17 -07:00
Brian Goff
8196411f24 cri: make checkpoint restore robust to unexpected archive content
The CRI checkpoint restore path unpacked checkpoint archive/OCI image content
directly into the container's persistent state directory and read files such as
container.log back from it with a symlink-following copy. Checkpoint content is
externally provided, so make restore more defensive about what it unpacks and
how it reads those files back.

Behavior changes:

- Only unpack regular files and directories from the checkpoint archive.

- Unpack checkpoint content into a dedicated <state>/ctrd-restore
  subdirectory created fresh rather than into the state dir itself, so
  checkpoint content cannot collide with containerd's own files (e.g.
  the "status" blob). Restore and cleanup operate on that subdir;
  cleanup is now a single RemoveAll of it.
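
As a rough illustration of the first behavior change, the sketch below walks a tar stream and keeps only regular files and directories. It is an assumption-laden example: allowedEntries and sampleArchive are hypothetical names, and the real restore path may filter entries differently.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// allowedEntries (hypothetical) returns the names of archive entries that are
// regular files or directories. Every other entry type is skipped.
func allowedEntries(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		switch hdr.Typeflag {
		case tar.TypeReg, tar.TypeDir:
			names = append(names, hdr.Name)
		default:
			// Symlinks, hard links, devices, and so on are not unpacked.
		}
	}
}

// sampleArchive builds a small in-memory archive for the demo.
func sampleArchive() *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	entries := []*tar.Header{
		{Name: "checkpoint/", Typeflag: tar.TypeDir, Mode: 0o755},
		{Name: "checkpoint/container.log", Typeflag: tar.TypeReg, Mode: 0o644, Size: 2},
		{Name: "checkpoint/link", Typeflag: tar.TypeSymlink, Linkname: "/etc/passwd", Mode: 0o777},
	}
	for _, h := range entries {
		if err := tw.WriteHeader(h); err != nil {
			panic(err)
		}
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write([]byte("ok")); err != nil {
				panic(err)
			}
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	names, err := allowedEntries(sampleArchive())
	fmt.Println(names, err)
}
```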

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2026-06-15 15:11:36 -07:00
Ben Cressey
0ec1af4cae Do not propagate reserved labels from image configs
Image config labels are copied onto the container by both the CRI
plugin (BuildLabels) and the client's WithImageConfigLabels option
used by `ctr run`. Labels in the containerd.io/* namespace are
interpreted by containerd itself and labels in the io.cri-containerd*
namespace are interpreted by the CRI plugin. An image config is not a
trusted source for labels in either namespace.

Skip labels in both reserved namespaces when copying labels from an
image config to a container, and warn about each label skipped: an
image that tries to set them may be attempting to alter containerd
behavior. Oversized image labels are already skipped this way by
the CRI plugin.

Labels set explicitly by clients, for example via `ctr run --label`
or in the CRI request, are unaffected.

Verified with the CRI plugin and with `ctr run` against an image
whose config carries labels like these: the labels are no longer
present on the created container and a warning is logged for each.
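
A minimal sketch of the filtering described above, assuming simple prefix matching on the two reserved namespaces. The names reservedPrefixes and filterImageLabels are hypothetical, and the real code also logs a warning per skipped label.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reservedPrefixes lists the namespaces named in the commit message.
// Matching by plain prefix is an assumption of this sketch.
var reservedPrefixes = []string{"containerd.io/", "io.cri-containerd"}

// isReserved reports whether a label key falls in a reserved namespace.
func isReserved(key string) bool {
	for _, p := range reservedPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

// filterImageLabels (hypothetical) copies image config labels, dropping
// reserved ones and returning the skipped keys in sorted order.
func filterImageLabels(in map[string]string) (map[string]string, []string) {
	kept := make(map[string]string, len(in))
	var skipped []string
	for k, v := range in {
		if isReserved(k) {
			skipped = append(skipped, k)
			continue
		}
		kept[k] = v
	}
	sort.Strings(skipped)
	return kept, skipped
}

func main() {
	kept, skipped := filterImageLabels(map[string]string{
		"maintainer":               "someone",
		"containerd.io/example":    "x",
		"io.cri-containerd.kind":   "sandbox",
	})
	fmt.Println(kept, skipped)
}
```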

Assisted-by: Claude Code
Signed-off-by: Ben Cressey <ben@cressey.org>
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2026-06-10 13:18:24 -07:00
Samuel Karp
861ffc1097 cri: filter CDI annotations on checkpoint restore
Filter out any annotations on the checkpointed container matching
`cdi.k8s.io/` or exactly `cdi.k8s.io` during restore to prevent
unauthorized device restoration. When an annotation is denied, a warning
log is generated.
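
The matching rule (exact key or key with a trailing-slash prefix) could look roughly like this. isCDIAnnotation is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cdiKey is the annotation key named in the commit message.
const cdiKey = "cdi.k8s.io"

// isCDIAnnotation (hypothetical) matches "cdi.k8s.io" exactly or any key
// under the "cdi.k8s.io/" prefix. Keys that merely start with the same
// characters, such as "cdi.k8s.iox", do not match.
func isCDIAnnotation(key string) bool {
	return key == cdiKey || strings.HasPrefix(key, cdiKey+"/")
}

func main() {
	for _, k := range []string{"cdi.k8s.io", "cdi.k8s.io/vendor_gpu", "cdi.k8s.iox", "app"} {
		fmt.Printf("%-24s denied=%v\n", k, isCDIAnnotation(k))
	}
}
```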

Tested by:
* Unit tests for exact matching, prefix boundaries, and metadata merging
* Complete CRI integration and checkpoint restore suite

Assisted-by: Antigravity
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2026-06-09 16:56:45 -07:00
Samuel Karp
ade39c7c93 Merge pull request #13399 from lauralorenz/13355-nri-hook-leak
Add defer in event of mid-function failures in RunPodSandbox to avoid mount leaks
2026-06-09 18:10:57 +00:00
Samuel Karp
0c0918fa8f cri: do not re-tag restored checkpoints
Google-Bug-Id: 508657842
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2026-06-03 10:49:45 -07:00
Maksym Pavlenko
83d9e661cb Merge pull request #13304 from dmcgowan/cri-pull-progress-idle-active-reset
cri: reset pull progress timer on idle→active transition
2026-06-02 20:15:28 +00:00
lauralorenz
2b2b80f558 Add deferred call to ShutdownSandbox to avoid leaks
Between starting the sandbox and adding it to the sandbox store,
RunPodSandbox can fail at several points, including in any NRI
RunPodSandbox prehooks. Add a defer covering that window so that, if
one of those steps fails, the function cleans up the sandbox itself.
Once the sandbox has been added to the persistent store, the defer
does not attempt to stop it, because other components can then
recognize it through the CRI store. ShutdownSandbox is used instead
of StopSandbox because it both stops the sandbox and cleans up all
of its directories.

Signed-off-by: lauralorenz <lauralorenz@google.com>
2026-05-28 21:50:33 +00:00
Alex Lyn
8f7c7fb447 cri: skip pause image pull for non-podsandbox sandboxers
RunPodSandbox unconditionally pre-pulls the pause container
image via ensurePauseImageExists() before starting any sandbox.
However, only the "podsandbox" controller actually uses the pause
image to create a pause container holding namespaces. Shim-based
sandbox controllers (e.g. Kata Containers) manage the sandbox
lifecycle entirely at the shim level and never reference the pause
image.

Add a DisablePauseImagePull flag to the Runtime config that gates
ensurePauseImageExists(). When a sandboxer is not "podsandbox", the
flag skips the unnecessary pre-pull, avoiding wasted network/storage
overhead and reducing sandbox startup latency.

The long-term direction is to offload image pulling entirely to the
controller implementation (shim level); this flag is an incremental
step toward that goal without introducing a breaking behavior change.

Also add unit tests to verify that ensurePauseImageExists is only
invoked for the "podsandbox" sandboxer and correctly skipped otherwise.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-27 15:53:27 -05:00
Fu Wei
125377b576 Merge pull request #13470 from mxpv/time
Fix flaky e2e test
2026-05-22 23:52:16 +00:00
Maksym Pavlenko
8e0713454f cri: use per-metric timestamp in background stats collector
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2026-05-22 11:25:53 -07:00
Maksym Pavlenko
c16baa7d25 Merge pull request #13240 from coderbirju/fix-cgroup-test
Fix: TestCgroupNamespace failure on cgroups v1 hosts
2026-05-22 18:14:49 +00:00
Maksym Pavlenko
a275642502 Merge pull request #13008 from chelnak/fix-erofs-tmpdir
fix(erofs): set TMPDIR for mkfs.erofs on Windows
2026-05-11 21:00:30 +00:00
Derek McGowan
6c396d050d cri: reset pull progress timer on idle→active transition
The pull progress reporter resets lastSeenTimestamp on every tick where
activeReqs == 0, but never on the transition to a non-zero count. When a
pull is held in content.OpenWriter (idle in HTTP terms) and then
unblocks, the next request can be cancelled less than `timeout` after it
was actually issued — its first byte must arrive within whatever fraction
of `timeout` remains on the timer captured during the previous idle
tick.

Track the previous tick's activeReqs and reset the timer on the 0→1
transition so a newly-issued request always gets a full timeout window
to produce its first byte. This deflakes
TestCRIImagePullTimeout/HoldingContentOpenWriterWithLocalPull, which
hits ghcr.io directly and can exceed the shrunken window during
auth handshakes in CI.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-28 17:25:12 -07:00
Chris Henzie
5e907d7777 Implement NRI metrics adaptation layer
Add metrics for NRI plugin invocations, latency, adjustments, and active
count. Map NRI Metrics adaptation layer to containerd's Prometheus
metrics system via docker/go-metrics for observability.

Categorize plugin invocation errors into `deadline_exceeded`,
`canceled`, and dynamic gRPC status code dimensions to assist
troubleshooting.

Assisted-by: Antigravity
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
2026-04-28 16:04:03 -07:00
sreeram-venkitesh
1b1aba4b66 Added stop signal to container termination logic and container status
Signed-off-by: sreeram-venkitesh <sreeramvenkitesh@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2026-04-27 07:52:09 -05:00
Fu Wei
f9372eccf3 Merge pull request #13269 from Apokleos/erofs-dmverity-label
snapshotter/erofs: pass explicit dm-verity metadata path via mount options
2026-04-25 20:59:12 +00:00
Alex Lyn
9c8111b70d dmverity: enhance MetadataPath() with suffix checking
Enhance MetadataPath() to check the suffix of layerBlobPath so that a
path already ending in .dmverity is returned as-is.

Also add a unit test asserting that MetadataPath("...dmverity")
returns the path unchanged to lock in the new behavior.
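
A minimal sketch of the suffix check. The lower-case metadataPath is a hypothetical stand-in, and appending ".dmverity" in the other branch is an assumption of this example rather than something the commit message states.

```go
package main

import (
	"fmt"
	"strings"
)

// metadataSuffix is the suffix named in the commit message.
const metadataSuffix = ".dmverity"

// metadataPath (hypothetical stand-in for MetadataPath) returns the input
// unchanged when it already ends in ".dmverity". Appending the suffix
// otherwise is an assumption made for this sketch.
func metadataPath(layerBlobPath string) string {
	if strings.HasSuffix(layerBlobPath, metadataSuffix) {
		return layerBlobPath
	}
	return layerBlobPath + metadataSuffix
}

func main() {
	fmt.Println(metadataPath("/snapshots/1/layer.erofs"))
	fmt.Println(metadataPath("/snapshots/1/layer.erofs.dmverity"))
}
```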

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-25 18:29:16 +08:00
Derek McGowan
1b71eeeae7 Merge pull request #12648 from jokemanfire/optimize
fix: ticker < sync time causes high CPU usage
2026-04-24 19:24:34 +00:00
Davanum Srinivas
c30f23452c cri: use upstream Kubernetes modules
Switch the CRI integration layer from containerd's forked Kubernetes helpers
and clients to the upstream Kubernetes modules, and finalize the dependency
update to Kubernetes v0.36.0.

Replace the remaining internal helper copies with upstream packages:
- internal/cri/clock -> k8s.io/utils/clock
- internal/cri/executil -> upstream CRI exec helpers
- internal/cri/resourcequantity -> k8s.io/apimachinery/pkg/api/resource
- internal/cri/setutils -> k8s.io/apimachinery/pkg/util/sets
- internal/cri/types/labels.go -> internal/cri/labels
- integration/cri-api/pkg/apis/services.go -> k8s.io/cri-api/pkg/apis/services.go

Adopt the upstream CRI clients directly:
- add k8s.io/cri-client v0.36.0, k8s.io/cri-streaming v0.36.0, and
  k8s.io/streaming v0.36.0 as direct dependencies
- promote k8s.io/utils to a direct dependency and pull in
  k8s.io/component-base v0.36.0 indirectly
- keep integration/remote as a thin containerd adapter around cri-client,
  because the integration tests still need the stream-shaped
  GetContainerEvents RPC

Finalize the Kubernetes dependency update from v0.36.0-rc.0 to v0.36.0,
refresh vendor/, and drop the obsolete internal utility copies.

Also fix the protobuf MessageState mutex-copy vet failures exposed by the new
APIs and close the temporary integration CRI clients explicitly.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-04-23 12:59:58 +02:00
Craig Gumbley
1a6bd7020a fix(erofs): set TMPDIR for mkfs.erofs on Windows
mkfs.erofs uses TMPDIR for its internal diskbuf temp files. Windows
does not set TMPDIR (only TEMP/TMP), so the MinGW binary falls back
to "/tmp" which resolves to C:\tmp. That directory does not exist on
most Windows machines. mkstemp fails, and erofs_diskbuf_init returns
ENOSPC regardless of actual errno, producing a misleading "No space
left on device" error even on disks with plenty of free space.

Set TMPDIR to the snapshot directory (parent of the output layer file)
for all mkfs.erofs invocations on Windows. This directory is managed
by containerd and guaranteed to exist. On Unix, TMPDIR is left to the
parent process (no change in behavior).

Signed-off-by: Craig Chelnak <craig.chelnak@docker.com>
2026-04-22 09:40:24 -07:00
Fu Wei
84ac5de468 Merge pull request #13256 from chrishenzie/fix-volatile-mount-check
Support both styles of volatile mount option
2026-04-21 23:46:48 +00:00
zylxjtu
3f1c5fdd70 cri/windows: propagate AffinityCpus through update and status paths
Three related bugs prevented Windows CPU affinity from round-tripping
through UpdateContainerResources and ContainerStatus:

1. WithWindowsResources silently dropped AffinityCpus, so the kubelet's
   CPU manager reconcile loop never applied affinity changes to running
   containers. Add translation from CRI AffinityCpus to OCI
   WindowsCPUGroupAffinity.

2. copyResourcesToStatus never read the Affinity field from the OCI spec,
   so the stored container status always had AffinityCpus = nil. Add the
   read-back loop.

3. deepCopyOf omitted AffinityCpus when snapshotting Windows resources,
   silently dropping the field on every Status.Get(). Add the deep copy.

Signed-off-by: zylxjtu <zhang.yuanliang@hotmail.com>
2026-04-20 21:30:26 +00:00
Chris Henzie
93f7a62e50 Support both styles of volatile mount option
Kernel 6.12.80+ returns 'fsync=volatile' instead of just 'volatile'
in mount options, which breaks containerd's exact string matching
checks.

Fix this by adding support for 'fsync=volatile' in addition
to the existing 'volatile' check in RemoveVolatileOption and
addVolatileOptionOnImageVolumeMount.
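
The dual-spelling check might look like the sketch below. isVolatileOption and removeVolatile are hypothetical helpers written for illustration and are not the bodies of the functions named above.

```go
package main

import "fmt"

// isVolatileOption (hypothetical) accepts both spellings of the option:
// the older "volatile" and the newer "fsync=volatile".
func isVolatileOption(opt string) bool {
	return opt == "volatile" || opt == "fsync=volatile"
}

// removeVolatile (hypothetical) returns a copy of opts without either
// spelling of the volatile option.
func removeVolatile(opts []string) []string {
	out := make([]string, 0, len(opts))
	for _, o := range opts {
		if !isVolatileOption(o) {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	fmt.Println(removeVolatile([]string{"lowerdir=/a", "volatile", "index=off"}))
	fmt.Println(removeVolatile([]string{"lowerdir=/a", "fsync=volatile", "index=off"}))
}
```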

Assisted-by: Antigravity
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
2026-04-20 11:50:57 -07:00
Arjun Yogidas
970b5d46bc Fix TestCgroupNamespace failure on cgroups v1 hosts
Signed-off-by: Arjun Yogidas <arjunry@amazon.com>
2026-04-16 18:39:32 +00:00
ChengyuZhu6
b320d3c855 ctr: add EROFS image conversion support
Add EROFS conversion support to ctr convert command with configurable
options for tar-index mode and mkfs parameters.

Usage:
  ctr image convert --erofs src:tag dst:tag
  ctr image convert --erofs --erofs-compression='lz4hc,12' src:tag dst:tag

Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
2026-04-15 10:28:47 +08:00
Derek McGowan
a755ca16e5 Merge pull request #12865 from dmcgowan/readonly-overlay-erofs-no-mount
Support reading readonly overlays without mounting
2026-04-09 18:37:15 +00:00
Sergey Kanzhelev
1615e07bb8 replace one more k8s.io/apimachinery/ reference
Signed-off-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2026-04-06 06:16:21 +00:00
Derek McGowan
30951c6f03 Add overlay symlink resolution using ReadLinkFS
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-04 22:37:39 -07:00
Derek McGowan
21d666cfbc Update fsview to allow type registration
Move erofs implementation to plugin and register with fsview.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-04 22:37:38 -07:00
Derek McGowan
a77c757f15 internal/fsview: update overlay to handle file replacing directory
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-04 22:37:38 -07:00
Derek McGowan
2fe15d7c87 internal/fsview: add support for suffixes in formatted mounts
Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-04 22:37:38 -07:00
Derek McGowan
04b7b495f9 internal/fsview: add fsview package for reading snapshot mounts
Allows reading snapshot mounts without performing mounts. This is
valuable when the host cannot perform the mounts due to platform or
permissions.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2026-04-04 22:37:37 -07:00
Sergey Kanzhelev
05d3b31586 pause image 3.10.1 -> 3.10.2 to add Windows Server 2025 (ltsc2025) support
Signed-off-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2026-04-03 16:17:39 +00:00
Sergey Kanzhelev
1fc92e63dd switch from internal/cri/streamingserver to k8s.io/cri-streaming
Signed-off-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2026-04-01 22:38:45 +00:00
Sergey Kanzhelev
1b67e78540 switch from k8s.io/apimachinery/pkg/util/httpstream to k8s.io/streaming/pkg/httpstream
Signed-off-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2026-04-01 22:38:45 +00:00
Aadhar Agarwal
50f5461fb7 Add dmverity support to the erofs snapshotter using veritysetup-go
Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>
2026-03-31 20:21:39 +00:00
Davanum Srinivas
6ebe1ce6ab Merge pull request #13138 from dims/fix-usage-nanocores-window
cri: mirror cadvisor UsageNanoCores semantics
2026-03-30 20:52:12 +00:00
Samuel Karp
b7a467e4f3 Merge pull request #12175 from smira/fix/hide-go-cmp
fix: hide `go-cmp` library from the non-test code path
2026-03-30 20:21:10 +00:00
HirazawaUi
7d7c56357a add unit tests
Signed-off-by: HirazawaUi <695097494plus@gmail.com>
2026-03-30 09:01:49 -05:00
HirazawaUi
93cf5418b9 Allow user namespace with hostNetwork in container
Signed-off-by: HirazawaUi <695097494plus@gmail.com>
2026-03-30 09:01:49 -05:00
Davanum Srinivas
66a1d3a607 cri: mirror cadvisor UsageNanoCores semantics
Mirror cAdvisor's instantaneous CPU rate behavior for CRI stats.

Compute UsageNanoCores from the latest two samples only, and leave the
field unset when there is not yet enough data to calculate an
instantaneous rate. This avoids publishing an authoritative zero before
a valid rate exists while keeping containerd aligned with cAdvisor
semantics.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-03-28 14:31:20 -04:00
jokemanfire
c6fe5456f7 fix: ticker < sync time causes high CPU usage
If the ticker interval is shorter than the sync time, CPU usage spikes.
Use an adaptive interval and sleep to ensure that the CPU is released.

Signed-off-by: jokemanfire <hu.dingyang@zte.com.cn>
2026-03-25 09:29:51 +08:00