Read short-circuited on `if dpc.c == nil` before calling
`dpc.wg.Wait()` which races with the dialer goroutine spawned in
openShimLog. The dialer assigns `dpc.c = c` (and may set `dpc.conerr`)
outside any lock; the only synchronization is the WaitGroup, and Read
skipped it on the fast path.
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
Switch the CRI integration layer from containerd's forked Kubernetes helpers
and clients to the upstream Kubernetes modules, and finalize the dependency
update to Kubernetes v0.36.0.
Replace the remaining internal helper copies with upstream packages:
- internal/cri/clock -> k8s.io/utils/clock
- internal/cri/executil -> upstream CRI exec helpers
- internal/cri/resourcequantity -> k8s.io/apimachinery/pkg/api/resource
- internal/cri/setutils -> k8s.io/apimachinery/pkg/util/sets
- internal/cri/types/labels.go -> internal/cri/labels
- integration/cri-api/pkg/apis/services.go -> k8s.io/cri-api/pkg/apis/services.go
Adopt the upstream CRI clients directly:
- add k8s.io/cri-client v0.36.0, k8s.io/cri-streaming v0.36.0, and
k8s.io/streaming v0.36.0 as direct dependencies
- promote k8s.io/utils to a direct dependency and pull in
k8s.io/component-base v0.36.0 indirectly
- keep integration/remote as a thin containerd adapter around cri-client,
because the integration tests still need the stream-shaped
GetContainerEvents RPC
Finalize the Kubernetes dependency update from v0.36.0-rc.0 to v0.36.0,
refresh vendor/, and drop the obsolete internal utility copies.
Also fix the protobuf MessageState mutex-copy vet failures exposed by the new
APIs and close the temporary integration CRI clients explicitly.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Correctly handle cases where the mount activation still exists:
- If activation is fully activate, then just return already exists and
allow the caller to return error or call Info to continue.
- If activation is stale or incomplete due to crash during activation,
overwrite the identifier and cleanup the incomplete activation during
activate.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Allow the socket directory to be directly configured by the shim manager
with reasonable defaults when not set. The default for root users will
still be the same directory under the default state directory. For
non-root users a temp directory will be used as default if the state
directory is not owned by the user.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Send the socket directory from containerd to the shim. The shim still
decides where the socket goes but can use the environment variable
passed from containerd to ensure the socket is placed in the configured
directory with proper permission.
This is needed for some rootless cases which do not have permission to
the default state directory as currently set. The directory being
hardcoded by the shim means it is currently not possible to change the
location the shim will listen at.
Signed-off-by: Derek McGowan <derek@mcg.dev>
- Use time.NewTimer + Stop() instead of time.After to avoid timer leaks
- Treat context.DeadlineExceeded as retryable (pipe busy, not just missing)
- Wrap last dial error instead of os.ErrNotExist for better diagnostics
- Update makeConnection godoc to reflect current BootstrapResult type
Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
The shim "start" helper returns the named pipe address before the
daemon process has created the pipe via winio.ListenPipe(). On busy
Windows systems, containerd may try to connect before the pipe exists.
Add awaitPipeReady() — the start helper now polls the pipe address
(up to 5s, 10ms intervals) before writing the bootstrap result to
stdout. This follows hcsshim's readiness pattern where the shim
verifies its endpoint is ready before signaling the parent.
As a safety net, also parameterize makeConnection() with a dialer so
binary.Start() uses AnonDialer (retry) for new shims while loadShim()
keeps AnonReconnectDialer (fail-fast) for reconnects per #3659.
On Unix, awaitPipeReady() is a no-op: domain sockets appear atomically.
Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
When running hostNetwork containers with the shim sandboxer, the spec
passed to NewBundle is never initialized to a typed nil and is instead a
bare golang nil. Fix this by guarding the dereference to avoid crashing
when such a container is created.
Signed-off-by: Fletcher Woodruff <fwood@amazon.com>
When a shim becomes unresponsive (e.g., stopped via SIGSTOP), ttrpc
communication times out with `context deadline exceeded`.
Currently, this error is not properly propagated, causing redundant API
calls and slow container listing by client sides.
Specifically, when executing the API to check the task state, it appears
that the `context deadline exceeded` error via ttrpc is not being handled
within `shimTask.State()` and `getProcessState()`.
As a result, when this error occurs, clients such as nerdctl cannot
recognize this error, and it is thought that the issue described below is
occurring:
- https://github.com/containerd/nerdctl/issues/4720
Therefore, this commit adds error handling to ensure timeouts are properly
handled by client sides.
Signed-off-by: Hayato Kiwata <dev@haytok.jp>
The otelgrpc.UnaryClientInterceptor and otelgrpc.StreamClientInterceptor
options were deprecated and removed in favor of NewClientHandler.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Create /var/lib/containerd with 0o700 (was: 0o711).
- Create config.TempDir with 0o700 (was: 0o711).
- Create /run/containerd/io.containerd.grpc.v1.cri with 0o700 (was: 0o755).
- Create /run/containerd/io.containerd.sandbox.controller.v1.shim with 0o700 (was: 0o711).
- Leave /run/containerd and /run/containerd/io.containerd.runtime.v2.task created with 0o711,
as required by userns-remapped containers.
/run/containerd/io.containerd.runtime.v2.task/<NS>/<ID> is created with:
- 0o700 for non-userns-remapped containers
- 0o710 for userns-remapped containers with the remapped root group as the owner group.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Allow task manager to fetch info on runtimes at startup.
Use this info to configure whether the runtime allows formatted mounts.
This info could also be used in the future to enforce policy such as
requiring a pre-known set of runtimes or specific runtime properties.
Signed-off-by: Derek McGowan <derek@mcg.dev>
While testing some things in moby, which involved removing storage paths,
I noticed some errors being printed for non-existing paths. As this function
is meant to delete the directories, it should probably be fine to consider it
idempotent (as further processing in the function also does).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Previously, to address issue #11708, PR #11793 changed containerd to always
invoke the shim binary to establish shim connections, rather than reusing the
sandbox shim. However, this change did not ensure that the Shutdown API was
called to stop the shim process.
Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox
containers (when container.SandboxID is empty). This approach works for
groupable shims, where multiple containers share a single socket address and
only require a single Shutdown call. However, for non-groupable shims, each
container requires its own Shutdown call during cleanup to avoid leaking shim
processes.
Additionally, PR #11793 introduced a corner case during upgrades:
- T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A.
- T2: containerd is upgraded to v2.X.Y.
- T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary.
- T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory.
- T5: containerd is restarted.
- T6: containerd fails to connect to A-C1.
- T7: The A-C1 container is left in EXITED status in the CRI plugin.
To address this, ensure that loadShimTask downgrades to version:2 if necessary,
and always invoke the Shutdown API for each non-groupable shim during cleanup to
prevent resource leaks and handle upgrade scenarios correctly.
(Introduced by #11793)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
the `cleanup.Background` utility was introduced in f606c4eba7,
at which time the project used go1.19, and Go's stdlib context did not yet
have [`context.WithoutCancel`], which was introduced in go1.21.
This patch replaces `cleanup.Background` for `context.WithoutCancel`, which
is near-identical, and part of go stdlib;
`cleanup.Background`:
type clearCancel struct {
context.Context
}
func (cc clearCancel) Deadline() (deadline time.Time, ok bool) {
return
}
func (cc clearCancel) Done() <-chan struct{} {
return nil
}
func (cc clearCancel) Err() error {
return nil
}
// Background creates a new context which clears out the parent errors
func Background(ctx context.Context) context.Context {
return clearCancel{ctx}
}
`context.WithoutCancel`:
// WithoutCancel returns a derived context that points to the parent context
// and is not canceled when parent is canceled.
// The returned context returns no Deadline or Err, and its Done channel is nil.
// Calling [Cause] on the returned context returns nil.
func WithoutCancel(parent Context) Context {
if parent == nil {
panic("cannot create context from nil parent")
}
return withoutCancelCtx{parent}
}
type withoutCancelCtx struct {
c Context
}
func (withoutCancelCtx) Deadline() (deadline time.Time, ok bool) {
return
}
func (withoutCancelCtx) Done() <-chan struct{} {
return nil
}
func (withoutCancelCtx) Err() error {
return nil
}
func (c withoutCancelCtx) Value(key any) any {
return value(c, key)
}
func (c withoutCancelCtx) String() string {
return contextName(c.c) + ".WithoutCancel"
}
[`context.WithoutCancel`]: https://pkg.go.dev/context@go1.21.0#WithoutCancel
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
I noticed that the "cleaning up after shim disconnected" was logged as a
warning, but I couldn't find a reason why it was considered a warning (
and didn't see an error attached to the log);
INFO[2025-05-07T08:57:56.576773500Z] ignoring event container=589c33c88591a9936cb8fd64453e4b0c1357ae2f79d975c34645ff8f0aa10485 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2025-05-07T08:57:56.576938958Z] shim disconnected id=589c33c88591a9936cb8fd64453e4b0c1357ae2f79d975c34645ff8f0aa10485 namespace=moby
WARN[2025-05-07T08:57:56.577356208Z] cleaning up after shim disconnected id=589c33c88591a9936cb8fd64453e4b0c1357ae2f79d975c34645ff8f0aa10485 namespace=moby
INFO[2025-05-07T08:57:56.577393417Z] cleaning up dead shim namespace=moby
I think this must've been set as a warning by acccident, so changing
this log to an "info" log as it seems part of the normal flow or operations.
While updating, also updating related logs, such as the "cleaning up dead shim"
log, to include the id, so that these logs can more easily be correlated.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>