Keep exec network modes limited to sandbox, host, and none, and pass proxy
network configuration separately through solve and executor runtime state.
Proxy execs now use bridge-style egress by default, host egress only for host
network mode with entitlement, and no proxy for none mode. Add integration
coverage for bridge, host, and none proxy behavior across OCI and containerd
workers.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Forward non-tty stdin through an os.Pipe so runc receives an *os.File
instead of the caller's reader. This lets runc exit after the container
process is killed without waiting on Go's internal stdin copy.
Add gateway coverage for graceful pid1 exit, release-based cleanup, and
explicit SIGKILL while pid1 stdin is still open.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Record successful GET responses through the exec proxy as provenance
materials and report incomplete material coverage as a typed solve error.
Thread proxy policy and capture state through typed executor/network options.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Add a build request option that rewrites default exec networking to an
internal proxy network while preserving explicit none networking.
Route HTTP and HTTPS traffic through a BuildKit-owned proxy namespace, enforce
source policy checks for proxied requests, and inject a temporary CA into Linux
rootfs trust bundles for HTTPS interception.
Share namespace pooling between CNI and proxy providers, and cover proxy mode
with unit and integration tests.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Change how the runc executor kills runc processes, replacing the
warning message that was previously logged every 50 milliseconds with
more precise warnings.
In the previous version, the kill could succeed while the runc process
still took some time to exit. The executor would then spam the logs
every 50 milliseconds until the process exited and would attempt to
rekill a container that was already marked as killed.
This change makes it so we detect a successful kill. If we detect a
successful kill, we then wait for the process while writing a warning to
the log that the process is taking a long time to end. We print one
message 50 milliseconds after the kill and then an additional one with
the exact time it took to exit after the exit succeeds.
If the kill is not successful, we stay in the same loop as previously
existed.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
Update golangci-lint and adjust code for new gosec diagnostics. Use
root-scoped filesystem operations where appropriate, preserve explicit
user path behavior for SSH keys, and avoid background contexts in
request-scoped cleanup paths.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Use os.OpenRoot for resolv.conf and hosts state file creation, and
adapt executor callers and tests to the root-relative helper API.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Add executor.ValidContainerID and enforce it in runc/containerd Run paths.
Only the runc executor used the ID in filesystem operations.
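As an illustration of why such a check matters when an ID ends up in a
filesystem path, here is a minimal sketch. The pattern and the
validContainerID helper are assumptions made for this example and are
not the rules implemented by executor.ValidContainerID.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern is an assumed allow-list: an alphanumeric first character
// followed by alphanumerics, '_', '.', or '-'. Path separators and a
// leading dot are rejected.
var idPattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9_.-]*$`)

// validContainerID (hypothetical) reports whether id is safe to use as a
// single path component under the assumed pattern.
func validContainerID(id string) bool {
	return idPattern.MatchString(id)
}

func main() {
	for _, id := range []string{"abc123", "../etc", "a/b", ""} {
		fmt.Printf("%q valid=%v\n", id, validContainerID(id))
	}
}
```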
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
(cherry picked from commit 789df2422341960b7549d14ea475add43e73cd74)
Convert usages of `github.com/docker/docker/pkg/idtools` to
`github.com/moby/sys/user` in order to break the dependency between
buildkit and docker.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
In this case the current stack trace points to the line
where the context was created. Instead the stack should be
captured when the defer is running so the return path to
the defer call is also part of the stack.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
If the container exits with an error and has invoked the OOM killer,
mark the original error as ENOMEM so that it can be detected
on the client side.
gRPC will map ENOMEM to codes.ResourceExhausted based on #5182
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
If the context is canceled before the process is ready, the kill
goroutine returns early because there is nothing to kill. But the
process may still start after this, and in that case it remains
running without cancellation. The fix is to skip cancellation only if
the run goroutine has ended, as then the process will not be started.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Before this, the runc executor did not close the cgroupRecord when the
container exited non-zero, which resulted in goroutines leaking.
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
It's possible for the Status field of runc.ExitError to be set to -1, in
which case conversion to uint32 results in the error message saying that
the container exited with code 4294967295 (2^32-1).
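The wraparound is easy to reproduce. In the sketch below, exitCode is a
hypothetical guard added for illustration; the commit text does not say
how the executor handles the negative status.

```go
package main

import "fmt"

// exitCode (hypothetical) converts a status to uint32 only when it is
// non-negative, and reports whether the conversion was meaningful.
func exitCode(status int) (uint32, bool) {
	if status < 0 {
		return 0, false
	}
	return uint32(status), true
}

func main() {
	status := -1

	// A direct conversion wraps around to 2^32-1.
	fmt.Println(uint32(status)) // prints 4294967295

	code, ok := exitCode(status)
	fmt.Println(code, ok) // prints 0 false
}
```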
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
This fixes the incorrect kill handling introduced in b76f8c0248. We
need to send the SIGKILL to the in-container process, not the runc
process. This patch adds an abstraction over the kill handling:
* for `runc run` processes use `runc kill`
* for `runc exec` processes, read pid (in host PID namespace) from
pidfile created by `runc exec`, then send the signal directly to
that process.
Also use the kill abstraction when we receive a SIGKILL over the
signal channel for containers created by gateway NewContainer.
Signed-off-by: coryb <cbennett@netflix.com>
This patch makes the process handling consistent between runc.Run and
runc.Exec usage. Previously runc.Run would use context.Background
for the runc.Run process and would monitor the request context for
shutdown requests, sending a SIGKILL to the container pid1 process. This
allowed runc.Run to gracefully shutdown and reap child processes. This
logic was not used for runc.Exec where instead we were passing in the
request context to runc.Exec, and if that request context was cancelled
the runc process would immediately terminate, preventing runc from
reaping the child process. In this scenario the extra pid will remain
forever, and the pid1 process will then get wedged in the kernel's
zap_pid_ns_processes upon shutdown, waiting for the zombie pid to exit.
With this fix both runc.Run and runc.Exec will use context.Background
for runc processes and monitor the request context for shutdown
requests, triggering a SIGKILL to the pid being monitored by runc.
Signed-off-by: coryb <cbennett@netflix.com>
This allows a frontend to request a specific mode for stubs removal.
By default, if not specified, this will revert to the previous
behaviour. New gateway clients however will set the property to the
desired recursive removal mode.
This property needs to be set for both components that call the
executor: for ExecOp, as well as for the StartContainer API.
Signed-off-by: Justin Chadwell <me@jedevc.com>
This adds netNSPoolSize pool options which allow setting a target
network namespace pool size. buildkitd will create this number of
network namespaces at startup (without blocking). When a container
execution finishes, the network namespace gets returned to the pool. If
the pool goes above the target size, there is a grace period to allow
network namespaces to be reused, and if this passes without reuse, the
extra namespaces will be released.
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>