The github.com/mitchellh/hashstructure/v2 module was archived, and
there's a maintained fork in the gohugoio org.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Traces are now forwarded in a non-blocking goroutine when sent through
the traces exporter. This prevents traces forwarded from the client from
being stalled while waiting for an upstream uploader to appear.
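A minimal sketch of the non-blocking forwarding idea, assuming a hypothetical `asyncForwarder` type whose `upload` field stands in for the upstream uploader. None of these names come from the actual exporter code.

```go
package main

import "sync"

// asyncForwarder is a hypothetical stand-in for the traces exporter.
type asyncForwarder struct {
	// upload represents the upstream uploader and may block for a long time.
	upload func(spans []string)
	wg     sync.WaitGroup
}

// Forward hands the spans to a new goroutine and returns immediately,
// so the caller is never stalled by a slow or missing uploader.
func (f *asyncForwarder) Forward(spans []string) {
	f.wg.Add(1)
	go func() {
		defer f.wg.Done()
		f.upload(spans)
	}()
}

// Wait blocks until every in-flight upload has returned.
func (f *asyncForwarder) Wait() { f.wg.Wait() }
```

A caller would invoke `Forward` for each batch and call `Wait` during shutdown, ideally bounded by a timeout.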
In addition, adds a shutdown context to `appcontext` that will only
cancel when an interrupt has been received twice. One interrupt will
signal the program should clean up and shut down, the second indicates
we should skip shutdown procedures (more forceful), and the third will
indicate that we should immediately terminate the program.
This gives shutdown procedures such as the traces and metrics exporters
finer control, so there is a difference between forcibly calling exit
and waiting a long time for the shutdown to complete.
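The three interrupt stages can be sketched roughly as follows. This is an illustration only: `interruptHandler`, its fields, and the `terminate` callback are hypothetical names, not the `appcontext` API, and the handler is not safe for concurrent use.

```go
package main

import (
	"context"
	"os"
)

// interruptHandler is a hypothetical helper that maps successive
// interrupts onto increasingly forceful shutdown stages.
type interruptHandler struct {
	Run      context.Context // canceled by the first interrupt
	Shutdown context.Context // canceled by the second interrupt

	count          int
	cancelRun      context.CancelFunc
	cancelShutdown context.CancelFunc
	terminate      func() // called on the third and any later interrupt
}

func newInterruptHandler(terminate func()) *interruptHandler {
	h := &interruptHandler{terminate: terminate}
	h.Run, h.cancelRun = context.WithCancel(context.Background())
	h.Shutdown, h.cancelShutdown = context.WithCancel(context.Background())
	return h
}

// onInterrupt advances the handler by one stage.
func (h *interruptHandler) onInterrupt() {
	h.count++
	switch h.count {
	case 1:
		h.cancelRun() // clean up and shut down
	case 2:
		h.cancelShutdown() // skip the remaining shutdown procedures
	default:
		h.terminate() // exit immediately
	}
}

// watch feeds signals from a channel, for example one registered with
// signal.Notify, into the handler until the channel is closed.
func (h *interruptHandler) watch(sigs <-chan os.Signal) {
	for range sigs {
		h.onInterrupt()
	}
}
```

Exporter shutdown code would then use `Shutdown` as its context, so a second interrupt cuts a slow flush short without killing the process outright.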
Includes a more aggressive shutdown timeout for `buildctl` that is
similar to the export timeout on `docker-buildx` for the tracing
shutdown as another preventative measure to ensure the CLI hangs up at
an appropriate time interval.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
Add a proxyNetwork TOML setting and --proxy-network daemon flag to enable
exec proxy enforcement for every build. Wire the default through controller
and solver setup while preserving per-build enablement.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Add a build request option that rewrites default exec networking to an
internal proxy network while preserving explicit none networking.
Route HTTP and HTTPS traffic through a BuildKit-owned proxy namespace, enforce
source policy checks for proxied requests, and inject a temporary CA into Linux
rootfs trust bundles for HTTPS interception.
Share namespace pooling between CNI and proxy providers, and cover proxy mode
with unit and integration tests.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Threads the existing OTEL MeterProvider through llbsolver.Opt and emits
three build-event instruments from the recordBuildHistory finalizer:
- buildkit.builds (counter; labels: status, error_code)
- buildkit.builds.steps (counter; labels: kind)
- buildkit.build.duration (Base2 exponential histogram; labels: status)
The duration histogram uses an exponential aggregation, rendered as a
Prometheus native histogram by the existing exporter, to avoid the
"tens of millions of series" cardinality blow-up reported in #5777.
MeterProvider is passed explicitly through the constructor — buildkit
policy (per the #4957 review) prohibits relying on the OTel global
provider in library packages.
error_code uses gRPC codes.Code.String() for a bounded set;
rec.Error.Message is intentionally never used as a label. The frontend
label is intentionally omitted — client.Build clears req.Frontend on
the wire, so the field is empty for every caller that goes through the
gateway-client API (buildctl, buildx). The metric is forward-compatible
with a future buildkit change that populates rec.Frontend on that path.
A follow-up PR will add observable gauges for worker count and cache
state, plus an operator guide at docs/metrics.md.
Refs #1544; addresses discussion #5777.
Signed-off-by: Ava Barron <abarron@coreweave.com>
Update non-generated code for the newer lint recommendations by using typed
atomic values, strings.Cut, and slices.Backward where applicable.
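For illustration, the kind of rewrite involved looks roughly like this. The `counter` and `splitKV` names are made up for the example; `slices.Backward` is left out because it needs Go 1.23 or newer.

```go
package main

import (
	"strings"
	"sync/atomic"
)

// counter uses the typed atomic.Int64 instead of a plain int64 that is
// updated through atomic.AddInt64.
type counter struct {
	n atomic.Int64
}

func (c *counter) inc() int64 { return c.n.Add(1) }

// splitKV uses strings.Cut instead of strings.SplitN(s, "=", 2)
// followed by a length check.
func splitKV(s string) (key, value string, ok bool) {
	return strings.Cut(s, "=")
}
```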
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
There's a large potential for a lock contention issue in the gateway
forwarder's logic. The previous iteration of this would keep a global
mapping of the build ids and, when a forwarder for a build id didn't
exist, the forwarder would wait 3 seconds for the build to register.
The issue with lock contention comes after this. Instead of having a
notification channel that a specific build was ready, the forwarder
would wake up all goroutines that were waiting each time a build was
registered. Since each of those builds took a read lock to check whether
its build was present and registering subsequent builds took a write
lock, it was very easy to end up in a lock contention scenario when
starting many builds at the same time. Then it was easy to hit the 3
second timeout especially when the machine itself was under load.
This changes the notification mechanism so the notify happens per build.
Looking up a build id creates a forwarder registrar with a channel that
can be polled for when the registration is complete. A forwarder will
then only be notified and woken by the Go runtime when that specific
build id is ready, rather than from the sync condition.
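A rough sketch of per-build notification, assuming hypothetical `registry` and `registrar` types and a string in place of the real forwarder. The actual implementation differs in detail.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

var errNoBuild = errors.New("no such build")

// registrar tracks the registration of a single build id.
type registrar struct {
	once  sync.Once
	ready chan struct{} // closed once the build has registered
	fwd   string        // stand-in for the real forwarder
}

// registry maps build ids to their registrars.
type registry struct {
	mu     sync.Mutex
	builds map[string]*registrar
}

func newRegistry() *registry {
	return &registry{builds: map[string]*registrar{}}
}

// lookup returns the registrar for id, creating it when missing. The
// mutex is held only for the map access, never while waiting.
func (r *registry) lookup(id string) *registrar {
	r.mu.Lock()
	defer r.mu.Unlock()
	reg, ok := r.builds[id]
	if !ok {
		reg = &registrar{ready: make(chan struct{})}
		r.builds[id] = reg
	}
	return reg
}

// register marks id as ready and wakes only the waiters for that id.
func (r *registry) register(id, fwd string) {
	reg := r.lookup(id)
	reg.once.Do(func() {
		reg.fwd = fwd
		close(reg.ready)
	})
}

// wait blocks until id is registered or the timeout expires.
func (r *registry) wait(id string, timeout time.Duration) (string, error) {
	reg := r.lookup(id)
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-reg.ready:
		return reg.fwd, nil
	case <-timer.C:
		return "", errNoBuild
	}
}
```

Registering one build closes only that build's channel, so waiters for other builds are never woken. A real implementation would also remove entries for ids that never register.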
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
Expose the builtin Dockerfile frontend version in BuildKit version
APIs and buildctl debug output.
Move Dockerfile version logic into frontend/dockerfile/version and
validate that the builtin version constant matches release tags.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Add solve-wide compatibility-version support for image and oci
exports, with historical goldens and release compatibility tests.
Backfill version 10 for v0.13-v0.14 git artifact behavior, keep
version 20 as current, and reject unsupported zstd on v10.
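The version check could look something like the sketch below. The constants and the `validateCompat` function are hypothetical and only mirror the rules stated above: versions 10 and 20 are accepted, and zstd is rejected on version 10.

```go
package main

import "fmt"

// Hypothetical constants for the two compatibility versions named above.
const (
	compatV10 = 10 // v0.13-v0.14 behavior
	compatV20 = 20 // current behavior
)

// validateCompat rejects unknown versions and zstd on version 10.
func validateCompat(version int, compression string) error {
	switch version {
	case compatV10:
		if compression == "zstd" {
			return fmt.Errorf("compression %q is not supported with compatibility version %d", compression, version)
		}
		return nil
	case compatV20:
		return nil
	default:
		return fmt.Errorf("unsupported compatibility version %d", version)
	}
}
```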
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Move build history queue from solver/llbsolver into its own
solver/llbsolver/history package. The history subsystem is a
persistence/API concern, not solving logic.
Rename types to avoid stutter with package name:
HistoryQueue -> history.Queue
HistoryQueueOpt -> history.QueueOpt
NewHistoryQueue -> history.NewQueue
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This creates an interface that can be used to read the filesystem of a
new container created through the gateway API. These filesystem reading
methods are tied to a specific container that has been created, but
aren't tied to the container itself.
Due to being run inside of buildkit, these containers have access to the
same mounts that a container request would have. This is useful for
features like the file explorer in `buildx dap` because it can access
container filesystem state from stages that error along with ones that
have completed successfully.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
Add support for dynamic source policies via client session.
The client session can allow or deny a specific source, or
ask for additional metadata via sourcemetaresolver if
that is needed to make the decision.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The inline cache exporter can be set in multiple ways, doesn't
have any attributes, and can only ever run once.
Instead of allowing multiple inline exporters where one
gets ignored later when there is an attribute difference, or
erroring when attributes are unset, just ignore the extra ones.
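As a sketch, ignoring the extra inline exporters amounts to keeping the first one and dropping the rest. The `cacheExporter` type and the `dropExtraInline` function are hypothetical names used only for this example.

```go
package main

// cacheExporter is a hypothetical, simplified cache export entry.
type cacheExporter struct {
	Type  string
	Attrs map[string]string
}

// dropExtraInline keeps the first inline exporter, drops any later ones,
// and leaves every other exporter untouched and in order.
func dropExtraInline(in []cacheExporter) []cacheExporter {
	out := make([]cacheExporter, 0, len(in))
	seenInline := false
	for _, e := range in {
		if e.Type == "inline" {
			if seenInline {
				continue
			}
			seenInline = true
		}
		out = append(out, e)
	}
	return out
}
```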
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This allows the buildkitd daemon to define additional fields
that are added to all the provenance attestations that
BuildKit creates (by default from /etc/buildkit/provenance.d/).
These custom fields can provide additional context about
the environment BuildKit itself is running in (e.g. a GitHub workflow)
and are not allowed to collide with the trusted fields created
by BuildKit itself.
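The collision rule can be illustrated with a small merge helper. This is a sketch only: `mergeProvenanceFields` is a hypothetical function, and returning an error on collision is an assumption about how the rule is enforced.

```go
package main

import "fmt"

// mergeProvenanceFields adds custom fields to the trusted ones and
// refuses any custom key that already exists among the trusted fields.
func mergeProvenanceFields(trusted, custom map[string]any) (map[string]any, error) {
	out := make(map[string]any, len(trusted)+len(custom))
	for k, v := range trusted {
		out[k] = v
	}
	for k, v := range custom {
		if _, exists := trusted[k]; exists {
			return nil, fmt.Errorf("custom provenance field %q collides with a trusted field", k)
		}
		out[k] = v
	}
	return out, nil
}
```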
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Allow a time-based filter similar to the one supported for
prune requests, so that a DiskUsage request can be used to
check which records would be candidates for pruning.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The trace blob is created 3 seconds after build completion.
If this happens after the test has cleaned all history records
and before it checks for leaked blobs, the test can report the
trace blob as leaked. In practice it would be cleaned up the
next time containerd GC gets triggered.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Devices can be marked as "automatically allowed" by TOML config
or by an annotation in a specific CDI spec file.
A device that is not "automatically allowed" needs to be allowed
by the build request by passing an entitlement. For example, a Dockerfile
may not use a device without the user invoking the build permitting it.
--allow device grants access to any device.
--allow device=kind|name grants access to specific device.
--allow device=kind|name,alias=kind|name allows mapping kind to
a specific device or one device to another. Alias is the name requested
by the build and device is the actual device that is being enabled.
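To make the three forms concrete, here is a hypothetical parser for the value passed to --allow. The `deviceGrant` type and the `parseDeviceAllow` function are illustrative only and are not BuildKit's actual entitlement parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceGrant is a hypothetical parsed form of a device entitlement.
type deviceGrant struct {
	Any    bool   // plain "device": any device is allowed
	Device string // the actual device (kind or name) being enabled
	Alias  string // the name requested by the build, if any
}

// parseDeviceAllow handles "device", "device=X" and "device=X,alias=Y".
func parseDeviceAllow(s string) (deviceGrant, error) {
	parts := strings.Split(s, ",")
	key, val, hasVal := strings.Cut(parts[0], "=")
	if key != "device" {
		return deviceGrant{}, fmt.Errorf("not a device entitlement: %q", s)
	}
	if !hasVal {
		if len(parts) > 1 {
			return deviceGrant{}, fmt.Errorf("alias requires a device: %q", s)
		}
		return deviceGrant{Any: true}, nil
	}
	if val == "" {
		return deviceGrant{}, fmt.Errorf("empty device in %q", s)
	}
	g := deviceGrant{Device: val}
	for _, p := range parts[1:] {
		k, v, ok := strings.Cut(p, "=")
		if !ok || k != "alias" || v == "" {
			return deviceGrant{}, fmt.Errorf("invalid option %q in %q", p, s)
		}
		g.Alias = v
	}
	return g, nil
}
```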
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The prune logic would prune multiple times because one prune could cause
more things to be capable of pruning and change the logic. This was done
through a recursive invocation.
Since Go doesn't have support for function tail calls, this would result
in a new stack frame for each iteration. This unrolls the logic so the
prune function is invoked iteratively rather than recursively.
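The shape of the change can be sketched with a toy store where removing a leaf record makes its parent prunable. The `store` type is hypothetical; only the relationship between `prune` and `pruneOnce` reflects the description above.

```go
package main

// store is a toy cache: each record id maps to its parent id, or to ""
// when it has no parent. A record can be pruned once nothing refers to it.
type store struct {
	parent map[string]string
}

// pruneOnce runs a single pass, removing every record that currently has
// no children, and reports how many records were removed.
func (s *store) pruneOnce() int {
	hasChild := map[string]bool{}
	for _, p := range s.parent {
		if p != "" {
			hasChild[p] = true
		}
	}
	removed := 0
	for id := range s.parent {
		if !hasChild[id] {
			delete(s.parent, id)
			removed++
		}
	}
	return removed
}

// prune repeats pruneOnce in a loop until a pass removes nothing, so the
// stack depth stays constant no matter how many passes are needed.
func (s *store) prune() int {
	total := 0
	for {
		n := s.pruneOnce()
		if n == 0 {
			return total
		}
		total += n
	}
}
```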
`prune` and `pruneOnce` have also had their names swapped. In general,
`pruneOnce` implies that it runs a single prune while `prune` sounds
like the top level function. The current code had this reversed and
`pruneOnce` would call `prune` and `prune` would call itself
recursively.
I've also updated a section in the controller that invoked prune on each
worker. In older versions of Go, the existing code was needed because
those versions reused the loop variable's location on every iteration,
which would otherwise cause all goroutines to reference the same worker
instead of different workers.
Recent versions of Go have changed this behavior, so it is no longer
needed.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
In this case the current stack trace points to the line
where the context was created. Instead the stack should be
captured when the defer is running so the return path to
the defer call is also part of the stack.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
When GracefulStop is called, gRPC waits for current requests to finish
before closing. While this is generally the behavior we want, it is
not always appropriate for the History.Listen endpoint. That endpoint is
usually open even if buildkit is not actively processing any builds,
because a client may be waiting for new events.
With the new logic, when GracefulStop is called, history closes
active listeners immediately if there are no active builds. If there are
active builds, the listeners are closed after all the
active builds have completed their finalizers.
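A minimal sketch of that rule, using a hypothetical `historyListeners` type with a single channel that all listeners select on. The real history code is structured differently.

```go
package main

import "sync"

// historyListeners tracks active builds and tells listeners when to
// disconnect during a graceful stop.
type historyListeners struct {
	mu       sync.Mutex
	active   int  // builds whose finalizers have not completed yet
	stopping bool // set once a graceful stop has been requested
	closed   bool
	done     chan struct{} // closed when listeners should disconnect
}

func newHistoryListeners() *historyListeners {
	return &historyListeners{done: make(chan struct{})}
}

// Done is the channel a Listen handler would select on.
func (h *historyListeners) Done() <-chan struct{} { return h.done }

func (h *historyListeners) BuildStarted() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.active++
}

// BuildFinalized is called after a build's finalizers have completed.
func (h *historyListeners) BuildFinalized() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.active--
	h.maybeCloseLocked()
}

// GracefulStop closes listeners now if idle, or later once builds finish.
func (h *historyListeners) GracefulStop() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.stopping = true
	h.maybeCloseLocked()
}

func (h *historyListeners) maybeCloseLocked() {
	if h.stopping && h.active == 0 && !h.closed {
		h.closed = true
		close(h.done)
	}
}

// Closed reports whether listeners have been told to disconnect.
func (h *historyListeners) Closed() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.closed
}
```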
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The current implementation based on leases.SynchronousDelete only works
with the containerd worker and is ignored otherwise. This means that
blobs referenced by history records were left on disk until the
periodic background GC was initialized later.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Remove gogoproto in favor of the standard protobuf compiler. This
removes any nonstandard extensions that were part of gogoproto such as
the custom types.
Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
Use golang.org/x/exp/trace to implement a trace recorder that saves the trace
to a circular buffer and can be retrieved at any time.
Debug endpoints have been added under /debug/flight to start and stop the trace
as well as to set its period.
Due to golang.org/x/exp/trace, the minimum Go version has been bumped to 1.22.
Signed-off-by: Alberto Garcia Hierro <damaso.hierro@docker.com>
The client can send a finalize update to a build record that
will complete saving the traces and block until the record
has been updated. If no request is sent, the traces will be
sent after a 3 second timeout as before.
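The choice between an explicit finalize and the timeout fallback is essentially a select on two events. The sketch below uses a hypothetical `waitForFinalize` helper and a plain channel in place of the real finalize request.

```go
package main

import "time"

// waitForFinalize returns true when a finalize request arrives first and
// false when the timeout (3 seconds in the text above) expires first.
// Either way the caller goes on to save the traces.
func waitForFinalize(finalize <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-finalize:
		return true
	case <-timer.C:
		return false
	}
}
```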
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This is a more versatile function that works for any source,
not just images.
It can be used together with a policy that switches
between input and output source as well as for adding
additional metadata for other sources in the future.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The solver has a Close method to shut down the scheduler, which releases
a goroutine. We should call it on shutdown.
While in the area, we can also close the sysSampler.
Signed-off-by: Justin Chadwell <me@jedevc.com>
We can derive exporter ids from their place in the exporter array in a
SolveRequest - this removes the need to manually generate and handle
multiple sets of IDs.
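As an illustration, deriving the ids can be as simple as formatting each exporter's index. The `exporter` type, the `exporterIDs` function, and the decimal-string format are assumptions for this sketch, not the actual wire format.

```go
package main

import "strconv"

// exporter is a hypothetical, simplified entry from a solve request.
type exporter struct {
	Type string
}

// exporterIDs derives one id per exporter from its position in the slice.
func exporterIDs(exporters []exporter) []string {
	ids := make([]string, len(exporters))
	for i := range exporters {
		ids[i] = strconv.Itoa(i)
	}
	return ids
}
```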
Signed-off-by: Justin Chadwell <me@jedevc.com>