From 76fcf9a8e0f1cc6824a8c58f0ab4cc3a77bd3826 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1=C5=A1=20Virtus?=
Date: Tue, 23 Apr 2024 21:10:46 +0200
Subject: [PATCH] apparmor: Allow confined runc to kill containers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/usr/sbin/runc is confined with the "runc" profile[1] introduced in
AppArmor v4.0.0. This breaks stopping of containers, because the
profile assigned to containers doesn't accept signals from the "runc"
peer. AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and
later.

In the case of Docker, this regression is hidden by the fact that
dockerd itself sends SIGKILL to the running container after runc fails
to stop it. It is still a regression, because graceful shutdowns of
containers via "docker stop" are no longer possible, as SIGTERM from
runc is not delivered to them. This can be seen in logs from dockerd
when it runs with debug logging enabled, and also by tracing signals
with the killsnoop utility from bcc[2] (in the bpfcc-tools package in
Debian/Ubuntu):

  Test commands:

    root@cloudimg:~# docker run -d --name test redis
    ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
    root@cloudimg:~# docker stop test

  Relevant syslog messages (with wrapped long lines):

    Apr 23 20:45:26 cloudimg kernel: audit: type=1400
      audit(1713905126.444:253): apparmor="DENIED" operation="signal"
      class="signal" profile="docker-default" pid=9289 comm="runc"
      requested_mask="receive" denied_mask="receive" signal=kill
      peer="runc"
    Apr 23 20:45:36 cloudimg dockerd[9030]:
      time="2024-04-23T20:45:36.447016467Z" level=warning
      msg="Container failed to exit within 10s of kill - trying direct SIGKILL"
      container=ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
      error="context deadline exceeded"

  Killsnoop output after "docker stop ...":

    root@cloudimg:~# killsnoop-bpfcc
    TIME      PID    COMM      SIG  TPID   RESULT
    20:51:00  9631   runc      3    9581   -13
    20:51:02  9637   runc      9    9581   -13
    20:51:12  9030   dockerd   9    9581   0

This change extends the docker-default profile with rules that allow
receiving signals from processes that run confined with either the runc
or the crun profile (crun[4] is an alternative OCI runtime that's also
confined in AppArmor >= v4.0.0, see [1]). It is backward compatible
because the peer value is a regular expression (AARE), so the referenced
profile doesn't have to exist for this profile to successfully compile
and load.

Note that the runc profile has an attachment to /usr/sbin/runc. This is
the path where the runc package in Debian/Ubuntu puts the binary. When
the docker-ce package is installed from the upstream repository[3], runc
is installed as part of the containerd.io package at /usr/bin/runc.
Therefore it still runs unconfined and has no issues sending signals to
containers.

[1] https://gitlab.com/apparmor/apparmor/-/commit/2594d936
[2] https://github.com/iovisor/bcc/blob/master/tools/killsnoop.py
[3] https://download.docker.com/linux/ubuntu
[4] https://github.com/containers/crun

Signed-off-by: Tomáš Virtus
(cherry picked from commit 5ebe2c0d6bf30ad76550f0dc8cf35a71098ba5fc)
Signed-off-by: Paweł Gronowski
---
 profiles/apparmor/template.go | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/profiles/apparmor/template.go b/profiles/apparmor/template.go
index cf8c34ce8a..8dbc1b6102 100644
--- a/profiles/apparmor/template.go
+++ b/profiles/apparmor/template.go
@@ -25,6 +25,10 @@ profile {{.Name}} flags=(attach_disconnected,mediate_deleted) {
   umount,
   # Host (privileged) processes may send signals to container processes.
   signal (receive) peer=unconfined,
+  # runc may send signals to container processes (for "docker stop").
+  signal (receive) peer=runc,
+  # crun may send signals to container processes (for "docker stop" when used with crun OCI runtime).
+  signal (receive) peer=crun,
   # dockerd may send signals to container processes (for "docker kill").
   signal (receive) peer={{.DaemonProfile}},
   # Container processes may send signals amongst themselves.
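
As a reading aid for the killsnoop output in the commit message, the SIG and RESULT columns can be decoded with Go's syscall package. This is a sketch under the assumption that RESULT is the raw kill(2) return value observed by the kernel (0 on success, -errno on failure); the describeKill helper is hypothetical, and the signal and errno names are the ones Go reports on Linux.

```go
package main

import (
	"fmt"
	"syscall"
)

// describeKill explains one killsnoop row from its SIG and RESULT columns.
// Assumption: RESULT is 0 on success and -errno on failure.
func describeKill(sig, result int) string {
	name := syscall.Signal(sig) // formats via String(), e.g. "quit" for 3 on Linux
	switch {
	case result == 0:
		return fmt.Sprintf("signal %d (%v): delivered", sig, name)
	case result < 0:
		// -13 is -EACCES, matching the AppArmor "DENIED" audit record.
		return fmt.Sprintf("signal %d (%v): failed: %v", sig, name, syscall.Errno(-result))
	default:
		return fmt.Sprintf("signal %d (%v): unexpected result %d", sig, name, result)
	}
}

func main() {
	// COMM, SIG and RESULT columns copied from the killsnoop output.
	rows := []struct {
		comm        string
		sig, result int
	}{
		{"runc", 3, -13},
		{"runc", 9, -13},
		{"dockerd", 9, 0},
	}
	for _, r := range rows {
		fmt.Printf("%-8s %s\n", r.comm, describeKill(r.sig, r.result))
	}
}
```

Read this way, both runc attempts fail with "permission denied", and only the later SIGKILL sent by dockerd, which is not subject to the runc peer restriction, reaches the container.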