mirror of
https://github.com/systemd/systemd.git
synced 2026-06-24 08:47:49 +00:00
Whenever delegating UID ranges to a user namespace, it can also be useful to map the foreign UID range, so that the container running in the user namespace with delegated UID ranges can download container images and unpack them to the foreign UID range. Let's add an option mapForeign to make this possible. Note that this option gives unprivileged users full access to any directory owned by the foreign UID range that they can access. Hence it is recommended (and already was recommended) to store directories owned by the foreign UID range in a 0700 directory owned by the owner of the tree, to avoid access and modifications by other users. This is already the case for the main users of the foreign UID range, namely /var/lib/machines, /var/lib/portables and /home/<user>, which all use 0700 as their mode. Users will also be able to create inodes owned by the foreign UID range in any directory their own user can write to (on most systems this means /tmp, /var/tmp and /home/<user>).
103 lines
6.2 KiB
XML
<?xml version='1.0'?> <!--*-nxml-*-->
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->

<refentry id="systemd-nsresourced.service" conditional='ENABLE_NSRESOURCED'>

  <refentryinfo>
    <title>systemd-nsresourced.service</title>
    <productname>systemd</productname>
  </refentryinfo>

  <refmeta>
    <refentrytitle>systemd-nsresourced.service</refentrytitle>
    <manvolnum>8</manvolnum>
  </refmeta>

  <refnamediv>
    <refname>systemd-nsresourced.service</refname>
    <refname>systemd-nsresourced</refname>
    <refpurpose>User Namespace Resource Delegation Service</refpurpose>
  </refnamediv>

  <refsynopsisdiv>
    <para><filename>systemd-nsresourced.service</filename></para>
    <para><filename>/usr/lib/systemd/systemd-nsresourced</filename></para>
  </refsynopsisdiv>

  <refsect1>
    <title>Description</title>

    <para><command>systemd-nsresourced</command> is a system service that permits transient delegation of a
    UID/GID range to a user namespace (see <citerefentry
    project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
    allocated by a client, via a Varlink IPC API.</para>

    <para>Unprivileged clients may allocate a user namespace, and then request a UID/GID range to be assigned
    to it via this service. The user namespace may then be used to run containers and other sandboxes, and/or
    be applied to an id-mapped mount.</para>

    <para>UID/GID allocations made this way are transient: when a user namespace goes away, its UID/GID range
    is returned to the pool of available ranges. In order to ensure that clients cannot gain persistency in
    their transient UID/GID range, a BPF-LSM based policy is enforced which ensures that user namespaces set
    up this way can only write to file systems they allocate themselves or that are explicitly allowlisted via
    <command>systemd-nsresourced</command>.</para>

    <para><command>systemd-nsresourced</command> automatically ensures that any registered UID ranges show up
    in the system's NSS database via the <ulink url="https://systemd.io/USER_GROUP_API">User/Group Record
    Lookup API via Varlink</ulink>.</para>

    <para>Currently, only UID/GID ranges consisting of either exactly 1 or exactly 65536 UIDs/GIDs can be
    registered with this service. Moreover, UIDs and GIDs are always allocated together, and
    symmetrically.</para>

    <para>The allocation API supports <emphasis>delegated ranges</emphasis>: additional UID/GID ranges that
    are mapped 1:1 into the user namespace rather than being translated to a target UID/GID. These delegated
    ranges enable nested user namespace scenarios where a container needs to create child user namespaces
    with their own transient UID ranges. Normally, the kernel restricts which UIDs can be mapped into a user
    namespace to those that are also mapped in the parent. Delegated ranges solve this by pre-allocating
    additional ranges that are visible inside the user namespace and can be used by nested
    <function>AllocateUserRange()</function> calls. Up to 16 delegated ranges can be requested per user
    namespace, each of size 65536. The ranges are allocated from the container UID ranges as per
    <ulink url="https://systemd.io/UIDS-GIDS">Users, Groups, UIDs and GIDs on systemd Systems</ulink>.</para>

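    <para>As a rough illustration of the difference between a translated and a delegated range, the
    <filename>/proc/<replaceable>PID</replaceable>/uid_map</filename> of a user namespace with one transient
    range and one delegated range might look like the sketch below. The three columns follow the kernel's
    format (first ID inside the namespace, first ID outside, length). A target of 0 for the transient range is
    assumed, and both base UIDs are hypothetical values picked from the container UID range purely for
    illustration; the values actually chosen by the service will differ.</para>

    <programlisting>         0 1310720000      65536
1310785536 1310785536      65536</programlisting>

    <para>The first line translates the transient range to UIDs 0…65535 inside the namespace, while the second
    line keeps the delegated range at the same numeric UIDs on both sides, so that a nested user namespace can
    map it again.</para>
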
    <para>The allocation API also supports <emphasis>identity mappings</emphasis>: instead of allocating a
    transient UID/GID range, the user namespace can be configured to map the caller's UID/GID to root (UID
    0) inside the namespace, or to itself. Identity mappings can be combined with delegated ranges to enter
    a privileged user namespace from which the container can be set up, after which the container can run in
    one of the delegated ranges. Unlike the transient ranges, identity-mapped users are not subject to the
    BPF-LSM write restrictions.</para>

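    <para>As a sketch of such a combination, assuming a hypothetical caller with UID 1000 that is mapped to
    root and a single delegated range at a hypothetical base inside the container UID range, the resulting
    <filename>/proc/<replaceable>PID</replaceable>/uid_map</filename> could look roughly as follows (columns:
    first ID inside the namespace, first ID outside, length). The concrete numbers are illustrative
    assumptions, not values guaranteed by the service.</para>

    <programlisting>         0       1000          1
1310785536 1310785536      65536</programlisting>

    <para>If the caller's UID were mapped to itself instead, the first line would presumably read
    <literal>1000 1000 1</literal>.</para>
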
    <para>Additionally, the allocation API supports mapping the <emphasis>foreign UID range</emphasis> into
    the user namespace. When this option is enabled, the foreign UID range is mapped 1:1 into the user
    namespace, allowing processes inside to access and manipulate files owned by the foreign UID range.</para>

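    <para>For illustration, assuming the foreign UID range is the 65536 UIDs starting at 2147352576
    (0x7FFE0000), as listed in <ulink url="https://systemd.io/UIDS-GIDS">Users, Groups, UIDs and GIDs on
    systemd Systems</ulink>, a 1:1 mapping of that range would show up as one additional line in
    <filename>/proc/<replaceable>PID</replaceable>/uid_map</filename> along these lines. The range boundaries
    are taken from that document rather than from this page, so treat them as an assumption.</para>

    <programlisting>2147352576 2147352576      65536</programlisting>
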
    <para>The service provides API calls to allowlist mounts (referenced via their mount file descriptors as
    per the Linux <function>fsmount()</function> API), to pass ownership of a cgroup subtree to the user
    namespace, and to delegate a virtual Ethernet device pair to the user namespace. When used in combination,
    this is sufficient to implement fully unprivileged container environments, as implemented by
    <citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>, fully
    unprivileged <varname>RootImage=</varname> (see
    <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>) or
    fully unprivileged disk image tools such as
    <citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry>.</para>

    <para>This service provides one <ulink url="https://varlink.org/">Varlink</ulink> service:
    <constant>io.systemd.NamespaceResource</constant> allows registering user namespaces, and assigning
    mounts, cgroups and network interfaces to them.</para>
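
    <para>One way to explore the methods this interface offers is to introspect it with
    <command>varlinkctl</command>. The following invocation is a sketch only: the socket path shown is an
    assumption about where the service listens and is not stated on this page, so adjust it if your system
    differs.</para>

    <programlisting>varlinkctl introspect /run/systemd/io.systemd.NamespaceResource io.systemd.NamespaceResource</programlisting>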
  </refsect1>

  <refsect1>
    <title>See Also</title>
    <para><simplelist type="inline">
      <member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
      <member><citerefentry><refentrytitle>systemd-mountfsd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
      <member><citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
      <member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
      <member><citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
      <member><citerefentry project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
    </simplelist></para>
  </refsect1>
</refentry>