author | Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> | 2024-10-15 18:53:00 +0200 |
---|---|---|
committer | Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> | 2024-10-18 18:43:40 +0200 |
commit | 9b1a5bc365e379b4b13849adacfde3427f55ca38 (patch) | |
tree | 28e180cd0a59625e5aa2b8470f22e2e2293d5efe | |
parent | Merge pull request #34717 from anonymix007/fundamental-boot-changes (diff) | |
man/systemd-nspawn: emphasise that user namespaces are strongly recommended
-rw-r--r-- | man/systemd-nspawn.xml | 65 |
1 files changed, 35 insertions, 30 deletions
diff --git a/man/systemd-nspawn.xml b/man/systemd-nspawn.xml
index cd7d349b95..4feedd8644 100644
--- a/man/systemd-nspawn.xml
+++ b/man/systemd-nspawn.xml
@@ -46,8 +46,8 @@
 <para><command>systemd-nspawn</command> may be used to run a command or OS in a light-weight namespace container. In many ways it is similar to <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, but more powerful
- since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and
- the host and domain name.</para>
+ since it virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems, and
+ the host and domain names.</para>
 <para><command>systemd-nspawn</command> may be invoked on any directory tree containing an operating system tree, using the <option>--directory=</option> command line option. By using the <option>--machine=</option> option an OS
@@ -59,11 +59,14 @@
 project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry> <command>systemd-nspawn</command> may be used to boot full Linux-based operating systems in a container.</para>
- <para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to read-only,
- such as <filename>/sys/</filename>, <filename>/proc/sys/</filename> or <filename>/sys/fs/selinux/</filename>. The
- host's network interfaces and the system clock may not be changed from within the container. Device nodes may not
- be created. The host system cannot be rebooted and kernel modules may not be loaded from within the
- container.</para>
+ <para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to
+ read-only, such as <filename>/sys/</filename>, <filename>/proc/sys/</filename>, or
+ <filename>/sys/fs/selinux/</filename>. The host's network interfaces and the system clock may not be
+ changed from within the container. Device nodes may not be created. The host system cannot be rebooted
+ and kernel modules may not be loaded from within the container. <emphasis>This sandbox can easily be
+ circumvented from within the container if user namespaces are not used</emphasis>. This means that
+ untrusted code must always be run in a user namespace, see the discussion of the
+ <option>--private-users=</option> option below.</para>
 <para>Use a tool like <citerefentry project='mankier'><refentrytitle>dnf</refentrytitle><manvolnum>8</manvolnum></citerefentry>, <citerefentry
@@ -100,8 +103,8 @@
 template unit file, making it usually unnecessary to alter this template file directly.</para>
 <para>Note that <command>systemd-nspawn</command> will mount file systems private to the container to
- <filename>/dev/</filename>, <filename>/run/</filename> and similar. These will not be visible outside of the
- container, and their contents will be lost when the container exits.</para>
+ <filename>/dev/</filename>, <filename>/run/</filename>, and similar. These will not be visible outside of
+ the container, and their contents will be lost when the container exits.</para>
 <para>Note that running two <command>systemd-nspawn</command> containers from the same directory tree will not make processes in them see each other. The PID namespace separation of the two containers is complete and the containers
@@ -810,17 +813,6 @@
 range. In this mode, the number of UIDs/GIDs assigned to the container is 65536, and the owner UID/GID of the root directory must be a multiple of 65536.</para></listitem>
- <listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
- the default.</para>
- </listitem>
-
- <listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
- an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
- <option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
- host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
- and hence is often a good choice if proper user namespacing with distinct UID maps is not
- appropriate.</para></listitem>
-
 <listitem><para>The special value <literal>pick</literal> turns on user namespacing. In this case the UID/GID range is automatically chosen. As first step, the file owner UID/GID of the root directory of the container's directory tree is read, and it is checked that no other container is
@@ -837,22 +829,35 @@
 for it, and thus in the (possibly expensive) file ownership adjustment operation. However, subsequent invocations of the container will be cheap (unless of course the picked UID/GID range is assigned to a different use by then).</para></listitem>
+
+ <listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
+ the default when <command>systemd-nspawn</command> is invoked directly. (Note that the
+ <filename>systemd-nspawn@.service</filename> unit enables private users.) This option is not
+ secure and must not be used to run untrusted code.</para></listitem>
+
+ <listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
+ an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
+ <option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
+ host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
+ but may be useful if proper user namespacing with distinct UID maps is not possible. This option is
+ not secure and must not be used to run untrusted code.</para></listitem>
 </orderedlist>
- <para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable UID/GID range in the
- container covers 16 bit. For best security, do not assign overlapping UID/GID ranges to multiple containers. It is
- hence a good idea to use the upper 16 bit of the host 32-bit UIDs/GIDs as container identifier, while the lower 16
- bit encode the container UID/GID used. This is in fact the behavior enforced by the
- <option>--private-users=pick</option> option.</para>
+ <para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable
+ UID/GID range in the container covers 16 bits. For best security, do not assign overlapping UID/GID
+ ranges to multiple containers. It is hence a good idea to use the upper 16 bit of the host 32-bit
+ UIDs/GIDs as container identifier, while the lower 16 bits encode the container UID/GID used. This is
+ in fact the behavior enforced by the <option>--private-users=pick</option> option.</para>
- <para>When user namespaces are used, the GID range assigned to each container is always chosen identical to the
- UID range.</para>
+ <para>When user namespaces are used, the GID range assigned to each container is always chosen
+ identical to the UID range.</para>
- <para>In most cases, using <option>--private-users=pick</option> is the recommended option as it enhances
- container security massively and operates fully automatically in most cases.</para>
+ <para>In most cases, using <option>--private-users=pick</option> is the recommended option as user
+ namespacing is required for security, and this option massively enhances container security while
+ operating fully automatically in most cases.</para>
 <para>Note that the picked UID/GID range is not written to <filename>/etc/passwd</filename> or
- <filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently anywhere,
+ <filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently,
 except in the file ownership of the files and directories of the container.</para>
 <para>Note that when user namespacing is used file ownership on disk reflects this, and all of the container's