path: root/src/nspawn/nspawn.c
Commit message [Author, Age, Files, Lines]
* nspawn: add support for 'managed' userns mode even when we run privileged [Lennart Poettering, 7 days, 1 file, -92/+156]
| So far, we supported two modes:
|
| 1. when running unprivileged, we'd get the mounts from mountfsd and the userns from nsresourced
| 2. when running privileged, we'd do the mounts/userns ourselves
|
| This untangles this a bit, so that we can also use mountfsd/nsresourced when running privileged. I think this is generally a bit nicer, and probably something we should switch to entirely one day, as it reduces the variety of codepaths.
|
| With this patch the default behaviour remains unchanged, but by selecting the new "managed" option for --private-users= the codepaths via mountfsd/nsresourced can be explicitly requested even when running with privileges.
|
| This mostly just reworks a number of codepaths so that we check for arg_userns_mode != USER_NAMESPACE_MANAGED rather than arg_privileged, but it requires more fixes, too. The devil is in the details.
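
The mode selection described in this commit message can be sketched roughly as follows. This is an illustration only, not the code in nspawn.c: the function name `pick_userns_path()` and the return values "managed" and "self" are hypothetical, and only the "managed" value of --private-users= is taken from the message above.

```python
from typing import Optional


def pick_userns_path(privileged: bool, private_users: Optional[str]) -> str:
    """Return which codepath sets up the mounts and the user namespace.

    Hypothetical helper: "managed" stands for going through
    mountfsd/nsresourced, "self" for nspawn doing the work itself.
    """
    if not privileged:
        # Without privileges the mountfsd/nsresourced path is the only option.
        return "managed"
    if private_users == "managed":
        # New with this commit: explicitly requested despite having privileges.
        return "managed"
    # The default for privileged invocations stays unchanged.
    return "self"
```

Under these assumptions, a privileged run only takes the mountfsd/nsresourced path when `private_users` is set to "managed"; every other privileged invocation behaves as before.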
* nspawn: support foreign mappings also when nspawn doing the mapping itself [Lennart Poettering, 7 days, 1 file, -2/+39]
| This adds a new "foreign" value to --private-users-ownership= which is a lot like "map", but maps from the host's foreign UID range rather than from the host's 0.
|
| (This has nothing much to do with making unprivileged directory-based containers work, it's just very handy that we can run privileged containers with such a mapping too, with an easy switch)
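
A rough sketch of the difference between "map" and "foreign", assuming a 64K-wide range. The constants and both function names are assumptions made for illustration: the base value is what the foreign UID range is believed to start at and should be checked against the UID/GID documentation of the systemd version in use.

```python
from typing import Optional

# Assumption: start and size of the host's foreign UID range.
FOREIGN_UID_BASE = 0x7FFE0000
RANGE_SIZE = 0x10000  # 65536 UIDs


def source_base(ownership: str) -> int:
    """Host UID at which the mapping starts (hypothetical helper)."""
    if ownership == "map":
        return 0
    if ownership == "foreign":
        return FOREIGN_UID_BASE
    raise ValueError(f"unsupported ownership mode: {ownership}")


def to_container_uid(on_disk_uid: int, ownership: str) -> Optional[int]:
    """Translate an on-disk owner UID to the UID seen inside the container.

    Returns None when the UID lies outside the mapped range.
    """
    base = source_base(ownership)
    if base <= on_disk_uid < base + RANGE_SIZE:
        return on_disk_uid - base
    return None
```

With "map", a file owned by host UID 0 shows up as UID 0 in the container; with "foreign", the same holds for a file owned by the first UID of the foreign range.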
* nspawn: allow to run unpriv from dir [Lennart Poettering, 7 days, 1 file, -62/+66]
| This simply calls into mountfsd to acquire the root mount and uses it as root for the container.
|
| Note that this also makes one more change: previously we ran containers directly off their backing directory. Except when we didn't, and there were a variety of exceptions: if we had no privs, if we ran off a disk image, if the directory was the host's root dir, and some others.
|
| This simplifies the logic a bit: we now simply always create a temporary directory in /tmp/ and bind mount everything there, in all code paths. After all, in order to control propagation we need to turn the root into a mount point anyway, hence we might just do it at one place for all cases.
* nspawn: assorted coding style fixes [Lennart Poettering, 13 days, 1 file, -1/+1]
|
* nspawn: trivial scope reduction [Lennart Poettering, 2025-01-15, 1 file, -1/+2]
|
* tree-wide: port more code to namespace_open_by_type() [Lennart Poettering, 2025-01-10, 1 file, -8/+3]
|
* basic: port various pidfd/pidref helpers to PIDFD_GET_INFO and PIDFD_GET_*_NAMESPACE (#35242) [Lennart Poettering, 2025-01-06, 1 file, -5/+3]
|\
| | Supersedes #35308 (cherry-picked one commit and replaced the rest)
| |
| | (I left a few comments that are folded by GitHub. Please make sure to check them too.)
| * namespace-util: modernize fd_is_namespace() and is_our_namespace() [Mike Yuan, 2025-01-04, 1 file, -5/+3]
| | - Make fd_is_namespace() take NamespaceType
| | - Drop support for kernel without NS_GET_NSTYPE (< 4.11)
| | - Port is_our_namespace() to namespace_open_by_type() (preparation for later commits, where the latter would go by pidfd if available, avoiding procfs)
* | signal-util: generalize sigaction_nop_nocldstop [Mike Yuan, 2025-01-04, 1 file, -6/+1]
|/
* nspawn: move uid shift/chown() code into shared/ [Lennart Poettering, 2025-01-04, 1 file, -1/+1]
|
* nspawn: trivial tweaklets (#35831) [Daan De Meyer, 2025-01-03, 1 file, -8/+5]
|\
| * nspawn: improve log messages a bit [Lennart Poettering, 2025-01-03, 1 file, -2/+2]
| |
| * nspawn: drop some redundant {} [Lennart Poettering, 2025-01-03, 1 file, -6/+3]
| |
* | nspawn: rework userns_mkdir() around chase() [Lennart Poettering, 2025-01-03, 1 file, -9/+19]
|/
* discover-image: introduce per-user image directories [Lennart Poettering, 2024-12-20, 1 file, -1/+2]
| We nowadays support unprivileged invocation of systemd-nspawn + systemd-vmspawn, but there was no support for discovering suitable disk images (i.e. no per-user counterpart of /var/lib/machines). Add this now, and hook it up everywhere.
|
| Instead of hardcoding machined's, importd's, portabled's, sysupdated's image discovery to RUNTIME_SCOPE_SYSTEM I introduced a field that makes the scope variable, even if this field is always initialized to RUNTIME_SCOPE_SYSTEM for now. I think these four services should eventually be updated to support a per-user concept too; this is preparation for that, even though it doesn't outright add support for this.
|
| This is for the largest part not user visible, except for in nspawn, vmspawn and the dissect tool. For the latter I added a pair of --user/--system switches to select the discovery scope.
* nspawn: switch to read_virtual_file() for reading audit loginuid [Lennart Poettering, 2024-12-19, 1 file, -1/+1]
|
* nspawn: trivial improvements [Lennart Poettering, 2024-12-19, 1 file, -1/+2]
|
* nspawn: rename pin_fully_visible_fs() → pin_fully_visible_api_fs() [Lennart Poettering, 2024-12-19, 1 file, -2/+2]
| This function pins the *API* FS, i.e. /proc/ + /sys/, not just any fs. Hence clarify this in the name.
|
| (At least we call these two fs "API (V)FS" in our codebase, hence continue to do so here)
* nspawn: rename 'fd' variable to something more descriptive [Lennart Poettering, 2024-12-19, 1 file, -6/+7]
|
* nspawn: use DEVNUM_FORMAT_STR/DEVNUM_FORMAT_VAL more [Lennart Poettering, 2024-12-19, 1 file, -1/+2]
|
* ptyfwd: always flush buffer and disconnect before exit [Yu Watanabe, 2024-12-18, 1 file, -3/+0]
| With this, users of the PTY forwarder no longer need to drain it manually. It is also no longer necessary to free the PTY forwarder early and explicitly just to make it disconnect.
* ptyfwd: always write additional line break on stop [Yu Watanabe, 2024-12-18, 1 file, -8/+1]
| Currently we do that in the users of the PTY forwarder, e.g. nspawn. But let's do that unconditionally in the PTY forwarder itself.
* tree-wide: remove support for kernels lacking ambient caps [Lennart Poettering, 2024-12-17, 1 file, -2/+2]
| Let's bump the kernel baseline a bit to 4.3 and thus require ambient caps.
|
| This allows us to remove support for a variety of special casing, most importantly the ExecStart=!! hack.
* meson: allow to customize the access mode for tty/pts devices [Yu Watanabe, 2024-12-16, 1 file, -2/+2]
| Then, switch the default value to "0600", due to general security concerns about terminals being written to by other users.
|
| Closing #35599.
* nspawn: improve error message when we cannot look into a container tree due to perms [Lennart Poettering, 2024-11-27, 1 file, -3/+6]
|
* nspawn: don't try to unregister a machine we never registered [Lennart Poettering, 2024-11-27, 1 file, -1/+1]
| When registering we condition this on "arg_register". Let's do the same when unregistering, otherwise we might end up trying to unregister a machine we never registered.
* nspawn: improve log message on bad incoming sd_notify() message [Lennart Poettering, 2024-11-23, 1 file, -1/+1]
| It's the PID that is wrong, not the UID/GID, be precise.
* nspawn: fix userns_mkdir() invocation [Lennart Poettering, 2024-11-23, 1 file, -4/+3]
| The wrong error code was logged. But actually given that userns_mkdir() is fine with existing dirs, let's drop the redundant conditionalization.
|
| Follow-up for: a1fcaa1549d86098d0ba75254b6afc96c786b3b6
* nspawn: --private-users-ownership= value is called 'chown', not 'own' [Lennart Poettering, 2024-11-15, 1 file, -1/+2]
|
* nspawn: ignore failure in creating /dev/net/tun when --private-network is unspecified [Yu Watanabe, 2024-11-14, 1 file, -6/+19]
| Follow-up for efedb6b0f3cff37950112fd37cb750c16d599bc7.
|
| Closes #35116.
* nspawn: split out copy_devnode_one() and bind_mount_devnode() from copy_devnodes() [Yu Watanabe, 2024-11-14, 1 file, -70/+104]
| While doing that, even if mknod() fails, we now still try to fall back to a bind mount if arg_uid_shift == 0.
|
| Mostly no functional change, just refactoring and preparation for a later commit.
* nspawn: silence warning about failure in getting fuse version [Yu Watanabe, 2024-11-14, 1 file, -1/+2]
| Follow-up for dc3223919f663b7c8b8d8d1d6072b4487df7709b.
|
| If nspawn is invoked with DevicePolicy= but DeviceAllow= does not contain /dev/fuse, nspawn will fail to get the fuse version with -EPERM. Let's silence the warning in that case.
* nspawn: fix indentation of run_container() parameter list [Lennart Poettering, 2024-11-12, 1 file, -9/+9]
|
* tree-wide: replace for loop with FOREACH_ELEMENT or FOREACH_ARRAY macros (#34893) [Integral, 2024-10-26, 1 file, -7/+5]
|
* tree-wide: use isatty_safe() everywhere [Lennart Poettering, 2024-10-25, 1 file, -3/+3]
|
* Merge pull request #34783 from keszybz/man-nspawn-private-users [Zbigniew Jędrzejewski-Szmek, 2024-10-18, 1 file, -1/+1]
|\
| | Change systemd-nspawn man page to strongly recommend private users
| * tree-wise: use "lightweight" spelling [Zbigniew Jędrzejewski-Szmek, 2024-10-18, 1 file, -1/+1]
| | Both spellings were used, but the dictionary says that "lightweight" is the standard spelling.
* | fdset: optionally, close remaining fds asynchronously [Lennart Poettering, 2024-10-17, 1 file, -1/+1]
|/
* tree-wide: drop doubled empty lines [Yu Watanabe, 2024-10-07, 1 file, -1/+0]
|
* fs-util: rename laccess to access_nofollow [Mike Yuan, 2024-10-05, 1 file, -1/+1]
| In order to distinguish it from libc function naming.
* nspawn: fix typo [Yu Watanabe, 2024-09-16, 1 file, -1/+1]
| Follow-up for d7a6bb9891ecc38a1bedef9689d00671bb0001ff.
* tree-wide: make sigprocmask() changes more automatic [Lennart Poettering, 2024-09-13, 1 file, -0/+4]
| This tries to get rid of most manual sigprocmask() changes, in favour of:
|
| 1. The SD_EVENT_SIGNAL_PROCMASK flag to sd_event_add_signal()
| 2. The sd_event_set_signal_exit() call for handling SIGTERM/SIGINT
| 3. Move masking of SIGWINCH into ptyfwd, out of nspawn/vmspawn/run
|
| And while we are at it get rid of a bunch of event source fields whose lifetime is bound to the sd_event object they belong to anyway, and make use of the "floating" event source feature of sd-event instead.
* nspawn: use ERRNO_IS_NEG_NOT_SUPPORTED() at one more place [Yu Watanabe, 2024-09-09, 1 file, -1/+1]
| Follow-up for dc3223919f663b7c8b8d8d1d6072b4487df7709b.
|
| Addresses https://github.com/systemd/systemd/pull/34067#discussion_r1748061156.
|
| Error codes other than ENOSYS are not expected here, but if one does show up there is still nothing we can do about it, so let's not log the failure loudly.
* Merge pull request #34258 from yuwata/nspawn-volatile-u [Lennart Poettering, 2024-09-09, 1 file, -10/+21]
|\
| | nspawn: make --volatile work with -U
| * nspawn: only remount /usr/ with idmap when --volatile=yes [Yu Watanabe, 2024-09-06, 1 file, -4/+7]
| | The root directory is already mounted with a picked UID shift, hence it is not necessary to remount with idmap. However, /usr/ is a bind-mount, hence it must be remounted with idmap.
| |
| | With this change, now '-U --volatile=yes' works fine.
| |
| | Fixes #34254.
| * nspawn: mount /var/ after remount_idmap() when --volatile=state [Yu Watanabe, 2024-09-06, 1 file, -0/+8]
| | Previously, remount_idmap() failed as /var/ was already mounted, thus remounting (strictly speaking, unmounting old root directory) failed with -EBUSY.
| |
| | As tmpfs /var/ is mounted with picked UID shift, it should not be remounted with idmap, but needs to be mounted after the root directory being remounted.
| |
| | This makes '-U --volatile=state' work as expected.
| * nspawn: use strv_extend() and friends to build directories passed to remount_idmap() [Yu Watanabe, 2024-09-06, 1 file, -9/+9]
| | No functional change, just refactoring and preparation for later change.
* | nspawn: enable FUSE in containers [Luke T. Shumaker, 2024-09-07, 1 file, -3/+97]
| | Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds, 2018-06-07)). This means that on such kernels it is safe to enable FUSE in nspawn containers.
| |
| | In outer_child(), before calling copy_devnodes(), check the FUSE version to decide whether to enable (>=7.27) or disable (<7.27) FUSE in the container. We look at the FUSE version instead of the kernel version in order to enable FUSE support on older-versioned kernels that may have the mentioned patchset backported ([as requested by @poettering][1]). However, I am not sure that this is safe; user-namespace support is not a documented part of the FUSE protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant to capture. While the same patchset
| |
| |  - added FUSE_ABORT_ERROR (which is all that the 7.27 version bump is documented as including),
| |  - bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
| |  - added user-namespace support
| |
| | these 3 things are not inseparable; it is conceivable to me that a backport could include the first 2 of those things and exclude the 3rd; perhaps it would be safer to check the kernel version.
| |
| | Do note that our get_fuse_version() function uses the fsopen() family of syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if nothing has been backported, then the minimum kernel version for FUSE-in-nspawn is actually v5.2, not v4.18.
| |
| | Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes() copy in /dev/fuse if enabled.
| |
| | Pass whether or not to enable FUSE back over fd_outer_socket to run_container() so that it can pass that to append_machine_properties() (via either register_machine() or allocate_scope()); have append_machine_properties() append "DeviceAllow=/dev/fuse rw" if enabled.
| |
| | For testing, simply check that /dev/fuse can be opened for reading and writing, but that actually reading from it fails with EPERM. The test assumes that if FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel with FUSE >= 7.27; I am unsure how to go about writing a test that validates that the version check disables FUSE on old kernels.
| |
| | [1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835
| |
| | Closes #17607
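
The version gate described in this commit message boils down to a tuple comparison against 7.27. A minimal sketch, assuming the major and minor numbers have already been obtained (the real code gets them via get_fuse_version(); the function name `fuse_safe_in_container()` is hypothetical):

```python
from typing import Tuple

# From the commit message: FUSE 7.27 shipped together with userns support.
MIN_FUSE_VERSION: Tuple[int, int] = (7, 27)


def fuse_safe_in_container(major: int, minor: int) -> bool:
    """Return True if the reported FUSE protocol version is at least 7.27."""
    # Tuples compare element by element, so 8.0 counts as newer than 7.27.
    return (major, minor) >= MIN_FUSE_VERSION
```

As the message itself cautions, the protocol version is only a proxy for user-namespace support, so a check like this cannot rule out a partial backport.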
* | nspawn: register_machine() and allocate_scope() bools to flags [Luke T. Shumaker, 2024-09-07, 1 file, -4/+7]
| |
* | nspawn: convert copy_devnodes():devnodes from nulstr to strv [Luke T. Shumaker, 2024-09-07, 1 file, -12/+14]
| |