path: root/src/krbd.cc
Commit message | Author | Age | Files | Lines
* krbd: return error when no initial monitor address foundBurt Holzman2022-02-071-0/+5
| Since we filter monitor addresses based on ms_mode, check that at least one address was found. Otherwise, we mismatch arguments when calling sysfs/add_single_major, which emits a misleading error message to dmesg:
|
|   libceph: resolve 'name=user1' (ret=-3): failed
|   libceph: parse_ips bad ip 'name=user1,key=client.user1'
|
| Fixes: https://tracker.ceph.com/issues/54128
| Signed-off-by: Burt Holzman <burt@fnal.gov>
* krbd: escape udev_enumerate_add_match_sysattr valuesIlya Dryomov2021-08-281-4/+12
| libudev uses fnmatch(3) for matching attributes, meaning that shell glob pattern matching is employed instead of literal string matching. Escape glob metacharacters to suppress pattern matching.
|
| Fixes: https://tracker.ceph.com/issues/52425
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
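The escaping described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `escape_glob()` is a hypothetical helper name, and the set of escaped characters (`*`, `?`, `[` and the backslash itself) is an assumption based on what fnmatch(3) treats specially when FNM_NOESCAPE is not set.

```cpp
#include <fnmatch.h>
#include <cassert>
#include <string>

// Hypothetical helper: backslash-escape the characters that fnmatch(3)
// interprets as pattern syntax, so that the result matches only the
// literal input string.
std::string escape_glob(const std::string& s)
{
  std::string out;
  out.reserve(s.size());
  for (char c : s) {
    if (c == '*' || c == '?' || c == '[' || c == '\\') {
      out.push_back('\\');
    }
    out.push_back(c);
  }
  assert(out.size() >= s.size());  // escaping never shortens the string
  return out;
}

// Returns true if value matches pattern_source taken literally.
bool matches_literally(const std::string& pattern_source,
                       const std::string& value)
{
  return fnmatch(escape_glob(pattern_source).c_str(), value.c_str(), 0) == 0;
}
```

Without the escaping, an image named `img*` would also match `imgX`; with it, only the literal name matches. In the setting the commit describes, the escaped string would be the value handed to udev_enumerate_add_match_sysattr().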
* krbd: build without "using namespace std"Kefu Chai2021-08-131-13/+15
| * add "std::" prefix in headers
| * add "using" declarations in .cc files
|
| so we don't rely on "using namespace std" in one or more included headers.
|
| Signed-off-by: Kefu Chai <kchai@redhat.com>
* krbd: check device node accessibility only if we actually mappedIlya Dryomov2021-03-171-1/+5
| Fix a braino that came with commit f6854ac65d2a ("krbd: make sure the device node is accessible after the mapping").
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: make sure the device node is accessible after the mappingIlya Dryomov2021-02-221-9/+40
| We have always assumed this to be the case and users' scripts and orchestration tools have grown to depend on this. Let's add some enforcement, prompted by [1]:
|
|   "I am running my Kubernetes worker node inside of an LXC container which doesn't benefit from the device node created by the kernel, so I'm using udev to create the /dev/rbd* device nodes inside of the LXC container."
|
| which, through the unfortunate interaction with ceph-csi rbd plugin, results in data loss for "volumeMode: Filesystem" PVs because it ends up recreating the filesystem every time the PV is attached to the pod:
|
|   "When deleting the pod and re-creating it, I can see that the RBD image is indeed being reformatted. This seems to be because when blkid is being run to check if the image is formatted, the /dev/rbd* device has not yet been created by udev. By the time the code gets down to running mkfs, the device is there and the damage is done."
|
| [1] https://github.com/ceph/ceph-csi/issues/1820
|
| Fixes: https://tracker.ceph.com/issues/49410
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: add support for msgr2Ilya Dryomov2021-01-251-9/+25
| Recognize ms_mode map option and filter initial monitor addresses accordingly: if ms_mode is not given or ms_mode=legacy, discard v2 addresses, otherwise discard v1 addresses.
|
| Note that nothing was discarded (i.e. v2 addresses were passed to the kernel) previously. The intent was to preserve that behaviour in case ms_mode is not given, so that the kernel default could be changed in the future. However, it turns out that the mount.ceph helper has been misguidedly discarding v2 addresses since commit eae01275134e ("mount.ceph: fork a child to get info from local configuration"), so that ship has sailed.
|
| Fixes: https://tracker.ceph.com/issues/48976
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
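The filtering rule in this message can be sketched as below. This is an illustration under stated assumptions rather than the actual krbd code: `AddrType`, `MonAddr` and `build_mon_list()` are hypothetical names, monitor addresses are reduced to plain strings, and the -ENOENT return for an empty result is an assumed error code (the log only says that an error is returned when no address is found).

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real monitor address types.
enum class AddrType { V1, V2 };

struct MonAddr {
  AddrType type;
  std::string ip_port;  // e.g. "10.0.0.1:6789"
};

// Builds a comma-separated monitor list for the kernel.
// ms_mode empty or "legacy": keep v1 addresses, discard v2.
// Any other ms_mode:         keep v2 addresses, discard v1.
// Returns 0 on success, -ENOENT (assumed) if nothing survives the filter.
int build_mon_list(const std::vector<MonAddr>& addrs,
                   const std::string& ms_mode, std::string* out)
{
  assert(out != nullptr);
  const bool want_v2 = !ms_mode.empty() && ms_mode != "legacy";

  std::string list;
  for (const auto& a : addrs) {
    if ((a.type == AddrType::V2) != want_v2) {
      continue;  // wrong protocol for the requested mode
    }
    if (!list.empty()) {
      list += ',';
    }
    list += a.ip_port;
  }

  if (list.empty()) {
    return -ENOENT;  // avoid passing a malformed argument string onwards
  }
  *out = list;
  return 0;
}
```

Checking for an empty list at this point means a mode/address mismatch is reported directly instead of surfacing later as a confusing kernel parse error.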
* krbd: optionally skip waiting for udev eventsIlya Dryomov2020-09-011-10/+47
| Add support for noudev option to allow mapping and unmapping images from a privileged container in a non-initial network namespace (e.g. when using Multus CNI).
|
| Fixes: https://tracker.ceph.com/issues/47128
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: misc cleanupsIlya Dryomov2019-11-281-26/+26
| Introduce get_devnode() and append_unmap_options(); make some functions static.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: make wait_for_udev_add() clearer and a bit more efficientIlya Dryomov2019-11-281-20/+24
| Collect only /dev/rbd* block events and dispose of them as soon as possible; match on devnode and assert on major/minor instead of the other way around.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: do away with explicit memory managementIlya Dryomov2019-11-281-192/+143
| Wrap udev_monitor, udev_enumerate and udev_device with std::unique_ptr.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
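The general technique, owning a C-style refcounted handle through std::unique_ptr with a custom deleter, might look like the sketch below. To keep it self-contained it uses a made-up `handle` type with `handle_new()` and `handle_unref()` instead of libudev; with libudev the deleter would call the matching unref function, such as udev_device_unref(). All names here are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a refcounted C handle such as udev_device (hypothetical).
struct handle {
  int refs;
};

static int live_handles = 0;  // counts handles not yet released

handle* handle_new()
{
  ++live_handles;
  return new handle{1};
}

void handle_unref(handle* h)
{
  assert(h->refs > 0);
  if (--h->refs == 0) {
    --live_handles;
    delete h;
  }
}

// Deleter that drops the reference instead of calling delete directly.
struct handle_deleter {
  void operator()(handle* h) const { handle_unref(h); }
};

using handle_uptr = std::unique_ptr<handle, handle_deleter>;

// Every return path releases the handle; no explicit cleanup is needed.
int use_handle(bool fail_early)
{
  handle_uptr h(handle_new());
  if (fail_early) {
    return -1;  // h is released here automatically
  }
  return h->refs;
}
```

The benefit is on the error paths: early returns no longer need a matching unref call, which is plausibly why the change removes more lines than it adds.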
* krbd: retry on an empty list from udev_enumerate_scan_devices()Ilya Dryomov2019-10-251-13/+36
| systemd 219 doesn't have the issue that is worked around in the previous commit, but has a different one: udev_enumerate_scan_devices() always succeeds, but sometimes returns an empty list when the device is actually there. This happens rarely and at random so I haven't been able to get to the bottom of it yet, but it looks like another similar race condition in libudev.
|
| Since an empty list is expected if the device isn't there, retry just twice with a small sleep in-between. This appears to be enough: I got 7 occurrences per 600000 "rbd unmap" invocations, all of which needed a single retry:
|
|   rbd: udev enumerate missed a device, tries = 1
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
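A bounded retry of this kind could be sketched as follows. The enumeration step is passed in as a callable so the example does not depend on libudev; `enumerate_with_retry()`, the retry count default and the 10 ms delay are assumptions for illustration (the message only says "twice" and "a small sleep").

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Calls enumerate() and, if the result is empty, retries up to max_retries
// more times with a short sleep in between. An empty result after the last
// retry is returned as-is, since "device not there" is a legitimate answer.
std::vector<std::string> enumerate_with_retry(
    const std::function<std::vector<std::string>()>& enumerate,
    int max_retries = 2,
    std::chrono::milliseconds delay = std::chrono::milliseconds(10))
{
  assert(enumerate && max_retries >= 0);
  for (int tries = 0; ; ++tries) {
    std::vector<std::string> devs = enumerate();
    if (!devs.empty()) {
      if (tries > 0) {
        std::cerr << "udev enumerate missed a device, tries = " << tries
                  << std::endl;
      }
      return devs;
    }
    if (tries == max_retries) {
      return devs;  // still empty: give up and report "not found"
    }
    std::this_thread::sleep_for(delay);
  }
}
```

Keeping the retry count small matters here because, unlike a transient error, an empty list is also the normal result when the device really is absent, so every extra retry delays that common case.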
* krbd: retry on transient errors from udev_enumerate_scan_devices()Ilya Dryomov2019-10-251-3/+24
| udev_enumerate_scan_devices() doesn't handle disappearing devices well. If called while some devices are being removed, it sometimes propagates ENOENT and ENODEV errors encountered operating on directory entries in /sys that no longer exist.
|
| Some of these errors are suppressed, but this isn't reliable and varies across versions. In particular, systemd 239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't suppress ENODEV from sd_device_get_devnum(). In systemd 243 the call to sd_device_get_devnum() has been moved, but it still leaks ENOENT from sd_device_get_is_initialized() (referring to the body of FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
|
| Assume that all ENOENT and ENODEV errors are transient and retry the call to udev_enumerate_scan_devices(). Don't limit the number, but log each retry.
|
| Fixes: https://tracker.ceph.com/issues/41036
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
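The retry policy described above (unbounded, only for ENOENT and ENODEV, logging each attempt) can be sketched like this. The scan itself is a callable returning 0 or a negative errno, standing in for udev_enumerate_scan_devices(); `scan_with_retry()` and the log wording are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <iostream>

// Repeats scan() for as long as it fails with -ENOENT or -ENODEV, which
// are assumed to be transient. Any other result, success or failure, is
// returned to the caller unchanged.
int scan_with_retry(const std::function<int()>& scan)
{
  assert(scan);
  int tries = 0;
  for (;;) {
    int r = scan();
    if (r != -ENOENT && r != -ENODEV) {
      return r;
    }
    std::cerr << "udev enumerate failed, retrying: r = " << r
              << ", tries = " << ++tries << std::endl;
  }
}
```

The loop has no upper bound by design, matching the message; the per-retry log line is what makes a pathological case visible.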
* Merge pull request #30965 from idryomov/wip-krbd-udev-socket-overrunIlya Dryomov2019-10-211-108/+206
|\
| | krbd: avoid udev netlink socket overrun
| |
| | Reviewed-by: Jason Dillaman <dillaman@redhat.com>
| * krbd: increase udev netlink socket receive buffer to 2MIlya Dryomov2019-10-181-0/+16
| | Even though with the previous commit we no longer block between binding the socket and starting handling events, we still want a larger receive buffer to accommodate scheduling delays. Since the filtering is done in the listener, an estimate focused on just rbd is not accurate, but anyway: a pair of "rbd" and "block" events for "rbd map" take 2048 bytes in the receive buffer. This allows for roughly a thousand of them ("rbd map" and "rbd unmap" require root and libudev makes use of SO_RCVBUFFORCE so rmem_max limit is ignored).
| |
| | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * krbd: avoid udev netlink socket overrunIlya Dryomov2019-10-161-47/+128
| | Because the event(s) we are interested in can be delivered while we are still in the kernel finishing map or unmap, we start listening for udev events before going into the kernel. However, if (un)mapping takes its time, udev netlink socket can be fairly easily overrun -- the filtering is done on the listener side, so we get to process everything, not just rbd events. If any of the events of interest get dropped (ENOBUFS), we hang in poll().
| |
| | Go into the kernel in a separate thread and leave the main thread to run the event loop. The return value is communicated to the reactor through a pipe.
| |
| | Fixes: https://tracker.ceph.com/issues/41404
| | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
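The thread-plus-pipe arrangement might be sketched as below. This is a simplified illustration, not the krbd implementation: `run_in_thread_and_wait()` is a hypothetical name, the blocking kernel call is replaced by an arbitrary callable, and the event loop polls only the pipe, whereas the real loop would also watch the udev monitor descriptor and drain it while the worker is busy.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <functional>
#include <thread>

// Runs blocking_op in a worker thread and waits for its int result in a
// poll() loop on the calling thread. The result travels over a pipe so
// that it can be multiplexed with other file descriptors.
int run_in_thread_and_wait(const std::function<int()>& blocking_op)
{
  assert(blocking_op);
  int fds[2];
  if (pipe(fds) < 0) {
    return -errno;
  }

  std::thread worker([&blocking_op, wfd = fds[1]] {
    int r = blocking_op();
    // A write this small to a pipe is atomic; ignore the byte count.
    ssize_t n = write(wfd, &r, sizeof(r));
    (void)n;
  });

  int result = -EIO;
  struct pollfd pfd = {fds[0], POLLIN, 0};
  for (;;) {
    int n = poll(&pfd, 1, -1);
    if (n < 0) {
      if (errno == EINTR) {
        continue;
      }
      result = -errno;
      break;
    }
    if (pfd.revents & POLLIN) {
      if (read(fds[0], &result, sizeof(result)) != sizeof(result)) {
        result = -EIO;
      }
      break;
    }
    if (pfd.revents & (POLLHUP | POLLERR)) {
      break;
    }
  }

  worker.join();
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

Because the calling thread is never stuck inside the blocking operation, it can keep reading from an event socket the whole time, which is the property that prevents the receive buffer from overflowing.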
| * krbd: reap all available events before polling againIlya Dryomov2019-10-161-8/+13
| | This also exposes errors from udev_monitor_receive_device() which were previously ignored.
| |
| | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * krbd: separate event reaping from event processingIlya Dryomov2019-10-161-62/+70
| | Move event processing into UdevMapHandler and UdevUnmapHandler functors and replace wait_for_udev_{add,remove}() with a single wait_for_mapping() template.
| |
| | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * krbd: get rid of poll() timeoutIlya Dryomov2019-10-161-18/+6
| | This timeout was added as a (very poor) workaround for an issue addressed in commit 42dd1eae630f ("krbd: fix rbd map hang due to udev return subsystem unordered").
| |
| | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | krbd: modprobe before calling build_map_buf()Ilya Dryomov2019-10-171-4/+7
|/
| Otherwise add_key() in set_kernel_secret() fails as if running against an ancient kernel and we fall back to secret= in options for the first image being mapped on the machine.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: fix rbd map hang due to udev return subsystem unorderedZhi Zhang2019-04-111-24/+39
| The order of subsystems returned by udev_device_get_subsystem() might not match the order in which the subsystems were added with udev_monitor_filter_add_match_subsystem_devtype(). So if the block event is returned first and the rbd event is returned next, further polling gets nothing back until it times out.
|
| Fixes: http://tracker.ceph.com/issues/39089
| Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
* rbd: krbd: return -ETIMEDOUT in pollingDongsheng Yang2019-03-201-2/+12
| We don't want to wait on a uevent forever, but poll() returns 0 on timeout rather than a negative value.
|
| Fixes: http://tracker.ceph.com/issues/38792
| Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
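The point about poll() can be shown with a small wrapper that maps the three possible outcomes onto the usual "0 or negative errno" convention. `wait_readable()` is a hypothetical helper, not a function from krbd.cc.

```cpp
#include <poll.h>
#include <unistd.h>  // pipe(), write(), close() for trying the helper out

#include <cassert>
#include <cerrno>

// Waits until fd is readable.
// Returns 0 when an event is pending, -ETIMEDOUT when timeout_ms elapses,
// or another negative errno on failure. poll() itself reports a timeout
// as 0, which is easy to mistake for success.
int wait_readable(int fd, int timeout_ms)
{
  assert(fd >= 0);
  struct pollfd pfd = {fd, POLLIN, 0};
  for (;;) {
    int r = poll(&pfd, 1, timeout_ms);
    if (r > 0) {
      return 0;
    }
    if (r == 0) {
      return -ETIMEDOUT;
    }
    if (errno != EINTR) {
      return -errno;
    }
    // Interrupted by a signal: poll again (the timeout restarts).
  }
}
```

Calling it on the read end of an empty pipe with a short timeout yields -ETIMEDOUT; after a byte is written to the other end it yields 0.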
* mon/MonMap: adjust build_initial behavior for mkfs vs probeSage Weil2019-01-031-1/+1
| For the mkfs case, interpret an ambiguous port as a v2 address. For probe, try both.
|
| Signed-off-by: Sage Weil <sage@redhat.com>
* Rename "include/assert.h"Brad Hubbard2018-09-141-1/+1
| This conflicts with the system assert.h so rename and change includes to reflect the new name.
|
| Fixes: http://tracker.ceph.com/issues/35682
| Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
* krbd: support for images within namespacesIlya Dryomov2018-08-311-17/+76
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: create udev_enumerate in enumerate_devices()Ilya Dryomov2018-08-311-18/+21
| Make it easier to run more than one scan in a row.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: introduce krbd_specIlya Dryomov2018-08-311-58/+83
| Don't substitute "@-" for HEAD when printing the spec. Instead, omit the snapshot part. The same would be done for the namespace part.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
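The printing rule can be illustrated with a small sketch. The struct and function names (`image_spec`, `spec_to_string()`) and the exact `pool/namespace/image@snap` layout are assumptions for illustration; the message only establishes that empty snapshot and namespace parts are omitted rather than printed as placeholders.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for krbd_spec: empty nspace or snap means
// "default namespace" or "HEAD" respectively.
struct image_spec {
  std::string pool;
  std::string nspace;
  std::string image;
  std::string snap;
};

// Formats the spec for display, leaving out parts that are not set
// instead of printing a placeholder such as "@-".
std::string spec_to_string(const image_spec& spec)
{
  assert(!spec.pool.empty() && !spec.image.empty());
  std::string s = spec.pool + "/";
  if (!spec.nspace.empty()) {
    s += spec.nspace + "/";
  }
  s += spec.image;
  if (!spec.snap.empty()) {
    s += "@" + spec.snap;
  }
  return s;
}
```

With this rule, an image at HEAD in the default namespace prints as `rbd/img`, while a snapshot in a namespace prints as `rbd/ns1/img@snap1`.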
* krbd: change krbd_unmap_by_spec() to take "" instead of NULLIlya Dryomov2018-08-311-1/+1
| krbd_map() and krbd_is_mapped() take "", krbd_unmap_by_spec() is the odd one out.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: remove unused includeIlya Dryomov2018-08-311-1/+0
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* rbd: Use ceph_assert for asserts.Adam C. Emerson2018-08-271-3/+3
| Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
* common,rbd,rgw,osd: extract config values into ConfigValuesKefu Chai2018-07-101-3/+3
| This change introduces three classes: ConfigValues, ConfigProxy and ConfigReader.
|
| In the seastar port of OSD, each CPU shard will hold its own reference to the configuration, and upon changes of settings, each shard will be updated with the new setting asynchronously. This forces us to be able to keep two sets of configuration at the same time, so we need to extract the changeable part of md_config_t. Then we can replace the old one with the new one on demand, and let different shards share the same unchanged part, among other things the Options map and the lookup tables.
|
| That's why we need ConfigValues. We will add a policy template for this class, so we can specialize for the Seastar implementation to allow different ConfigProxy instances to point md_config_impl<> to different ConfigValues.
|
| Because the observer interface is still using md_config_t, to minimise the impact of this change, handle_conf_change() and handle_subsys_change() are not changed. But as it accepts a `const md_config_t`, which cannot be used to create/reference the ConfigProxy holding it, we need to introduce ConfigReader for reading the updated setting from md_config_t in a simpler way, without exposing the internal "values" member variable.
|
| Signed-off-by: Kefu Chai <kchai@redhat.com>
* krbd.cc: fix parameter to variadic functionDanny Al-Gaaf2018-04-141-1/+1
| Fix for:
|
|   [src/krbd.cc:549]: (portability) Passing NULL after the last typed argument to a variadic function leads to undefined behaviour.
|
| Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
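The portability issue behind this warning is that a bare NULL may be an integer constant, so in a variadic call it is not guaranteed to be passed with pointer size and representation. The usual remedy is to cast the sentinel to the pointer type the callee reads. The sketch below shows this with a self-written variadic function; `count_args()` is hypothetical, and which call on line 549 was affected is not stated in the log.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>

// Counts string arguments up to, but not including, a null pointer
// sentinel. The callee reads each argument as const char*, so the caller
// must pass the sentinel as a pointer, not as a bare NULL or 0.
int count_args(const char* first, ...)
{
  int n = 0;
  va_list ap;
  va_start(ap, first);
  for (const char* s = first; s != nullptr; s = va_arg(ap, const char*)) {
    ++n;
  }
  va_end(ap);
  return n;
}

// Correct call: the terminator is explicitly a char pointer.
int demo_count()
{
  return count_args("modprobe", "rbd", static_cast<const char*>(nullptr));
}
```

The same rule applies to library functions with a sentinel-terminated argument list, such as the execl() family, where the terminator is conventionally written as `(char *)NULL`.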
* krbd.cc: fix uninitialized variableDanny Al-Gaaf2018-04-121-1/+1
| Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* krbd: include sys/sysmacros.h for major, minor and makedevIlya Dryomov2018-03-071-0/+1
| Previously we got these through sys/types.h, but that's now deprecated:
|
|   warning: In the GNU C Library, "major" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "major", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "major", you should undefine it after including <sys/types.h>.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* rbd: adjusted "showmapped" JSON and XML formatted outputMykola Golub2018-01-121-2/+3
| Signed-off-by: Mykola Golub <mgolub@suse.com>
* rbd: drop unnecessary using declaration, etcShinobu Kinjo2017-11-271-6/+8
| Signed-off-by: Shinobu Kinjo <shinobu@redhat.com>
* krbd: rewrite "already mapped" codeIlya Dryomov2017-09-111-49/+53
| The "already mapped" code, introduced in commit d6a66fc8f49b ("rbd: before rbd map, warn if the image is already mapped") is broken: because of a use-after-free on attribute strings, the warning isn't even printed half the time.
|
| Rewrite making use of udev enumeration filters and fix the interface while at it.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: factor out enumerate_devices()Ilya Dryomov2017-09-101-16/+28
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* rbd: before rbd map, warn if the image is already mappedlijing2017-08-091-0/+56
| RBD should check whether an image is already mapped before mapping the same image as several devices.
|
| Fixes: http://tracker.ceph.com/issues/20580
| Signed-off-by: Jing Li <lijing@gohighsec.com>
* rbd: do not attempt to load key if auth is disabledJason Dillaman2017-06-291-7/+9
| Fixes: http://tracker.ceph.com/issues/19035
| Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* Merge pull request #14051 from SpandanKumarSahu/bug#17195Jason Dillaman2017-05-051-2/+4
|\
| | rbd: stop indefinite thread waiting in krbd udev handling
| |
| | Reviewed-by: Jason Dillaman <dillaman@redhat.com>
| * rbd: stop indefinite thread waiting in krbd.ccSpandan Kumar Sahu2017-03-271-2/+4
| | Fixes: http://tracker.ceph.com/issues/17195
| | Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
* | rbd: warning, ‘devno’ may be used uninitialized in this functionJos Collin2017-03-311-2/+2
|/
| The following warning appears during make. Fixed in both unmap_image() functions in krbd.cc
| --
| krbd.cc: In function ‘int krbd_unmap_by_spec(krbd_ctx*, const char*, const char*, const char*, const char*)’:
| krbd.cc:608:65: warning: ‘devno’ may be used uninitialized in this function [-Wmaybe-uninitialized]
|    return do_unmap(ctx->udev, devno, build_unmap_buf(id, options));
|    ^
| krbd.cc:591:9: note: ‘devno’ was declared here
|    dev_t devno;
| --
|
| Signed-off-by: Jos Collin <jcollin@redhat.com>
* krbd: kernel client expects ip[:port], not an entity_addr_tIlya Dryomov2016-11-101-1/+1
| Commit 2ee1b9a4084f ("krbd.cc: don't rely on MonMap internal members") inadvertently dropped .get_sockaddr() call, breaking rbd map. Fix it.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd.cc: don't rely on MonMap internal membersJoao Eduardo Luis2016-11-031-5/+7
| Use the public interface instead. That's stable and not as prone to change.
|
| Signed-off-by: Joao Eduardo Luis <joao@suse.de>
* rbd: expose rbd unmap optionsIlya Dryomov2016-10-071-11/+24
| Reuse rbd map -o infrastructure to expose rbd unmap options in a similar fashion. Currently it's just one bool option, but we may need more in the future.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: don't segfault if images are unmapped concurrentlyIlya Dryomov2016-06-061-19/+17
| "rbd map c" can die from a NULL dereference on any of this_pool, this_image or this_snap in wait_for_udev_add():
|
|   <image a is mapped>
|   rbd map c
|   rbd map b
|   rbd unmap a
|   rbd unmap b
|
| However unlikely, this segfault is triggered by the rbd/concurrent.sh workunit on a regular basis.
|
| Similarly, "rbd showmapped" can die if an image to be listed is unmapped.
|
| Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* krbd: use sockaddr instead of sockaddr_storage to printSage Weil2016-05-111-1/+1
| Signed-off-by: Sage Weil <sage@redhat.com>
* rbd: unbreak rbd map CLIIlya Dryomov2015-11-141-8/+2
| Previously, if snapname wasn't specified we would pass NULL to krbd_map(), which was a cue for it to use "-" as a snapshot name. With the new rbd CLI, "" is passed in; same goes for map options. Change krbd_map() accordingly and update its other user.
|
| Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
* krbd: remove deprecated --quiet param from udevadmJason Dillaman2015-10-271-2/+1
| This parameter has been removed since systemd 213, so this affects Fedora 21+, Debian Jessie, and potentially future releases of RHEL 7.
|
| Fixes: #13560
| Backport: hammer, infernalis
| Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* krbd: fix incorrect types in the krbd APIJason Dillaman2015-04-301-2/+3
| The C API functions were referencing the C++ CephContext instead of the C rados_config_t. Additionally, the ceph namespace was missing on the Formatter class.
|
| Signed-off-by: Jason Dillaman <dillaman@redhat.com>