Since we filter monitor addresses based on ms_mode, check that at
least one address was found.
Otherwise, we pass mismatched arguments to sysfs/add_single_major,
which emits a misleading error message to dmesg:
libceph: resolve 'name=user1' (ret=-3): failed
libceph: parse_ips bad ip 'name=user1,key=client.user1'
Fixes: https://tracker.ceph.com/issues/54128
Signed-off-by: Burt Holzman <burt@fnal.gov>
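The sketch below shows the check. It uses a hypothetical build_add_buf() helper that assembles the "<mon_addrs> <options> <pool> <image> <snap>" string written to /sys/bus/rbd/add_single_major. With an empty address list, the options end up in the mon_addrs slot, which matches the dmesg output above. The -ENOENT return value is chosen only for illustration.

```cpp
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical helper: assemble the buffer for /sys/bus/rbd/add_single_major,
// whose fields are "<mon_addrs> <options> <pool> <image> <snap>".
// Returns 0 on success or a negative errno.
int build_add_buf(const std::vector<std::string>& mon_addrs,
                  const std::string& options, const std::string& pool,
                  const std::string& image, const std::string& snap,
                  std::string* buf) {
  // If ms_mode filtering left no addresses, every field would shift left by
  // one and the kernel would try to parse the options as monitor IPs.
  if (mon_addrs.empty()) {
    return -ENOENT;  // illustrative choice of error code
  }
  std::string addrs;
  for (const auto& a : mon_addrs) {
    if (!addrs.empty()) {
      addrs += ',';
    }
    addrs += a;
  }
  // "-" stands for the image HEAD when no snapshot was requested.
  *buf = addrs + ' ' + options + ' ' + pool + ' ' + image + ' ' +
         (snap.empty() ? std::string("-") : snap);
  return 0;
}
```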
libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
Fixes: https://tracker.ceph.com/issues/52425
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
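Here is a sketch of the escaping. It assumes the value goes to an fnmatch(3)-based matcher with default flags (no FNM_NOESCAPE), where a backslash makes the next character literal. escape_glob() and glob_matches() are illustrative names, not the helpers in krbd.cc.

```cpp
#include <fnmatch.h>

#include <string>

// Prefix each fnmatch(3) metacharacter with a backslash so that the value is
// matched literally. Assumes the matcher does not pass FNM_NOESCAPE.
std::string escape_glob(const std::string& s) {
  std::string out;
  out.reserve(s.size() * 2);
  for (char c : s) {
    switch (c) {
    case '\\':
    case '*':
    case '?':
    case '[':
    case ']':
      out += '\\';
      break;
    default:
      break;
    }
    out += c;
  }
  return out;
}

// True if `value` matches `pattern` under fnmatch(3) rules with no flags.
bool glob_matches(const std::string& pattern, const std::string& value) {
  return fnmatch(pattern.c_str(), value.c_str(), 0) == 0;
}
```

For example, an image named "img*" escapes to "img\*". That pattern matches only itself rather than every name starting with "img".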
* add "std::" prefix in headers
* add "using" declarations in .cc files.
so we don't rely on "using namespace std" in one or more included
headers.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Fix a braino that came with commit f6854ac65d2a ("krbd: make sure the
device node is accessible after the mapping").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We have always assumed that the device node is accessible once the
mapping is complete, and users' scripts and orchestration tools have
grown to depend on this. Let's add some enforcement, prompted by [1]:
"I am running my Kubernetes worker node inside of an LXC container
which doesn't benefit from the device node created by the kernel, so
I'm using udev to create the /dev/rbd* device nodes inside of the LXC
container."
which, through an unfortunate interaction with the ceph-csi rbd plugin,
results in data loss for "volumeMode: Filesystem" PVs because it ends
up recreating the filesystem every time the PV is attached to the pod:
"When deleting the pod and re-creating it, I can see that the RBD
image is indeed being reformatted. This seems to be because when
blkid is being run to check if the image is formatted, the /dev/rbd*
device has not yet been created by udev. By the time the code gets
down to running mkfs, the device is there and the damage is done."
[1] https://github.com/ceph/ceph-csi/issues/1820
Fixes: https://tracker.ceph.com/issues/49410
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Recognize ms_mode map option and filter initial monitor addresses
accordingly: if ms_mode is not given or ms_mode=legacy, discard v2
addresses, otherwise discard v1 addresses.
Previously, nothing was discarded (i.e. v2 addresses were passed to
the kernel). The intent was to preserve that behaviour when ms_mode
is not given, so that the kernel default could be changed in the
future. However, it turns out that the mount.ceph helper has been
misguidedly discarding v2 addresses since commit eae01275134e
("mount.ceph: fork a child to get info from local configuration"),
so that ship has sailed.
Fixes: https://tracker.ceph.com/issues/48976
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
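Here is a sketch of the filtering rule. The MonAddr type is hypothetical: type 1 marks a legacy v1 address and type 2 marks a v2 address.

```cpp
#include <string>
#include <vector>

// Hypothetical monitor address tagged with its messenger protocol version.
struct MonAddr {
  int type;          // 1 = legacy (v1), 2 = msgr2 (v2)
  std::string addr;  // "ip:port"
};

// Keep only the addresses usable with the requested ms_mode: v1 when
// ms_mode is not given or is "legacy", v2 for any other mode.
std::vector<std::string> filter_mon_addrs(const std::vector<MonAddr>& addrs,
                                          const std::string& ms_mode) {
  const int wanted = (ms_mode.empty() || ms_mode == "legacy") ? 1 : 2;
  std::vector<std::string> out;
  for (const auto& a : addrs) {
    if (a.type == wanted) {
      out.push_back(a.addr);
    }
  }
  return out;
}
```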
Add support for noudev option to allow mapping and unmapping images
from a privileged container in a non-initial network namespace (e.g.
when using Multus CNI).
Fixes: https://tracker.ceph.com/issues/47128
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Introduce get_devnode() and append_unmap_options(); make some functions
static.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Collect only /dev/rbd* block events and dispose of them as soon as
possible; match on devnode and assert on major/minor instead of the
other way around.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Wrap udev_monitor, udev_enumerate and udev_device with std::unique_ptr.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
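The general pattern is shown here with a made-up C-style handle. fake_handle_new() and fake_handle_unref() stand in for libudev objects so that the sketch compiles without libudev. A custom deleter lets std::unique_ptr call the library's unref function on every exit path.

```cpp
#include <memory>

// Made-up C-style API standing in for libudev objects; a counter tracks how
// many handles are still alive.
struct fake_handle {
  int id;
};
static int live_handles = 0;

fake_handle* fake_handle_new(int id) {
  ++live_handles;
  return new fake_handle{id};
}

void fake_handle_unref(fake_handle* h) {
  --live_handles;
  delete h;
}

// Deleter so that std::unique_ptr releases the handle through the C API.
struct FakeHandleDeleter {
  void operator()(fake_handle* h) const { fake_handle_unref(h); }
};
using FakeHandlePtr = std::unique_ptr<fake_handle, FakeHandleDeleter>;
```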
systemd 219 doesn't have the issue that is worked around in the
previous commit, but has a different one: udev_enumerate_scan_devices()
always succeeds, but sometimes returns an empty list when the device is
actually there. This happens rarely and at random so I haven't been
able to get to the bottom of it yet, but it looks like another similar
race condition in libudev.
Since an empty list is expected if the device isn't there, retry just
twice with a small sleep in-between. This appears to be enough: I got
7 occurrences per 600000 "rbd unmap" invocations, all of which needed
a single retry:
rbd: udev enumerate missed a device, tries = 1
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
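Here is a sketch of the bounded retry. `scan` is a hypothetical stand-in for the enumeration call. The 250 ms delay is an assumed value, not necessarily the one used in krbd.cc.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// An empty list is a legitimate answer (the device may really be gone),
// so retry only a bounded number of times before trusting it.
std::vector<std::string> scan_with_retry(
    const std::function<std::vector<std::string>()>& scan,
    int max_retries = 2,
    std::chrono::milliseconds delay = std::chrono::milliseconds(250)) {
  std::vector<std::string> devs = scan();
  for (int tries = 1; devs.empty() && tries <= max_retries; ++tries) {
    std::this_thread::sleep_for(delay);
    devs = scan();
    if (!devs.empty()) {
      std::cerr << "udev enumerate missed a device, tries = " << tries << '\n';
    }
  }
  return devs;
}
```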
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered while operating on directory entries in
/sys that no longer exist. Some of these errors are suppressed, but
this isn't reliable and varies across versions. In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum(). In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices(). Don't limit the number of
retries, but log each one.
Fixes: https://tracker.ceph.com/issues/41036
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
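Here is a sketch of the unbounded retry. It assumes `scan` returns 0 or a negative errno, which is also assumed to be the libudev convention. scan_retrying_transient() is an illustrative name.

```cpp
#include <cerrno>
#include <cstring>
#include <functional>
#include <iostream>

// `scan` stands in for a call like udev_enumerate_scan_devices(). ENOENT
// and ENODEV are treated as transient (a device disappeared mid-scan) and
// retried without limit; any other result is returned as-is.
int scan_retrying_transient(const std::function<int()>& scan) {
  for (int tries = 1;; ++tries) {
    int r = scan();
    if (r != -ENOENT && r != -ENODEV) {
      return r;  // success or a non-transient error
    }
    std::cerr << "udev enumerate failed (" << std::strerror(-r)
              << "), tries = " << tries << '\n';
  }
}
```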
krbd: avoid udev netlink socket overrun
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Even though with the previous commit we no longer block between binding
the socket and starting to handle events, we still want a larger receive
buffer to accommodate scheduling delays. Since the filtering is
done in the listener, an estimate focused on just rbd is not accurate,
but anyway: a pair of "rbd" and "block" events for "rbd map" takes 2048
bytes in the receive buffer. This allows for roughly a thousand of
them ("rbd map" and "rbd unmap" require root and libudev makes use of
SO_RCVBUFFORCE, so the rmem_max limit is ignored).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
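The arithmetic can be checked with a few constants. The 1 MiB request is an assumption for illustration. The doubling comes from socket(7): the kernel doubles the SO_RCVBUF value to leave room for bookkeeping overhead, and SO_RCVBUFFORCE performs the same task without the rmem_max cap.

```cpp
#include <cstddef>

constexpr std::size_t kRequestedRcvbuf = 1 << 20;               // 1 MiB (assumed)
constexpr std::size_t kEffectiveRcvbuf = 2 * kRequestedRcvbuf;  // doubled by the kernel
constexpr std::size_t kBytesPerMapEventPair = 2048;             // "rbd" + "block"
constexpr std::size_t kEventPairs = kEffectiveRcvbuf / kBytesPerMapEventPair;

static_assert(kEventPairs == 1024, "roughly a thousand map event pairs");
```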
Because the event(s) we are interested in can be delivered while we are
still in the kernel finishing map or unmap, we start listening for udev
events before going into the kernel. However, if (un)mapping takes its
time, udev netlink socket can be fairly easily overrun -- the filtering
is done on the listener side, so we get to process everything, not just
rbd events. If any of the events of interest get dropped (ENOBUFS), we
hang in poll().
Go into the kernel in a separate thread and leave the main thread to
run the event loop. The return value is communicated to the reactor
through a pipe.
Fixes: https://tracker.ceph.com/issues/41404
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
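Here is a minimal sketch of the pattern. `blocking_call` stands in for the sysfs write that performs the map or unmap, and run_in_thread_and_wait() is a hypothetical helper. For brevity it polls only the pipe. The event loop described above would also include the udev monitor fd in its poll set.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <functional>
#include <thread>

// Run `blocking_call` on a helper thread and hand its int result back
// through a pipe, leaving the calling thread free to run an event loop.
int run_in_thread_and_wait(const std::function<int()>& blocking_call) {
  int fds[2];
  if (pipe(fds) < 0) {
    return -errno;
  }
  std::thread worker([&blocking_call, &fds]() {
    int r = blocking_call();
    // sizeof(int) is below PIPE_BUF, so this write is atomic.
    ssize_t n = write(fds[1], &r, sizeof(r));
    (void)n;
  });

  // A real reactor would keep draining udev events here until the pipe
  // becomes readable.
  struct pollfd pfd = {fds[0], POLLIN, 0};
  int result;
  int r;
  do {
    r = poll(&pfd, 1, -1);
  } while (r < 0 && errno == EINTR);
  if (r < 0) {
    result = -errno;
  } else if (read(fds[0], &result, sizeof(result)) !=
             static_cast<ssize_t>(sizeof(result))) {
    result = -EIO;
  }

  worker.join();
  close(fds[0]);
  close(fds[1]);
  return result;
}
```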
This also exposes errors from udev_monitor_receive_device() which were
previously ignored.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Move event processing into UdevMapHandler and UdevUnmapHandler
functors and replace wait_for_udev_{add,remove}() with a single
wait_for_mapping() template.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
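This sketch shows the shape of such a template. A hypothetical Event struct replaces udev_device, and the toy handler is much simpler than the real map and unmap handlers, which would match on more attributes.

```cpp
#include <string>
#include <vector>

// Hypothetical event record standing in for a udev_device.
struct Event {
  std::string subsystem;
  std::string action;
};

// One generic wait loop; what counts as "done" lives in a handler functor
// that returns true once it has seen the event(s) it is waiting for.
template <typename Handler>
bool wait_for_mapping(const std::vector<Event>& events, Handler handler) {
  for (const Event& ev : events) {
    if (handler(ev)) {
      return true;
    }
  }
  return false;  // a real loop would keep polling instead of giving up
}

// Toy handler that completes on a single (subsystem, action) pair.
struct MatchEvent {
  std::string subsystem;
  std::string action;
  bool operator()(const Event& ev) const {
    return ev.subsystem == subsystem && ev.action == action;
  }
};
```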
This timeout was added as a (very poor) workaround for an issue
addressed in commit 42dd1eae630f ("krbd: fix rbd map hang due to udev
return subsystem unordered").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Otherwise add_key() in set_kernel_secret() fails as if running against
an ancient kernel and we fall back to secret= in options for the first
image being mapped on the machine.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The order of the subsystems returned by udev_device_get_subsystem()
might not match the order in which they were added with
udev_monitor_filter_add_match_subsystem_devtype(). So if the block
event is returned first and the rbd event next, further polls get
nothing back until they time out.
Fixes: http://tracker.ceph.com/issues/39089
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
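This sketch waits for both events regardless of order, using a hypothetical MapEventTracker class. Each event only sets a flag, and completion is checked after every event instead of assuming that rbd arrives before block.

```cpp
#include <string>

// "rbd map" is considered complete once both the "rbd" and the "block"
// events have been seen, in whatever order they are delivered.
class MapEventTracker {
 public:
  // Returns true once both events have arrived.
  bool on_event(const std::string& subsystem) {
    if (subsystem == "rbd") {
      seen_rbd_ = true;
    } else if (subsystem == "block") {
      seen_block_ = true;
    }
    return seen_rbd_ && seen_block_;
  }

 private:
  bool seen_rbd_ = false;
  bool seen_block_ = false;
};
```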
We don't want to wait on uevents forever, but when poll() times out
it returns 0 rather than a negative value.
Fixes: http://tracker.ceph.com/issues/38792
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
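This sketch handles all three poll(2) outcomes. wait_readable() is an illustrative helper, and -ETIMEDOUT is an assumed choice of error code.

```cpp
#include <poll.h>
#include <unistd.h>  // pipe(), write() for trying it out

#include <cerrno>

// poll(2) returns the number of ready descriptors, 0 if the timeout expired
// with nothing ready, or -1 with errno set. Checking only for r < 0 misses
// the timeout case.
int wait_readable(int fd, int timeout_ms) {
  struct pollfd pfd = {fd, POLLIN, 0};
  int r = poll(&pfd, 1, timeout_ms);
  if (r < 0) {
    return -errno;
  }
  if (r == 0) {
    return -ETIMEDOUT;
  }
  return 0;
}
```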
For the mkfs case, interpret an ambiguous port as a v2 address. For probe,
try both.
Signed-off-by: Sage Weil <sage@redhat.com>
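Here is a sketch of the rule. It assumes "ambiguous" means any port other than the two well-known ones: 3300, the IANA-assigned msgr2 port, and 6789, the legacy msgr1 port. addr_types_for_port() is hypothetical, and the order in which probe tries the two types is arbitrary here.

```cpp
#include <cstdint>
#include <vector>

enum class AddrType { V1, V2 };

// Decide which address type(s) to use for a monitor port.
std::vector<AddrType> addr_types_for_port(uint16_t port, bool for_mkfs) {
  if (port == 3300) {
    return {AddrType::V2};
  }
  if (port == 6789) {
    return {AddrType::V1};
  }
  if (for_mkfs) {
    return {AddrType::V2};  // mkfs: treat an ambiguous port as v2
  }
  return {AddrType::V2, AddrType::V1};  // probe: try both
}
```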
This conflicts with the system assert.h, so rename it and change
includes to reflect the new name.
Fixes: http://tracker.ceph.com/issues/35682
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Make it easier to run more than one scan in a row.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Don't substitute "@-" for HEAD when printing the spec. Instead, omit
the snapshot part. The same would be done for the namespace part.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
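This sketch formats the spec with a hypothetical format_spec() helper. Empty namespace and snapshot parts are simply left out instead of being replaced with a placeholder such as "@-".

```cpp
#include <string>

// Format "pool/[namespace/]image[@snap]", omitting empty optional parts.
std::string format_spec(const std::string& pool, const std::string& nspace,
                        const std::string& image, const std::string& snap) {
  std::string spec = pool + "/";
  if (!nspace.empty()) {
    spec += nspace + "/";
  }
  spec += image;
  if (!snap.empty()) {
    spec += "@" + snap;
  }
  return spec;
}
```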
krbd_map() and krbd_is_mapped() take "", while krbd_unmap_by_spec() is
the odd one out.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
This change introduces three classes: ConfigValues, ConfigProxy and
ConfigReader. In the Seastar port of the OSD, each CPU shard will hold
its own reference to the configuration, and when settings change, each
shard will be updated with the new settings asynchronously. This forces
us to be able to keep two sets of configuration at the same time, so we
need to extract the changeable part out of md_config_t. That way we can
replace the old values with new ones on demand and let different shards
share the unchanged parts, among other things the Options map and the
lookup tables. That's why we need ConfigValues. We will add a policy
template for this class, so we can specialize it for the Seastar
implementation to allow different ConfigProxy instances to point
md_config_impl<> to different ConfigValues.
Because the observer interface still uses md_config_t, handle_conf_change()
and handle_subsys_change() are left unchanged to minimise the impact of
this change. But since they accept a `const md_config_t`, which cannot be
used to create or reference the ConfigProxy holding it, we need to
introduce ConfigReader for reading the updated settings from md_config_t
in a simpler way, without exposing the internal "values" member variable.
Signed-off-by: Kefu Chai <kchai@redhat.com>
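Here is a toy illustration of the split. The made-up types OptionSchema, Values and ShardConfig stand in for the Options map, ConfigValues and ConfigProxy respectively. This is not Ceph's implementation. Shards share one immutable schema through std::shared_ptr, and each can swap in a new values object independently.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Immutable, shared part (stands in for the Options map and lookup tables).
struct OptionSchema {
  std::map<std::string, std::string> defaults;
};

// Changeable part (stands in for ConfigValues).
struct Values {
  std::map<std::string, std::string> current;
};

// Per-shard view (stands in for ConfigProxy): shares the schema and holds
// a pointer to its own Values that can be replaced on demand.
class ShardConfig {
 public:
  ShardConfig(std::shared_ptr<const OptionSchema> schema,
              std::shared_ptr<const Values> values)
      : schema_(std::move(schema)), values_(std::move(values)) {}

  // Current value if set, else the schema default (throws if unknown).
  std::string get(const std::string& key) const {
    auto it = values_->current.find(key);
    if (it != values_->current.end()) {
      return it->second;
    }
    return schema_->defaults.at(key);
  }

  // Swap in a new set of values, e.g. after an asynchronous update.
  void set_values(std::shared_ptr<const Values> values) {
    values_ = std::move(values);
  }

  const OptionSchema* schema() const { return schema_.get(); }

 private:
  std::shared_ptr<const OptionSchema> schema_;
  std::shared_ptr<const Values> values_;
};
```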
Fix for:
[src/krbd.cc:549]: (portability) Passing NULL after the last typed
argument to a variadic function leads to undefined behaviour.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
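This sketch illustrates the issue with a hypothetical execl(3)-style function. va_arg(ap, const char*) expects a pointer, so the terminating sentinel should be passed with that exact type. A bare NULL may be an integer constant that is not pointer-sized.

```cpp
#include <cstdarg>
#include <string>

// Joins its const char* arguments with spaces, stopping at a null sentinel.
std::string join_args(const char* first, ...) {
  std::string out = first;
  va_list ap;
  va_start(ap, first);
  for (const char* s = va_arg(ap, const char*); s != nullptr;
       s = va_arg(ap, const char*)) {
    out += ' ';
    out += s;
  }
  va_end(ap);
  return out;
}
```

Call sites then end the argument list with `static_cast<const char*>(nullptr)` rather than `NULL`.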
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Previously we got these through sys/types.h, but that's now deprecated:
warning: In the GNU C Library, "major" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "major", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"major", you should undefine it after including <sys/types.h>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
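This small example uses the direct include with glibc's makedev(), major() and minor(). same_device() is an illustrative helper.

```cpp
#include <sys/types.h>
#include <sys/sysmacros.h>  // major(), minor(), makedev() on glibc

// Compare a device number against expected major/minor components.
bool same_device(dev_t devno, unsigned int maj, unsigned int mnr) {
  return major(devno) == maj && minor(devno) == mnr;
}
```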
Signed-off-by: Mykola Golub <mgolub@suse.com>
Signed-off-by: Shinobu Kinjo <shinobu@redhat.com>
The "already mapped" code, introduced in commit d6a66fc8f49b ("rbd:
before rbd map, warn if the image is already mapped") is broken:
because of a use-after-free on attribute strings, the warning isn't
even printed half the time.
Rewrite making use of udev enumeration filters and fix the interface
while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
RBD should check whether an image is already mapped before mapping the same image as several devices.
Fixes: http://tracker.ceph.com/issues/20580
Signed-off-by: Jing Li <lijing@gohighsec.com>
Fixes: http://tracker.ceph.com/issues/19035
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
rbd: stop indefinite thread waiting in krbd udev handling
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Fixes: http://tracker.ceph.com/issues/17195
Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
The following warning appears during make. Fix it in both unmap_image() functions in krbd.cc:
--
krbd.cc: In function ‘int krbd_unmap_by_spec(krbd_ctx*, const char*, const char*, const char*, const char*)’:
krbd.cc:608:65: warning: ‘devno’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return do_unmap(ctx->udev, devno, build_unmap_buf(id, options));
^
krbd.cc:591:9: note: ‘devno’ was declared here
dev_t devno;
--
Signed-off-by: Jos Collin <jcollin@redhat.com>
Commit 2ee1b9a4084f ("krbd.cc: don't rely on MonMap internal members")
inadvertently dropped .get_sockaddr() call, breaking rbd map. Fix it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Use the public interface instead, which is stable and less prone to
change.
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
Reuse rbd map -o infrastructure to expose rbd unmap options in
a similar fashion. Currently it's just one bool option, but we may
need more in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd map c" can die from a NULL dereference on any of this_pool,
this_image or this_snap in wait_for_udev_add():
<image a is mapped>
rbd map c
rbd map b
rbd unmap a
rbd unmap b
However unlikely, this segfault is triggered by the rbd/concurrent.sh
workunit on a regular basis.
Similarly, "rbd showmapped" can die if an image to be listed is
unmapped.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
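This sketch shows a null-tolerant comparison. attr_equals() is a hypothetical helper. udev_device_get_sysattr_value() is mentioned only as an example of a lookup that can return NULL, for instance when the device has been unmapped in the meantime.

```cpp
#include <cstring>

// Treat a missing (NULL) attribute as a mismatch instead of passing it to
// strcmp(), which would dereference it.
bool attr_equals(const char* value, const char* expected) {
  return value != nullptr && std::strcmp(value, expected) == 0;
}
```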
Signed-off-by: Sage Weil <sage@redhat.com>
Previously, if snapname wasn't specified we would pass NULL to
krbd_map(), which was a cue for it to use "-" as a snapshot name. With
the new rbd CLI, "" is passed in; same goes for map options.
Change krbd_map() accordingly and update its other user.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
This parameter was removed in systemd 213, so this affects Fedora 21+,
Debian Jessie, and potentially future releases of RHEL 7.
Fixes: #13560
Backport: hammer, infernalis
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The C API functions were referencing the C++ CephContext
instead of the C rados_config_t. Additionally, the ceph
namespace was missing on the Formatter class.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>