| Commit message | Author | Age | Files | Lines |
Using join() to wait for the discard thread to end is safer: it
ensures that resources are released sequentially, avoiding potential
race conditions.
Signed-off-by: Yite Gu <yitegu0@gmail.com>
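The shutdown ordering this commit describes can be sketched with std::thread. This is a minimal sketch; the Worker type and its members are hypothetical, not Ceph's actual KernelDevice code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal sketch of join()-based shutdown: signal the thread to stop,
// join() it, and only then release the resources it was using.
struct Worker {
  std::atomic<bool> stop{false};
  std::vector<int> queue;  // stand-in for a resource the thread uses
  std::thread thr;

  void start() {
    thr = std::thread([this] {
      while (!stop.load()) {
        // drain work here ...
      }
    });
  }

  void shutdown() {
    stop.store(true);
    thr.join();     // wait for the thread to finish first ...
    queue.clear();  // ... then release resources, in a defined order
  }
};
```

Because join() returns only after the thread body has finished, nothing the thread touches can be freed out from under it.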
blk/kerneldevice: add perfcounter for block async discard
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
Adding a perfcounter helps to understand the status of async discard.
Signed-off-by: Yite Gu <yitegu0@gmail.com>
Signed-off-by: Yite Gu <yitegu0@gmail.com>
Value discard_stop could be uninitialized.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
blk/KernelDevice: React to bdev_enable_discard changes in handle_conf_change(); Fix several issues with stopping discard threads
Instead of having _discard_start() and _discard_stop() partially or
completely duplicate functionality in handle_conf_change(), have a
single _discard_update_threads() that can handle all three cases.
Loops are tidied slightly, the unnecessary target_discard_threads
class variable has been removed, and handle_conf_change() now respects
support_discard.
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
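The consolidation can be sketched as one helper driven by a target count. This is a hypothetical sketch only (the DiscardThreads type and its members are invented here), assuming grow/shrink semantics like those described.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a single update helper handling start, stop,
// and config changes by comparing the running count to a desired target.
struct DiscardThreads {
  std::vector<int> ids;  // stand-in for running discard threads
  bool support_discard = true;

  void update(uint64_t target) {
    if (!support_discard)
      target = 0;  // respect support_discard
    while (ids.size() < target)
      ids.push_back((int)ids.size());  // start missing threads
    while (ids.size() > target)
      ids.pop_back();  // stop surplus threads
  }
};
```

One code path then serves _discard_start() (target > 0), _discard_stop() (target = 0), and config changes (any new target).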
1. In _discard_stop(), the wait for !discard_threads.empty() was there
from a prior implementation where there could theoretically be a race
between _discard_start() and _discard_stop(). If that race does
exist, this check won't help; not only is _discard_stop() not called
if discard_threads is empty, but discard_threads won't be populated
until _discard_start() runs and thus this won't detect such a race.
2. Calling _discard_stop() from handle_conf_change() is a guaranteed
deadlock because discard_lock is already held, so don't do that. Use
the same flow whether we're stopping a subset of threads or all
threads.
3. Asking a subset of discard threads to stop was not guaranteed to take
effect, since if they continued to find contents in discard_queue
then they would continue to run indefinitely. Add additional logic to
_discard_thread() to have threads stop if they have been requested to
stop and other threads exist to continue draining discard_queue.
4. Make the flow of _discard_stop() and handle_conf_change() more
similar.
Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
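Point 3 can be sketched as follows. This is a simplified, hypothetical model: a mutex-guarded deque stands in for discard_queue, and a worker asked to stop exits only when other workers remain to drain the queue or the queue is empty.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

std::mutex m;
std::deque<int> work;  // stand-in for discard_queue
int n_workers = 0;     // guarded by m

// A stopping worker exits right away when peers remain to keep draining;
// the last worker drains the queue before exiting.
void worker(std::atomic<bool>& stop_me) {
  while (true) {
    std::unique_lock<std::mutex> l(m);
    if (stop_me.load()) {
      if (n_workers > 1 || work.empty()) {  // others keep draining, or done
        --n_workers;
        return;
      }
    }
    if (!work.empty()) {
      work.pop_front();  // "discard" one extent
    } else {
      l.unlock();
      std::this_thread::yield();  // real code waits on a condvar instead
    }
  }
}
```

This guarantees a stop request takes effect even while the queue keeps receiving entries, without abandoning queued work.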
This fixes two issues that were introduced by 755f3e03b5bf547cfd940b52a53833f66285062b:
1. After an OSD boots, discard threads were not stopped when
bdev_enable_discard was set to false, whereas that was the intent of
that commit.
2. If bdev_enable_discard or bdev_async_discard_threads are configured
with a mask that can't be evaluated at OSD boot (e.g. a device
class), then async discard won't be enabled until a later config
change to bdev_async_discard_threads.
Fixes: https://tracker.ceph.com/issues/66817
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
blk/aio: fix long batch (64+K entries) submission.
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Due to aios_size being a uint16 while the source value for the actual
call is an int, there was a possible overflow. This was "fixed"
with an assert; however, that still causes a crash.
This commit removes the need for aios_size completely by iterating
over the list and submitting it in max_iodepth batches.
Fixes: https://tracker.ceph.com/issues/46366
Signed-off-by: Robin Geuze <robin.geuze@nl.team.blue>
(cherry picked from commit f87db49b0013088f1c87802886c4c16ce47c5cc2)
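The batching approach can be sketched generically. This is a sketch under assumptions: the ints stand in for the real aio list, and the commented call marks where io_submit() would actually run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of batched submission: iterate the pending list and
// submit at most max_iodepth entries per call. No uint16 running total is
// kept, so a batch of 64K+ entries cannot overflow it.
int submit_in_batches(const std::vector<int>& pending, uint16_t max_iodepth,
                      std::vector<size_t>* batch_sizes) {
  size_t done = 0;
  while (done < pending.size()) {
    size_t n = std::min<size_t>(max_iodepth, pending.size() - done);
    // real code: r = io_submit(ctx, n, iocb_ptrs + done); handle r < 0
    batch_sizes->push_back(n);
    done += n;
  }
  return 0;
}
```

For example, 70000 entries at a depth of 128 go out as 546 full batches plus one batch of 112.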
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
os/bluestore: Warning added for slow operations and stalled read
Adds options to control how long the warning should persist after the last occurrence, and the maximum number of operations used as a threshold for the warning.
Fixes: https://tracker.ceph.com/issues/62500
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
tools/bluestore: Add command 'trim' to ceph-bluestore-tool
Add command 'trim' to ceph-bluestore-tool.
Co-authored-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Yite Gu <yitegu0@gmail.com>
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
On fast shutdown, take over the main discard queue by copying it to the allocator, and only wait for the threads to commit their small private discard queues.
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
the allocator.
On fast shutdown we will simply copy the discard queue entries to the allocator.
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
The fix calls bdev->discard_drain() before calling store_allocator() to make sure all freed space is reflected in the allocator before destaging it.
It sets a timeout for the drain call (500 msec); if the timeout expires, the allocator is not stored (forcing a recovery on the next startup).
Fixes: https://tracker.ceph.com/issues/65298
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
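The drain-with-timeout behaviour can be sketched with a condition variable. The DrainGate type is hypothetical; the real code drains the device's discard queue.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: wait up to 500 msec for pending discards to drain;
// the caller stores the allocator only if the drain finished in time.
struct DrainGate {
  std::mutex m;
  std::condition_variable cv;
  int pending = 0;

  // true  -> drained in time, safe to store the allocator
  // false -> timed out, skip storing (forcing recovery on next startup)
  bool drain_with_timeout(
      std::chrono::milliseconds to = std::chrono::milliseconds(500)) {
    std::unique_lock<std::mutex> l(m);
    return cv.wait_for(l, to, [this] { return pending == 0; });
  }

  void complete_one() {
    std::lock_guard<std::mutex> l(m);
    if (--pending == 0)
      cv.notify_all();
  }
};
```

The bounded wait keeps fast shutdown fast: a stuck drain costs at most the timeout, at the price of a recovery on the next startup.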
blk: threaded discard support
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com>
For use by multiple projects, rocksdb in particular.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
blk/kernel: Add O_EXCL for block devices
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Mark Nelson <mnelson@redhat.com>
Change behaviour when the target file is a block device ("mknod name b major minor"):
append the O_EXCL flag for the first open of the block device.
The problem is that if 2 different files are created for the same block device,
it is possible to ::flock each of them in 2 separate processes.
In some container cases, when we recreate the bluestore osd dir with the
ceph-bluestore-tool prime-osd
command, we can end up with completely different files.
Opening with O_EXCL is immune to that.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
blk/KernelDevice: don't start discard thread if device not support_di…
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Only create the discard thread if the device supports discard;
otherwise we will have threads which do nothing.
Also extract the queue_discard/discard logic into the device, making it
cleaner to call discard from bluefs and bluestore.
Signed-off-by: haoyixing <haoyixing@kuaishou.com>
pread returns -1 upon error and stores the error code in errno, and thus
the wrong error was being passed into is_expected_ioerr. This is handled
correctly just a few lines down where we return -errno, so it was likely
just an oversight when adapting this logic from the aio codepath, where
the return code is indeed the errno.
This logic has been incorrect since it was introduced in 2018 via
a1e0ece7f987c7a563b25ec0d02fc6f8445ef54e.
Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
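The fix pattern can be sketched as below. The read_at helper is hypothetical; is_expected_ioerr is taken from the commit message and appears only in a comment.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical sketch of the fix: pread() returns -1 on error and puts the
// error code in errno, so -errno (not the raw return value) is what
// error-classification helpers such as is_expected_ioerr expect.
ssize_t read_at(int fd, void* buf, size_t len, off_t off) {
  ssize_t r = ::pread(fd, buf, len, off);
  if (r < 0) {
    // wrong: is_expected_ioerr(r)  -- r is just -1 here, not an error code
    return -errno;  // right: propagate (and classify) the real errno
  }
  return r;
}
```

This differs from the aio codepath, where the completion's return code really is the negated errno and can be classified directly.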
support into plugin
The current VDO support implementation is buried inside common/blkdev.cc
with a simple interface used by KernelDevice. It is not easily extendable
and cannot easily be used for other devices providing similar capabilities.
This patch adds a plugin system that is based in its structure on the
erasure code plugin system and moves the VDO support code into a VDO plugin.
Signed-off-by: Martin Ohmacht <mohmacht@us.ibm.com>
Fixes: https://tracker.ceph.com/issues/57271
Signed-off-by: Vikhyat Umrao <vikhyat@redhat.com>
common/bl, kv, tests: drop MemDB and simplify buffer::ptr and buffer::raw
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
At first, libpmem was the only library. Later, pmem-related libraries
such as libpmemobj and libpmem2 were gradually added. These libraries
were also integrated into one package named pmdk. So rename to pmdk.
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
blk/pmem: refactor pmem_check_file_type() using std::filesystem
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
for better readability
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
This patch is used to add the support to use the nvmedevice provided
by NVMe-oF target.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
blk/pmem: use DML library to offload read/write operations in pmem
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
The purpose of this patch is to add initial support for offloading
memory/pmem operations via the synchronous hardware path of the
DML library.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The purpose is to make pmem device usage more flexible than the
current solution, and to prepare for potential offloading by a
hardware engine later.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
bdev: fix FTBFS on FreeBSD, keep the huge paged read buffers.
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Special thanks to Willem Jan Withagen!
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
We build spdk as static libraries; linking against them requires the
`-Wl,--whole-archive` argument, otherwise we get the error
`nvme.c: nvme_probe_internal: *ERROR*: NVMe trtype 256 not available`.
This is due to spdk's use of constructor functions to register
NVMe transports, so we need the flag to ensure all the
constructors are called.
Signed-off-by: Tongliang Deng <dengtongliang@sensetime.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
When testing, remember about `bluestore_max_blob_size`, as it's
only 64 KB by default while the entire huge page-based pools
machinery targets far bigger scenarios (initially 4 MB!).
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>