path: root/src/blk
Commit message [Author, Age, Files, Lines]
* blk/KernelDevice: using join() to wait thread end is more safe [Yite Gu, 2024-09-26, 2 files, -14/+11]
| Using join() to wait for the discard thread to end is safer: it ensures that resources are released sequentially, which avoids potential race conditions.
| Signed-off-by: Yite Gu <yitegu0@gmail.com>
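The idea behind the commit above can be illustrated with a minimal sketch. This is not the KernelDevice code: the `Worker` class and every member name in it are hypothetical, chosen only to show why returning from join() guarantees that the thread has finished touching shared state.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an object that owns one background thread.
class Worker {
  std::mutex lock;
  std::condition_variable cond;
  bool stop = false;   // explicitly initialized stop flag
  int processed = 0;   // state written by the thread just before it exits
  std::thread thr;

  void run() {
    std::unique_lock<std::mutex> l(lock);
    while (!stop) {
      cond.wait(l);    // sleep until shutdown() requests a stop
    }
    ++processed;       // final bookkeeping done by the thread
  }

public:
  void start() { thr = std::thread([this] { run(); }); }

  void shutdown() {
    {
      std::lock_guard<std::mutex> l(lock);
      stop = true;
    }
    cond.notify_all();
    if (thr.joinable()) {
      thr.join();      // returns only after run() has fully exited
    }
  }

  // Safe to call without the lock once shutdown() has returned.
  int count() const { return processed; }
};
```

After `shutdown()` returns, members can be read or destroyed in a fixed order, because no other thread can still be running `run()`.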
* Merge pull request #58952 from YiteGu/add-perfcounter-for-blk-discard [Igor Fedotov, 2024-09-25, 4 files, -8/+34]
|\
| | blk/kerneldevice: add perfcounter for block async discard
| | Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
| * blk/kerneldevice: add perfcounter for block async discard [Yite Gu, 2024-08-12, 2 files, -0/+23]
| | Adding a perfcounter helps to understand the status of async discard.
| | Signed-off-by: Yite Gu <yitegu0@gmail.com>
| * os/bluestore: passing device type name parameter to kernel device [Yite Gu, 2024-08-08, 4 files, -8/+11]
| | Signed-off-by: Yite Gu <yitegu0@gmail.com>
* | blk/kernel: Fix uninitialized discard_stop [Adam Kupczyk, 2024-08-05, 1 file, -0/+1]
|/
| Value discard_stop could be uninitialized.
| Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
* Merge pull request #58409 from baergj/upstream-fix-async-discard-on-start [Laura Flores, 2024-07-29, 2 files, -77/+46]
|\
| | blk/KernelDevice: React to bdev_enable_discard changes in handle_conf_change(); Fix several issues with stopping discard threads
| * blk/KernelDevice: Unify discard thread management [Joshua Baergen, 2024-07-15, 2 files, -70/+36]
| | Instead of having _discard_start() and _discard_stop() partially or completely duplicate functionality in handle_conf_change(), have a single _discard_update_threads() that can handle all three. Loops are tidied slightly, the unnecessary target_discard_threads class variable has been removed, and now handle_conf_change() will respect support_discard.
| | Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
| * blk/KernelDevice: Fix several issues with stopping discard threads [Joshua Baergen, 2024-07-03, 1 file, -20/+17]
| | 1. In _discard_stop(), the wait for !discard_threads.empty() was there from a prior implementation where there could theoretically be a race between _discard_start() and _discard_stop(). If that race does exist, this check won't help; not only is _discard_stop() not called if discard_threads is empty, but discard_threads won't be populated until _discard_start() runs and thus this won't detect such a race.
| | 2. Calling _discard_stop() from handle_conf_change() is a guaranteed deadlock because discard_lock is already held, so don't do that. Use the same flow whether we're stopping a subset of threads or all threads.
| | 3. Asking a subset of discard threads to stop was not guaranteed to take effect, since if they continued to find contents in discard_queue then they would continue to run indefinitely. Add additional logic to _discard_thread() to have threads stop if they have been requested to stop and other threads exist to continue draining discard_queue.
| | 4. Make the flow of _discard_stop() and handle_conf_change() more similar.
| | Fixes: https://tracker.ceph.com/issues/66817
| | Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
| * blk/KernelDevice: React to bdev_enable_discard changes in handle_conf_change() [Joshua Baergen, 2024-07-03, 1 file, -3/+9]
| | This fixes two issues that were introduced by 755f3e03b5bf547cfd940b52a53833f66285062b:
| | 1. After an OSD boots, discard threads were not stopped when bdev_enable_discard was set to false, whereas that was the intent of that commit.
| | 2. If bdev_enable_discard or bdev_async_discard_threads are configured with a mask that can't be evaluated at OSD boot (e.g. a device class), then async discard won't be enabled until a later config change to bdev_async_discard_threads.
| | Fixes: https://tracker.ceph.com/issues/66817
| | Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
* | blk/aio: fix compile issue when HAVE_LIBURING isn't defined [Yingxin Cheng, 2024-07-23, 1 file, -1/+1]
| | Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
* | Merge pull request #56352 from ifed01/wip-ifed-many-many-extents-read [Yuri Weinstein, 2024-07-18, 5 files, -30/+37]
|\ \
| | | blk/aio: fix long batch (64+K entries) submission.
| | | Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
| * | bli/aio: replace inappropriate aio_read() with aio_write for POSIXAIO [Igor Fedotov, 2024-04-18, 1 file, -1/+1]
| | | Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
| * | blk/aio: fix incomplete patch to get rid off aio_size [Igor Fedotov, 2024-04-18, 2 files, -8/+16]
| | | Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
| * | osd: Remove aios_size argument from submit_batch [Robin Geuze, 2024-04-12, 5 files, -28/+27]
| | | Due to aios_size being a uint16 and the source value for the actual call being an int there was a possible overflow. This was "fixed" with an assert, however that still causes a crash. This commit removes the need for aios_size completely by iterating over the list and submitting it in max_iodepth batches.
| | | Fixes: https://tracker.ceph.com/issues/46366
| | | Signed-off-by: Robin Geuze <robin.geuze@nl.team.blue>
| | | (cherry picked from commit f87db49b0013088f1c87802886c4c16ce47c5cc2)
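The batching approach named in the commit above (walk the list and submit at most max_iodepth entries at a time, so no separate size argument is needed) can be sketched as follows. This is an illustration under assumptions, not the Ceph implementation: `submit_in_batches` is a hypothetical helper, and the actual kernel submission is replaced by recording each batch size.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper: splits `ios` into consecutive batches of at most
// `max_iodepth` entries and returns the size of each batch. No caller-supplied
// element count is needed, so a narrow counter type cannot overflow.
template <typename T>
std::vector<std::size_t> submit_in_batches(const std::list<T>& ios,
                                           std::size_t max_iodepth) {
  std::vector<std::size_t> batch_sizes;
  if (max_iodepth == 0) {
    return batch_sizes;  // nothing can be submitted with a zero depth
  }
  auto cur = ios.begin();
  while (cur != ios.end()) {
    std::size_t n = 0;
    auto last = cur;
    while (last != ios.end() && n < max_iodepth) {
      ++last;
      ++n;
    }
    // A real implementation would submit the range [cur, last) here.
    batch_sizes.push_back(n);
    cur = last;
  }
  return batch_sizes;
}
```

With 70000 queued entries, more than a 16-bit counter can hold, and a depth of 1024, the sketch produces 68 full batches followed by one batch of 368.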
| * | blk/kernel: always use ceph_assert [Igor Fedotov, 2024-04-12, 1 file, -3/+3]
| | | Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
* | | Merge pull request #57722 from sajibreadd/wip-62500 [Adam Kupczyk, 2024-07-17, 3 files, -1/+49]
|\ \ \
| | | | os/bluestore: Warning added for slow operations and stalled read
| * | | Warning added for slow operations and stalled read in BlueStore. User can control how much time the warning should persist after the last occurrence and the maximum number of operations that will be considered as a threshold for the warning. [sajibreadd, 2024-06-26, 3 files, -1/+49]
| |/ /
| | | Fixes: https://tracker.ceph.com/issues/62500
| | | Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
* | | Merge pull request #57369 from YiteGu/bluestore-offline-trim [Adam Kupczyk, 2024-07-09, 2 files, -2/+3]
|\ \ \
| |_|/
|/| |
| | | tools/bluestore: Add command 'trim' to ceph-bluestore-tool
| * | tools/bluestore: Add command 'trim' to ceph-bluestore-tool [yite.gu, 2024-05-16, 2 files, -2/+3]
| |/
| | Add command 'trim' to ceph-bluestore-tool.
| | Co-authored-by: Igor Fedotov <igor.fedotov@croit.io>
| | Signed-off-by: Yite Gu <yitegu0@gmail.com>
* | style changes requested by Igor [Gabriel BenHanokh, 2024-04-10, 1 file, -1/+1]
| | Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
* | Limit private discarded queue for threads to a small items count. [Gabriel BenHanokh, 2024-04-09, 3 files, -12/+27]
| | On fast shutdown, take over the main discarded queue, copying it to the allocator, and only wait for the threads to commit their small private discarded queues.
| | Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
* | On graceful shutdown we will wait for discard queue to drain before storing the allocator. [Gabriel BenHanokh, 2024-04-09, 3 files, -27/+5]
| | On fast shutdown we will simply copy the discard queue entries to the allocator.
| | Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
* | os/BlueStore: NCB fix for leaked space when bdev_async_discard is enabled [Gabriel BenHanokh, 2024-04-09, 3 files, -3/+25]
|/
| The fix calls bdev->discard_drain() before calling store_allocator() to make sure all freed space is reflected in the allocator before destaging it.
| The fix sets a timeout for the drain call (500 msec); if the timeout expires, the allocator is not stored (forcing a recovery on the next startup).
| Fixes: https://tracker.ceph.com/issues/65298
| Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
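The drain-with-timeout behaviour described in the commit above can be sketched with a condition variable. This is a simplified illustration, not the BlueStore or KernelDevice code: the `DiscardQueue` class, its members, and the boolean return convention are assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical queue that only tracks how many discards are still pending.
class DiscardQueue {
  std::mutex lock;
  std::condition_variable cond;
  int pending = 0;

public:
  void queue() {
    std::lock_guard<std::mutex> l(lock);
    ++pending;
  }

  void complete() {
    {
      std::lock_guard<std::mutex> l(lock);
      --pending;
    }
    cond.notify_all();  // wake anyone waiting in drain()
  }

  // Waits until nothing is pending or the timeout expires.
  // Returns true if the queue drained in time, false otherwise.
  bool drain(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock);
    return cond.wait_for(l, timeout, [this] { return pending == 0; });
  }
};
```

A caller following the commit's logic would store the allocator only when `drain()` returns true, and otherwise skip the store so that the next startup performs a recovery.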
* Merge pull request #55469 from Matt1360/main [Yuri Weinstein, 2024-03-14, 2 files, -47/+155]
|\
| | blk: threaded discard support
| | Reviewed-by: Igor Fedotov <ifedotov@suse.com>
| * blk: support bdev_async_discard_threads == 0 [Matt Vandermeulen, 2024-02-16, 2 files, -19/+23]
| | Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
| * blk: add threaded discard support to kernel devices [Matt Vandermeulen, 2024-02-08, 2 files, -37/+141]
| | Signed-off-by: Matt Vandermeulen <matt@reenigne.net>
* | os/bluestore: remove zoned from crimson [Pere Diaz Bou, 2024-01-09, 5 files, -213/+0]
|/
| Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com>
* cmake: promote uring package search to top-level [Patrick Donnelly, 2023-10-17, 1 file, -6/+0]
| For use by multiple projects, rocksdb in particular.
| Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* Merge pull request #49132 from aclamk/wip-aclamk-bs-excl-lock [Yuri Weinstein, 2023-02-09, 1 file, -1/+22]
|\
| | blk/kernel: Add O_EXCL for block devices
| | Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
| | Reviewed-by: Mark Nelson <mnelson@redhat.com>
| * blk/kernel: Add O_EXCL for block devices [Adam Kupczyk, 2023-01-25, 1 file, -1/+22]
| | Change behaviour when the target file is a block device ("mknod name b major minor"): append the O_EXCL flag for the first open of the block device.
| | The problem is that if 2 different files for the same block device are created, it is possible to ::flock each of them in 2 separate processes. In some container cases, when we recreate the bluestore osd dir with the ceph-bluestore-tool prime-osd command, we can end up with completely different files. Open with O_EXCL is immune to that.
| | Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
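The flag selection described in the commit above can be sketched as a small helper. This is an assumption-laden illustration rather than the KernelDevice code: `open_flags_for` is a hypothetical function, and the sketch relies on the Linux behaviour that O_EXCL without O_CREAT is meaningful for block devices, where open() fails with EBUSY if the device is already in use.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: returns the flags to use for the first open of `path`.
// O_EXCL is appended only when the path refers to a block device, so the
// exclusivity is tied to the device itself and not to the device node file.
int open_flags_for(const char* path, int base_flags) {
  struct stat st;
  if (::stat(path, &st) == 0 && S_ISBLK(st.st_mode)) {
    return base_flags | O_EXCL;
  }
  return base_flags;  // regular files and other types keep the original flags
}
```

Two different device nodes created with mknod for the same major and minor numbers would both resolve to a block device here, so the second exclusive open would be rejected even though the two paths are distinct files.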
* | Merge pull request #48416 from Rethan/wip-bluestore-discard-thread [Yuri Weinstein, 2022-12-15, 4 files, -25/+38]
|\ \
| | | blk/KernelDevice: don't start discard thread if device not support_di…
| | | Reviewed-by: Igor Fedotov <ifedotov@suse.com>
| * | blk/KernelDevice: don't start discard thread if device not support_discard [haoyixing, 2022-10-26, 4 files, -25/+38]
| | | Only create the discard thread if the device supports discard; otherwise we will have some threads which do nothing.
| | | Also extract the queue_discard/discard logic to the device, making it cleaner to call discard from bluefs and bluestore.
| | | Signed-off-by: haoyixing <haoyixing@kuaishou.com>
* | | blk/kernel: Fix error code mapping in KernelDevice::read. [Joshua Baergen, 2022-10-12, 1 file, -1/+1]
| |/
|/|
| | pread returns -1 upon error and stores the error code in errno, and thus the wrong error was being passed into is_expected_ioerr. This is handled correctly just a few lines down where we return -errno, so it was likely just an oversight when adapting this logic from the aio codepath, where the return code is indeed the errno.
| | This logic has been incorrect since it was introduced in 2018 via a1e0ece7f987c7a563b25ec0d02fc6f8445ef54e.
| | Signed-off-by: Joshua Baergen <jbaergen@digitalocean.com>
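The errno convention at the heart of the commit above can be shown with a short sketch. This is not the KernelDevice::read code: `read_at` is a hypothetical wrapper that only demonstrates mapping a failed pread() to a negative errno value.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper: returns the number of bytes read, or a negative
// errno value on failure.
ssize_t read_at(int fd, void* buf, size_t len, off_t off) {
  ssize_t r = ::pread(fd, buf, len, off);
  if (r < 0) {
    // r is always -1 here; the actual cause is in errno. Handing r itself to
    // an error classifier would always look like the same error code.
    return -errno;
  }
  return r;
}
```

For example, reading from an invalid descriptor yields `-EBADF` from the wrapper, whereas the raw pread() return value would only be -1.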
* | blk/kernel: add plugin system for devices with compression and move VDO support into plugin [Martin Ohmacht, 2022-09-28, 4 files, -33/+24]
|/
| The current VDO support implementation is buried inside the common/blkdev.cc with a simple interface used by KernelDevice. It is not easily extendable and can not be easily used for other devices providing similar capabilities. This patch adds a plugin system that is based in its structure on the erasure code plugin system and moves the VDO support code into a VDO plugin.
| Signed-off-by: Martin Ohmacht <mohmacht@us.ibm.com>
* blk/KernelDevice: Modify the rotational and discard check log message [Vikhyat Umrao, 2022-08-24, 1 file, -1/+1]
| Fixes: https://tracker.ceph.com/issues/57271
| Signed-off-by: Vikhyat Umrao <vikhyat@redhat.com>
* Merge pull request #36282 from rzarzynski/wip-bl-drop-clone [Yuri Weinstein, 2022-07-22, 1 file, -5/+0]
|\
| | common/bl, kv, tests: drop MemDB and simplify buffer::ptr and buffer::raw
| | Reviewed-by: Kefu Chai <kchai@redhat.com>
| | Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
| | Reviewed-by: Neha Ojha <nojha@redhat.com>
| * common/bl: drop clone() and clone_empty() from buffer::raw. [Radoslaw Zarzynski, 2022-05-23, 1 file, -5/+0]
| | Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* | cmake: rename a series of pmem libraries to pmdk [Yin Congmin, 2022-06-27, 1 file, -1/+1]
|/
| At first, libpmem was the only library. Later, pmem-related libraries such as libpmemobj and libpmem2 were gradually added. These libraries were also integrated into one package named pmdk, so rename to pmdk.
| Signed-off-by: Yin Congmin <congmin.yin@intel.com>
* Merge pull request #46122 from tchaikov/wip-pmem [Kefu Chai, 2022-05-20, 1 file, -39/+23]
|\
| | blk/pmem: refactor pmem_check_file_type() using std::filesystem
| | Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
| * blk/pmem: refactor pmem_check_file_type() using std::filesystem [Kefu Chai, 2022-05-03, 1 file, -39/+23]
| | for better readability
| | Signed-off-by: Kefu Chai <tchaikov@gmail.com>
* | blk/spdk: Add the support to use nvme device provided by NVMe-of Target [Ziye Yang, 2022-05-15, 1 file, -26/+57]
|/
| This patch adds support for using the NVMe device provided by an NVMe-oF target.
| Signed-off-by: Ziye Yang <ziye.yang@intel.com>
* Merge pull request #44230 from optimistyzy/122_add_dml [Kefu Chai, 2022-04-20, 2 files, -0/+28]
|\
| | blk/pmem: use DML library to offload read/write operations in pmem
| | Reviewed-by: Kefu Chai <tchaikov@gmail.com>
| * Add the support to use DML library for PMEM device. [Ziye Yang, 2022-04-19, 2 files, -0/+28]
| | The purpose of this patch is to add initial support for offloading memory/pmem operations, used synchronously, through the hardware path in the DML library.
| | Signed-off-by: Ziye Yang <ziye.yang@intel.com>
* | blk/pmem: Add the devdax support. [Ziye Yang, 2022-04-14, 2 files, -4/+93]
|/
| The purpose is to make pmem device usage more flexible than the current solution, and to prepare for potential offloading by a hardware engine later.
| Signed-off-by: Ziye Yang <ziye.yang@intel.com>
* Merge pull request #44612 from rzarzynski/wip-bs-lazy4freebsd [Radoslaw Zarzynski, 2022-04-02, 1 file, -0/+11]
|\
| | bdev: fix FTBFS on FreeBSD, keep the huge paged read buffers.
| | Reviewed-by: Igor Fedotov <ifedotov@suse.com>
| | Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
| * bdev: fix FTBFS on FreeBSD, keep the huge paged read buffers. [Radoslaw Zarzynski, 2022-03-31, 1 file, -0/+11]
| | Special thanks to Willem Jan Withagen!
| | Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* | cmake/modules/BuildSPDK.cmake: link whole-archive [Tongliang Deng, 2022-01-20, 1 file, -1/+2]
|/
| We build spdk as static libraries; linking against them requires passing `-Wl,--whole-archive`, otherwise we get the error `nvme.c: nvme_probe_internal: *ERROR*: NVMe trtype 256 not available`.
| This is because spdk uses constructor functions to register NVMe transports, so whole-archive linking is needed to ensure all the constructors are called.
| Signed-off-by: Tongliang Deng <dengtongliang@sensetime.com>
* test/objectstore: verify the huge page-backed reading of BlueStore. [Radoslaw Zarzynski, 2022-01-12, 2 files, -2/+7]
| Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* blk: don't cache the huge page-based buffers of KernelDevice. [Radoslaw Zarzynski, 2022-01-12, 2 files, -4/+6]
| Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* blk: introduce multi-size huge page pools to KernelDevice. [Radoslaw Zarzynski, 2022-01-12, 1 file, -15/+64]
| When testing, remember about `bluestore_max_blob_size`, as it's only 64 KB by default while the entire huge page-based pools machinery targets far bigger scenarios (initially 4 MB!).
| Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>