path: root/src/blk
Commit message (Author, Age, Files, Lines)
...
* blk: move the buffer size of ExplicitHugePagePool to run-time. (Radoslaw Zarzynski, 2022-01-12, 1 file, -11/+12)
    Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* blk: bring MAP_HUGETLB-based buffer pool to KernelDevice. (Radoslaw Zarzynski, 2022-01-12, 2 files, -8/+104)
    The idea here is to bring a pool of `mmap`-allocated, fixed-size buffers
    that takes precedence over the 2 MB-aligned, THP-based mechanism. On the
    first attempt to acquire a 4 MB buffer, KernelDevice mmaps
    `bdev_read_preallocated_huge_buffer_num` (default 128) memory regions using
    the MAP_HUGETLB option. If this fails, the entire process is aborted.
    Once their lifetimes end, buffers are recycled through a lock-free queue
    shared across the entire process.

    Remember to allocate the appropriate number of huge pages in the system!
    For instance:
    ```
    echo 256 | sudo tee /proc/sys/vm/nr_hugepages
    ```

    This commit is based on (cherry-picked with changes from)
    897a4932bee5cba3641c18619cccd0ee945bfcf8.

    Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
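The pool described in this commit message could be sketched roughly as follows, assuming Linux. `map_buffer` and `SimpleBufferPool` are hypothetical names and not the KernelDevice code; a plain `std::vector` stands in for the lock-free queue, so this version is not thread-safe, and it returns `nullptr` where the commit aborts the process.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: map one anonymous read/write region.
// With use_hugetlb the kernel is asked for huge pages (MAP_HUGETLB);
// on platforms without that flag it silently falls back to normal pages.
// Returns nullptr on failure.
void* map_buffer(std::size_t len, bool use_hugetlb) {
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_HUGETLB
  if (use_hugetlb) {
    flags |= MAP_HUGETLB;
  }
#else
  (void)use_hugetlb;
#endif
  void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, flags, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical pool of equally sized buffers, preallocated up front.
class SimpleBufferPool {
 public:
  SimpleBufferPool(std::size_t buf_len, std::size_t count, bool use_hugetlb)
      : buf_len_(buf_len) {
    for (std::size_t i = 0; i < count; ++i) {
      void* p = map_buffer(buf_len_, use_hugetlb);
      if (p != nullptr) {
        free_.push_back(p);
      }
    }
  }
  SimpleBufferPool(const SimpleBufferPool&) = delete;
  SimpleBufferPool& operator=(const SimpleBufferPool&) = delete;

  // Only buffers currently in the pool are unmapped here;
  // callers must release() everything before the pool is destroyed.
  ~SimpleBufferPool() {
    for (void* p : free_) {
      ::munmap(p, buf_len_);
    }
  }

  // Hand out a buffer, or nullptr when the pool is exhausted.
  void* acquire() {
    if (free_.empty()) {
      return nullptr;
    }
    void* p = free_.back();
    free_.pop_back();
    return p;
  }

  // Recycle a buffer previously returned by acquire().
  void release(void* p) { free_.push_back(p); }

  std::size_t available() const { return free_.size(); }

 private:
  std::size_t buf_len_;
  std::vector<void*> free_;
};
```

Passing `use_hugetlb = true` only succeeds when enough huge pages have been reserved, which is why the commit message reminds the reader to set `nr_hugepages`.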
* blk: make the buffer alignment configurable in KernelDevice. (Radoslaw Zarzynski, 2022-01-12, 1 file, -2/+18)
    Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* blk, os/bluestore: introduce a cache bypassing to IOContext and BlueStore. (Radoslaw Zarzynski, 2022-01-12, 1 file, -0/+9)
    Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* os/bluestore: Set min_alloc_size to optimal io size (Curt Bruns, 2021-11-05, 2 files, -0/+4)
    Block devices may report an "optimal_io_size" that differs from the
    typical 4 KiB. To optimize BlueStore for this I/O size, the allocator
    needs to set its min_alloc_size to this optimal_io_size.

    This PR adds discovery of the optimal_io_size for a block device and an
    option to use the optimal_io_size as the min_alloc_size for the BlueStore
    allocator. Older devices may report an optimal_io_size of 0; in that
    case, the default configured min_alloc_size is used.

    Signed-off-by: Curt Bruns <curt.e.bruns@gmail.com>
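The fallback rule in this message can be illustrated with a small sketch. The function names and the `use_optimal` flag are hypothetical, and reading the value from the Linux sysfs attribute `/sys/block/<dev>/queue/optimal_io_size` is an assumption about how discovery might be done, not a description of the Ceph code.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: read the optimal I/O size (in bytes) that Linux
// exposes in sysfs for a block device such as "sda".
// Returns 0 when the attribute is missing or unreadable.
std::uint64_t read_optimal_io_size(const std::string& dev) {
  std::ifstream f("/sys/block/" + dev + "/queue/optimal_io_size");
  std::uint64_t value = 0;
  if (!(f >> value)) {
    return 0;
  }
  return value;
}

// Hypothetical selection rule: use the device's optimal I/O size when the
// option is enabled and the device reports a non-zero value; otherwise keep
// the configured default.
std::uint64_t pick_min_alloc_size(std::uint64_t optimal_io_size,
                                  std::uint64_t configured_default,
                                  bool use_optimal) {
  if (use_optimal && optimal_io_size != 0) {
    return optimal_io_size;
  }
  return configured_default;
}
```

A device that reports 0 therefore behaves exactly as it did before the option existed.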
* blk/zoned: make discard a no-op (Sage Weil, 2021-10-29, 1 file, -0/+6)
    Discard is meaningless on SMR or ZNS since we are always explicitly
    managing the reset of entire zones.

    Signed-off-by: Sage Weil <sage@newdream.net>
* os/bluestore: simplify zone to clean selection (Sage Weil, 2021-10-29, 3 files, -10/+12)
    Only pick one zone to clean, based on the current state. Since the best
    victim may change (maybe another zone gets a bunch of releases and new
    dead bytes!) there is no reason (yet) to explicitly avoid the victim zone
    during allocation.

    There is also no need to track which zones we are cleaning on disk
    because we can choose to clean from any zone at any time, and in general
    want to clean from the best candidate at the time, not the one that
    looked the best some time in the past.

    Signed-off-by: Sage Weil <sage@newdream.net>
* blk/zoned: add get_zones() to fetch write pointers (Sage Weil, 2021-10-29, 3 files, -0/+24)
    Signed-off-by: Sage Weil <sage@newdream.net>
* blk/zones: implement HMSMRDevice as KernelDevice child (Sage Weil, 2021-10-29, 4 files, -1255/+57)
    No need to duplicate so much code when we are just adding a few things.
    Also, we want to track KernelDevice changes/improvements.

    We could probably integrate these SMR capabilities directly into
    KernelDevice too...

    Signed-off-by: Sage Weil <sage@newdream.net>
* blk/zoned: remove dead VDO code (Sage Weil, 2021-10-29, 2 files, -32/+1)
    VDO won't work on an SMR device

    Signed-off-by: Sage Weil <sage@newdream.net>
* blk/zoned: add reset_all_zones() (Sage Weil, 2021-10-29, 3 files, -0/+7)
    Signed-off-by: Sage Weil <sage@newdream.net>
* blk/zoned: print error during init (Sage Weil, 2021-10-29, 1 file, -0/+5)
    Otherwise it is easy to miss things like EPERM during testing.

    Signed-off-by: Sage Weil <sage@newdream.net>
* Merge pull request #42813 from wjwithagen/wjw-fix-hexdump-output (Kefu Chai, 2021-08-20, 3 files, -12/+12)
    blk: start 1st line of hexdump() on a new line

    Reviewed-by: Kefu Chai <tchaikov@gmail.com>
| * blk: start 1st line of hexdump() on a new line (Willem Jan Withagen, 2021-08-19, 3 files, -12/+12)
    Otherwise the first line looks rather strange
    ```
    2021-08-19T23:42:33.604+0200 4f60700 40 bdev:kd:965(0x4fab180 /usr/local/src/wip.bluestore-test/build/dev/osd0/block.db) data: 00000000 56 f9 b8 f8 1c 00 01 01 1a 6c 65 76 65 6c 64 62 |V........leveldb|
    00000010 2e 42 79 74 65 77 69 73 65 43 6f 6d 70 61 72 61 |.BytewiseCompara|
    00000020 74 6f 72 98 af 58 a6 02 00 01 02 00 d2 c7 3c 95 |tor..X........<.|
    00000030 06 00 01 09 00 03 04 04 00 6b 93 6d c5 2b 00 01 |.........k.m.+..|
    00000040 01 1a 6c 65 76 65 6c 64 62 2e 42 79 74 65 77 69 |..leveldb.Bytewi|
    .....
    ```
    versus new:
    ```
    2021-08-19T23:42:33.604+0200 4f60700 40 bdev:kd:965(0x4fab180 /usr/local/src/wip.bluestore-test/build/dev/osd0/block.db) data:
    00000000 56 f9 b8 f8 1c 00 01 01 1a 6c 65 76 65 6c 64 62 |V........leveldb|
    00000010 2e 42 79 74 65 77 69 73 65 43 6f 6d 70 61 72 61 |.BytewiseCompara|
    00000020 74 6f 72 98 af 58 a6 02 00 01 02 00 d2 c7 3c 95 |tor..X........<.|
    00000030 06 00 01 09 00 03 04 04 00 6b 93 6d c5 2b 00 01 |.........k.m.+..|
    00000040 01 1a 6c 65 76 65 6c 64 62 2e 42 79 74 65 77 69 |..leveldb.Bytewi|
    .....
    ```

    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
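To make the formatting change concrete, here is a small stand-alone sketch. `hexdump` and `format_data_line` are hypothetical helpers written for this illustration; they are not Ceph's `bufferlist::hexdump()` and only approximate its layout. The point is that the caller emits a newline after `data:` so that every dump row starts in the same column.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-alone hexdump: 16 bytes per row, offset first,
// printable ASCII between '|' characters at the end of each row.
std::string hexdump(const unsigned char* data, std::size_t len) {
  std::string out;
  char buf[16];
  for (std::size_t off = 0; off < len; off += 16) {
    std::snprintf(buf, sizeof(buf), "%08zx", off);
    out += buf;
    std::string ascii;
    for (std::size_t i = 0; i < 16; ++i) {
      if (off + i < len) {
        const unsigned char c = data[off + i];
        std::snprintf(buf, sizeof(buf), " %02x", static_cast<unsigned>(c));
        out += buf;
        ascii += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '.';
      } else {
        out += "   ";  // pad short rows so the ASCII column lines up
      }
    }
    out += " |" + ascii + "|\n";
  }
  return out;
}

// Hypothetical log helper: put the dump on its own lines after "data:"
// instead of appending the first row to the log prefix.
std::string format_data_line(const std::string& prefix,
                             const unsigned char* data, std::size_t len) {
  return prefix + " data:\n" + hexdump(data, len);
}
```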
* | blk/pmem: Add namespace std for map,string (Feng Hualong, 2021-08-18, 2 files, -3/+5)
    Because `map` and `string` are used without the `std` namespace, the
    compiler reports them as undeclared.

    Signed-off-by: Feng Hualong <hualong.feng@intel.com>
* Merge pull request #42791 from ideepika/wip-50947-blk-cmake (Kefu Chai, 2021-08-17, 1 file, -1/+4)
    src/blk: fix block_device_t return if no aio libs present

    Reviewed-by: Kefu Chai <kchai@redhat.com>
| * src/blk: fix block_device_t return if no aio libs present (Deepika Upadhyay, 2021-08-16, 1 file, -1/+4)
    When no libaio is present on the system, compilation fails; return
    block_device_t as unknown in that case.

    fixes: https://tracker.ceph.com/issues/50947#note-1

    Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
* | blk: build without "using namespace std" (Kefu Chai, 2021-08-13, 4 files, -4/+9)
    * add "std::" prefix in headers
    * add "using" declarations in .cc files.

    so we don't rely on "using namespace std" in one or more included headers.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
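The two bullet points above correspond to a common C++ pattern, sketched here in a single file for brevity. `DeviceLabels` is a hypothetical type invented for the illustration and does not exist in `src/blk`.

```cpp
#include <map>
#include <string>

// "Header" part: every standard-library name is spelled with std::,
// so the declaration compiles no matter what other headers do.
struct DeviceLabels {
  std::map<std::string, std::string> labels;
  void set(const std::string& key, const std::string& value);
  std::string get(const std::string& key) const;
};

// ".cc" part: targeted using-declarations instead of a blanket
// "using namespace std" leaking in from some included header.
using std::map;
using std::string;

void DeviceLabels::set(const string& key, const string& value) {
  labels[key] = value;
}

string DeviceLabels::get(const string& key) const {
  map<string, string>::const_iterator it = labels.find(key);
  return it == labels.end() ? string() : it->second;
}
```

Keeping the using-declarations in the implementation file limits their effect to that translation unit.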
* | libcephsqlite: build without "using namespace std" (Kefu Chai, 2021-08-13, 2 files, -2/+4)
    * add "std::" prefix in headers
    * add "using" declarations in .cc files.

    so we don't rely on "using namespace std" in one or more included headers.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
* Merge pull request #42040 from wjwithagen/wjw-wip-bluestore-choose_fdv3 (Kefu Chai, 2021-07-07, 4 files, -9/+34)
    blk: use choose_fd for all filehandle references

    Reviewed-by: Kefu Chai <kchai@redhat.com>
| * blk/kernel: Only use file hint capabilities if available. (Willem Jan Withagen, 2021-07-07, 1 file, -1/+7)
    Without WRITE_LIFE capabilities, only one file is used. RocksDB still
    sets this value to > 0, so we need to catch this here instead of trusting
    RocksDB to set write_hint.

    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
| * blk/kernel: reorganise and use fd in debug (Willem Jan Withagen, 2021-07-07, 1 file, -1/+1)
    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
| * blk: use choose_fd for all filehandle references (Willem Jan Withagen, 2021-07-07, 4 files, -8/+27)
    This is a leading part of the changes to implement bluestore on FreeBSD.
    Without it, invalid indexing into the descriptor arrays will occur.

    Creates a new enum, `blk_access_mode_t`, to describe `BUFFERED` and
    `DIRECT` mode access, with `choose_fd()` to get the correct file for the
    typed access, and adds a pretty-printer boolean converter
    `blk_access_mode_t::buffermode(bool)`.

    This PR is a redo of PR #37258, since that one was lost in rebasing
    errors. But the review notes are still there.

    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
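A rough sketch of the idea, under stated assumptions: the enum name and its two values come from the commit message, while `FdTable`, `kNumHints`, the `hints_supported` flag, the free-function form of `buffermode`, and the exact fallback rule are hypothetical and are not taken from the KernelDevice code. The fallback to slot 0 reflects the neighbouring commit's remark that only one file is used when write hints are unavailable.

```cpp
#include <cstddef>

// Access modes named in the commit message.
enum class blk_access_mode_t { DIRECT, BUFFERED };

// Hypothetical converter from the "buffered" boolean used by callers.
inline blk_access_mode_t buffermode(bool buffered) {
  return buffered ? blk_access_mode_t::BUFFERED : blk_access_mode_t::DIRECT;
}

// Hypothetical number of write-hint slots.
constexpr std::size_t kNumHints = 4;

// Hypothetical descriptor table: one fd per write hint and access mode.
struct FdTable {
  int direct_fds[kNumHints];
  int buffered_fds[kNumHints];
  bool hints_supported;

  // Return the descriptor for the requested access mode and write hint.
  // Without hint support only slot 0 holds a valid descriptor, and an
  // out-of-range hint also falls back to slot 0, so the arrays are never
  // indexed out of bounds.
  int choose_fd(bool buffered, std::size_t write_hint) const {
    const std::size_t idx =
        (hints_supported && write_hint < kNumHints) ? write_hint : 0;
    return buffermode(buffered) == blk_access_mode_t::BUFFERED
               ? buffered_fds[idx]
               : direct_fds[idx];
  }
};
```

Routing every descriptor lookup through one function keeps the bounds check in a single place.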
* | blk: Initialized zoned block device descriptor. (Abutalib Aghayev, 2021-06-22, 2 files, -2/+2)
    Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
* | os/bluestore: Pass in parameters using a const reference instead of a pointer. (Abutalib Aghayev, 2021-06-22, 3 files, -4/+4)
    Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
* | blk: Add functionality for resetting zones to HM-SMR device. (Abutalib Aghayev, 2021-06-22, 3 files, -0/+14)
    Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
* blk/KernelDevice: be more verbose on read errors. (Igor Fedotov, 2021-06-16, 1 file, -1/+2)
    Signed-off-by: Igor Fedotov <ifedotov@suse.com>
* blk/kernel: explicit assign to fields in struct (Willem Jan Withagen, 2021-05-09, 1 file, -2/+2)
    Clang on FreeBSD reports:
    ```
    Building CXX object src/global/CMakeFiles/libglobal_objs.dir/pidfile.cc.o
    ../src/global/pidfile.cc:170:5: warning: ISO C++ requires field designators to be specified in declaration order; field 'l_whence' will be initialized after field 'l_start' [-Wreorder-init-list]
        .l_start = 0,
        ^~~~~~~~~~~~
    ../src/global/pidfile.cc:169:17: note: previous initialization for field 'l_whence' is here
        .l_whence = SEEK_SET,
                    ^~~~~~~~
    ```
    Linux and BSD also declare the fields of `struct flock` in different
    orders, so assigning each field explicitly prevents wrong initialisation
    on FreeBSD.

    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
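The portable alternative to designated initializers is to zero the structure and assign each field by name. The sketch below shows that pattern for `struct flock`; the helper name `make_whole_file_write_lock` is hypothetical and the code is an illustration, not the change made in the commit.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

// Build a write lock covering the whole file without relying on the
// declaration order of the fields in struct flock, which differs
// between Linux and the BSDs.
struct flock make_whole_file_write_lock() {
  struct flock l;
  std::memset(&l, 0, sizeof(l));
  l.l_type = F_WRLCK;     // exclusive (write) lock
  l.l_whence = SEEK_SET;  // l_start is relative to the start of the file
  l.l_start = 0;
  l.l_len = 0;            // 0 means "until end of file"
  return l;
}
```

The result can be passed to `fcntl(fd, F_SETLK, &l)` as usual.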
* cmake: do not build libblk if libblk_srcs is empty (Kefu Chai, 2021-05-02, 1 file, -2/+4)
    If we don't want to, or are simply not able to, build libblk, let's skip
    it.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
* blk/spdk/NVMEDevice.cc: remove unused variables (wangyunqing, 2021-03-29, 1 file, -5/+1)
    Signed-off-by: wangyunqing <wangyunqing@inspur.com>
* Merge pull request #40032 from aclamk/wip-bdev-remove-reap (Kefu Chai, 2021-03-17, 6 files, -37/+0)
    blk/BlockDevice: Remove reap_ioc logic

    Reviewed-by: Igor Fedotov <ifedotov@suse.com>
| * blk/BlockDevice: Remove reap_ioc logic (Adam Kupczyk, 2021-03-11, 6 files, -37/+0)
    The queue_reap_ioc and reap_ioc logic was necessary for BlueFS's
    _close_writer() method. At one point it did not perform aio_wait(), and
    there was a race condition between access to and deletion of the
    IOContext object. Now we can simply delete it after a successful
    aio_wait().

    Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
* | cmake: do not link blk against pmem::pmemobj (Kefu Chai, 2021-03-06, 1 file, -2/+1)
    as blk does not use pmem::pmemobj, librbd does.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
* cmake: build static libs if they are internal ones (Kefu Chai, 2021-02-19, 1 file, -1/+1)
    There is a chance that the user or a build script sets
    `BUILD_SHARED_LIBS`, so these convenience libraries (using the autotools'
    terminology) are built and linked but never get installed.

    Fixes: https://tracker.ceph.com/issues/38611
    Fixes: https://tracker.ceph.com/issues/49080
    Signed-off-by: Kefu Chai <kchai@redhat.com>
* Merge pull request #39132 from rzarzynski/wip-blk-ptr_node-for-aio (Kefu Chai, 2021-01-31, 1 file, -4/+4)
    blk: avoid temporary bptrs on aio paths; use ptr_node instead.

    Reviewed-by: Kefu Chai <kchai@redhat.com>
    Reviewed-by: Josh Durgin <jdurgin@redhat.com>
| * blk: avoid temporary bptrs on aio paths; use ptr_node instead. (Radoslaw Zarzynski, 2021-01-28, 1 file, -4/+4)
    This is a slight optimization for the `HAVE_LIBAIO` paths of the
    kernel-based `BlockDevice` implementation. The overall idea is to
    eliminate temporary, short-lived instances of `ceph::bufferptr`, as
    `ceph::bufferlist` actually aggregates `ptr_node` (`bufferptr` with an
    extra `next` pointer field to form a list). A `ptr_node` can be created
    directly, and this commit switches to exactly this behavior.

    Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* | blk/kernel: fix io_uring got (4) Interrupted system call (Yanhu Cao, 2021-01-14, 1 file, -1/+1)
    Fixes: https://tracker.ceph.com/issues/48872
    Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
* blk: add upper bound of bluestore_deferred_batch_ops* options (hzwuhongsong, 2020-12-29, 1 file, -0/+3)
    so the number of pending io does not overflow when being passed to
    submit_batch().

    Signed-off-by: hzwuhongsong <hzwuhongsong@corp.netease.com>
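The overflow concern can be illustrated with a short sketch. It assumes, purely for illustration, that the receiving parameter is a 16-bit unsigned integer; `clamp_batch_ops` is a hypothetical helper, whereas the commit itself adds an upper bound to the options rather than clamping at the call site.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper: limit a configured batch size to the largest value
// that fits in a 16-bit unsigned parameter, so a later narrowing
// conversion cannot wrap around.
std::uint16_t clamp_batch_ops(std::uint64_t configured) {
  const std::uint64_t max_ops = std::numeric_limits<std::uint16_t>::max();
  return static_cast<std::uint16_t>(std::min(configured, max_ops));
}
```

Without such a bound, a value like 70000 would silently wrap when narrowed to 16 bits.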
* os/bluestore: Fix HMSMRDevice.cc compilation. (Abutalib Aghayev, 2020-12-08, 1 file, -1/+3)
    Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
* Merge pull request #38240 from ifed01/wip-ifed-is-valid-io-log (Kefu Chai, 2020-11-27, 2 files, -7/+18)
    blk: log is_valid_io() parameters when unsuccessful.

    Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
| * blk: log is_valid_io() parameters when unsuccessful. (Igor Fedotov, 2020-11-23, 2 files, -7/+18)
    Signed-off-by: Igor Fedotov <ifedotov@suse.com>
* | blk: fix parameters for non native uring (Willem Jan Withagen, 2020-11-26, 1 file, -1/+1)
    The parameter list used when uring is available needs to match the one
    used when no uring is available.

    fixes: https://github.com/ceph/ceph/pull/38257

    Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
* | Merge pull request #38257 from lnsyyj/wip-iouring-poll (Kefu Chai, 2020-11-26, 3 files, -9/+10)
    blk/kernel: expose IORING_SETUP_{IOPOLL,SQPOLL} as options

    Reviewed-by: Kefu Chai <kchai@redhat.com>
| * | blk/kernel: expose IORING_SETUP_{IOPOLL,SQPOLL} as options (JiangYu, 2020-11-24, 3 files, -9/+10)
    Signed-off-by: JiangYu <lnsyyj@hotmail.com>
* / blk/kernel/io_uring: do not guard liburing backend with __x86_64__ anymore (JiangYu, 2020-11-22, 1 file, -3/+3)
    Signed-off-by: JiangYu <lnsyyj@hotmail.com>

    Let the liburing library itself ensure that the io_uring system-call
    back end is supported on the CPU instruction set in use.
* blk/kernel/io_uring: bump liburing to v0.7 (Kefu Chai, 2020-11-05, 1 file, -9/+6)
    * use functions exposed by liburing instead of using syscalls
    * v0.7 is the latest release at the time of writing, as liburing is
      under active development. it'd be better to use a newer release.
    * also use https://git.kernel.dk/liburing instead of
      http://git.kernel.dk/liburing.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
* Merge pull request #37788 from tchaikov/wip-zbd (Kefu Chai, 2020-10-30, 1 file, -4/+2)
    rpm,cmake: s/WITH_LIBZBD/WITH_ZBD/ and enable ZBD on demand

    Reviewed-by: Josh Durgin <jdurgin@redhat.com>
    Reviewed-By: Neha Ojha <nojha@redhat.com>
| * cmake: set HAVE_LIBZBD before creating "acconfig.h" (Kefu Chai, 2020-10-29, 1 file, -2/+0)
    `acconfig.h` is generated using

      configure_file(
        ${CMAKE_SOURCE_DIR}/src/include/config-h.in.cmake
        ${CMAKE_BINARY_DIR}/include/acconfig.h
      )

    in `config-h.in.cmake`, the cmake variable of `HAVE_LIBZBD` is checked.
    so we need to ensure that this variable is visible from this
    `configure_file()` statement.

    Signed-off-by: Kefu Chai <kchai@redhat.com>
| * cmake: s/WITH_LIBZBD/WITH_LIBZBD/ (Kefu Chai, 2020-10-29, 1 file, -2/+2)
    fix the regression introduced by d53638630631cc6596a1238228332e7579318415

    Signed-off-by: Kefu Chai <kchai@redhat.com>
* | librbd/cache: init functionality for SSD Cache (Mahati Chamarthy, 2020-10-28, 1 file, -1/+1)
    Adds build option and implements init functionality for SSD cache

    Signed-off-by: Lisa Li <xiaoyan.li@intel.com>
    Signed-off-by: Mahati Chamarthy <mahati.chamarthy@intel.com>
    Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>