| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
| |
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is to bring a pool of `mmap`-allocated,
constant-size buffers which take precedence over the
2 MB-aligned, THP-based mechanism. On the first attempt
to acquire a 4 MB buffer, KernelDevice mmaps
`bdev_read_preallocated_huge_buffer_num` (default 128)
memory regions using the MAP_HUGETLB option. If this
fails, the entire process is aborted. Once their
lifetime ends, buffers are recycled through a lock-free
queue shared across the entire process.
Remember to allocate the appropriate number of
huge pages in the system! For instance:
```
echo 256 | sudo tee /proc/sys/vm/nr_hugepages
```
This commit is based on (cherry-picked, with changes, from)
897a4932bee5cba3641c18619cccd0ee945bfcf8.
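As a rough illustration of the pool described above, here is a minimal C++ sketch. `HugeBufferPool` and its members are hypothetical names, not the KernelDevice code. Two simplifications are assumptions of the sketch: a mutex-protected vector stands in for the lock-free queue, and a failed MAP_HUGETLB mapping falls back to a regular `mmap` instead of aborting the process.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical pool of fixed-size, mmap-ed buffers (illustration only).
class HugeBufferPool {
public:
  HugeBufferPool(std::size_t buf_size, std::size_t count) : buf_size_(buf_size) {
    for (std::size_t i = 0; i < count; ++i) {
      void* p = map_one(buf_size);
      if (p == nullptr) break;  // the commit aborts here; the sketch just stops
      all_.push_back(p);
      free_.push_back(p);
    }
  }
  ~HugeBufferPool() {
    for (void* p : all_) ::munmap(p, buf_size_);
  }
  HugeBufferPool(const HugeBufferPool&) = delete;
  HugeBufferPool& operator=(const HugeBufferPool&) = delete;

  // Returns a buffer, or nullptr when the pool is exhausted.
  void* acquire() {
    std::lock_guard<std::mutex> lock(mtx_);
    if (free_.empty()) return nullptr;
    void* p = free_.back();
    free_.pop_back();
    return p;
  }
  // Recycles a buffer previously returned by acquire().
  void release(void* p) {
    std::lock_guard<std::mutex> lock(mtx_);
    free_.push_back(p);
  }
  std::size_t available() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return free_.size();
  }

private:
  static void* map_one(std::size_t size) {
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void* p = MAP_FAILED;
#ifdef MAP_HUGETLB
    // Needs huge pages reserved via /proc/sys/vm/nr_hugepages.
    p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
#endif
    if (p == MAP_FAILED)  // sketch-only fallback to regular pages
      p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
  }

  std::size_t buf_size_;
  std::vector<void*> all_;   // every mapping, for munmap in the destructor
  std::vector<void*> free_;  // buffers currently available
  mutable std::mutex mtx_;
};
```

A caller would construct one pool per process, e.g. `HugeBufferPool pool(4u << 20, 128);`, and pair every `acquire()` with a `release()`.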
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
|
|
|
| |
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
|
|
|
| |
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Block devices may report an "optimal_io_size" that is different than the
typical 4KiB. To optimize BlueStore for this io size, the allocator
needs to set its min_alloc_size to this optimal_io_size. This PR adds
the discovery of the optimal_io_size for a block device and an option
to use the optimal_io_size as the min_alloc_size for the bluestore allocator.
Older devices may report an optimal_io_size of 0 and if that is the
case, the default config min_alloc_size is used.
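The selection rule above can be sketched in a few lines of C++. The function and parameter names are hypothetical and do not come from the BlueStore sources; only the rule itself (use optimal_io_size when the option is enabled and the device reports a non-zero value) is taken from this message.

```cpp
#include <cstdint>

// Hypothetical helper: pick the allocator's min_alloc_size.
// optimal_io_size == 0 means the device did not report a useful value.
std::uint64_t choose_min_alloc_size(std::uint64_t optimal_io_size,
                                    std::uint64_t default_min_alloc_size,
                                    bool use_optimal_io_size) {
  if (use_optimal_io_size && optimal_io_size != 0) {
    return optimal_io_size;
  }
  return default_min_alloc_size;
}
```

For example, a device reporting 65536 with the option enabled yields 65536, while an older device reporting 0 keeps the configured default.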
Signed-off-by: Curt Bruns <curt.e.bruns@gmail.com>
|
|
|
|
|
|
|
| |
Discard is meaningless on SMR or ZNS since we are always explicitly
managing the reset of entire zones.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only pick one zone to clean based on the current. Since the best victim
may change (maybe another zone gets a bunch of releases and new dead
bytes!) there is no reason (yet) to explicitly avoid the victim zone
during allocation. There is also no need to track which zones we are
cleaning on disk because we can choose to clean from any zone at any time,
and in general want to clean from the best candidate at the time, not the
one that looked the best some time in the past.
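To make the "pick the best candidate at the time" idea concrete, here is a small C++ sketch. It assumes that the best victim is the zone with the most dead bytes; that criterion, `ZoneState`, and `pick_zone_to_clean` are all assumptions of the sketch rather than names from the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-zone accounting.
struct ZoneState {
  std::uint64_t dead_bytes;
};

// Returns the index of the zone with the most dead bytes right now,
// or -1 if there is nothing worth cleaning. Nothing is remembered
// between calls, so a later call may pick a different zone.
std::ptrdiff_t pick_zone_to_clean(const std::vector<ZoneState>& zones) {
  if (zones.empty()) return -1;
  auto it = std::max_element(
      zones.begin(), zones.end(),
      [](const ZoneState& a, const ZoneState& b) {
        return a.dead_bytes < b.dead_bytes;
      });
  if (it->dead_bytes == 0) return -1;
  return it - zones.begin();
}
```

Because the choice is recomputed on every call, a zone that receives a burst of releases simply becomes the next victim without any on-disk bookkeeping.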
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
| |
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
|
|
|
| |
No need to duplicate so much code when we are just adding a few things.
Also, we want to track KernelDevice changes/improvements.
We could probably integrate these SMR capabilities directly into
KernelDevice too...
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
| |
VDO won't work on an SMR device
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
| |
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
| |
Otherwise it is easy to miss things like EPERM during testing.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|\
| |
| |
| |
| | |
blk: start 1st line of hexdump() on a new line
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Otherwise the first line looks rather strange:
```
2021-08-19T23:42:33.604+0200 4f60700 40 bdev:kd:965(0x4fab180 /usr/local/src/wip.bluestore-test/build/dev/osd0/block.db) data: 00000000 56 f9 b8 f8 1c 00 01 01 1a 6c 65 76 65 6c 64 62 |V........leveldb|
00000010 2e 42 79 74 65 77 69 73 65 43 6f 6d 70 61 72 61 |.BytewiseCompara|
00000020 74 6f 72 98 af 58 a6 02 00 01 02 00 d2 c7 3c 95 |tor..X........<.|
00000030 06 00 01 09 00 03 04 04 00 6b 93 6d c5 2b 00 01 |.........k.m.+..|
00000040 01 1a 6c 65 76 65 6c 64 62 2e 42 79 74 65 77 69 |..leveldb.Bytewi|
.....
```
versus new:
```
2021-08-19T23:42:33.604+0200 4f60700 40 bdev:kd:965(0x4fab180 /usr/local/src/wip.bluestore-test/build/dev/osd0/block.db) data:
00000000 56 f9 b8 f8 1c 00 01 01 1a 6c 65 76 65 6c 64 62 |V........leveldb|
00000010 2e 42 79 74 65 77 69 73 65 43 6f 6d 70 61 72 61 |.BytewiseCompara|
00000020 74 6f 72 98 af 58 a6 02 00 01 02 00 d2 c7 3c 95 |tor..X........<.|
00000030 06 00 01 09 00 03 04 04 00 6b 93 6d c5 2b 00 01 |.........k.m.+..|
00000040 01 1a 6c 65 76 65 6c 64 62 2e 42 79 74 65 77 69 |..leveldb.Bytewi|
.....
```
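The effect can be sketched with a small stand-alone formatter. `hexdump_lines` is a hypothetical helper, not the `hexdump()` used by the buffer code, and its column spacing is only an approximation of the output shown above. The point is that every row, including the first, is preceded by a newline, so the log prefix never shares a line with the data.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical formatter: 16 bytes per row, each row starting on a new line.
std::string hexdump_lines(const std::string& data) {
  std::string out;
  char buf[16];
  for (std::size_t off = 0; off < data.size(); off += 16) {
    out += '\n';  // also emitted before the first row
    std::snprintf(buf, sizeof(buf), "%08zx ", off);
    out += buf;
    std::string ascii;
    for (std::size_t i = off; i < off + 16; ++i) {
      if (i < data.size()) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        std::snprintf(buf, sizeof(buf), "%02x ", static_cast<unsigned>(c));
        out += buf;
        ascii += std::isprint(c) ? static_cast<char>(c) : '.';
      } else {
        out += "   ";  // pad a short final row
      }
    }
    out += '|' + ascii + '|';
  }
  return out;
}
```

Appending the result directly after a log prefix such as `data:` then gives aligned rows.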
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
|/
|
|
|
|
|
| |
`map` and `string` are used without the `std` namespace, which leads to
`map` and `string` not being declared.
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
|
|\
| |
| |
| |
| | |
src/blk: fix block_device_t return if no aio libs present
Reviewed-by: Kefu Chai <kchai@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When libaio is not present on the system, compilation fails;
return block_device_t as unknown in that case.
fixes: https://tracker.ceph.com/issues/50947#note-1
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* add "std::" prefix in headers
* add "using" declarations in .cc files.
so we don't rely on "using namespace std" in one or more included
headers.
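A minimal sketch of the two conventions, collapsed into one translation unit for brevity. `count_words` is a hypothetical function used only for illustration: the declaration is written the way a header would be, and the definition the way a .cc file would be.

```cpp
#include <map>
#include <sstream>
#include <string>

// Header style: every standard name is fully qualified.
std::map<std::string, int> count_words(const std::string& text);

// .cc style: explicit using declarations for exactly the names needed,
// instead of relying on "using namespace std" leaking from some header.
using std::map;
using std::string;

map<string, int> count_words(const string& text) {
  map<string, int> counts;
  std::istringstream in(text);
  string word;
  while (in >> word) {
    ++counts[word];
  }
  return counts;
}
```

With this layout the file keeps compiling even if an included header stops pulling in the whole `std` namespace.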
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|/
|
|
|
|
|
|
|
|
| |
* add "std::" prefix in headers
* add "using" declarations in .cc files.
so we don't rely on "using namespace std" in one or more included
headers.
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|\
| |
| |
| |
| | |
blk: use choose_fd for all filehandle references
Reviewed-by: Kefu Chai <kchai@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Without WRITE_LIFE capabilities, only one file is used.
And rocksdb sets this value also to > 0, so we need to catch this here
instead of trusting rocksdb to set write_hint.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
| |
| |
| |
| | |
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a leading part of the changes to implement bluestore on
FreeBSD. Without this, invalid indexing in the descriptor arrays
will occur.
Creates a new enum, `blk_access_mode_t`,
to describe `BUFFERED` and `DIRECT` access modes,
with `choose_fd()` to get the correct file for the typed access,
and adds:
* a pretty-printer
* a boolean converter, `blk_access_mode_t::buffermode(bool)`
This PR is a redo of PR #37258, since that one was lost
in rebasing errors. But the review notes are still there.
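A hedged sketch of how such an enum and selector could fit together. Only the names `blk_access_mode_t`, `BUFFERED`, `DIRECT`, `choose_fd()` and `buffermode(bool)` come from this message; `FdTable`, `kNumHints`, the free-function form of `buffermode`, and the clamping of the write hint are assumptions of the sketch, not the KernelDevice code.

```cpp
#include <array>
#include <cstddef>
#include <ostream>
#include <sstream>

enum class blk_access_mode_t { DIRECT, BUFFERED };

// Boolean converter: true selects buffered access.
inline blk_access_mode_t buffermode(bool buffered) {
  return buffered ? blk_access_mode_t::BUFFERED : blk_access_mode_t::DIRECT;
}

// Pretty-printer for log output.
inline std::ostream& operator<<(std::ostream& os, blk_access_mode_t m) {
  return os << (m == blk_access_mode_t::BUFFERED ? "buffered" : "direct");
}

constexpr std::size_t kNumHints = 6;  // hypothetical number of descriptor slots

// Hypothetical holder for the per-hint descriptors.
struct FdTable {
  std::array<int, kNumHints> direct;
  std::array<int, kNumHints> buffered;
  bool write_life_supported;
};

// Picks the descriptor for the requested access mode. When write-life
// hints are unsupported, or the hint is out of range, slot 0 is used,
// which avoids indexing past the end of the descriptor arrays.
int choose_fd(const FdTable& t, blk_access_mode_t mode, std::size_t write_hint) {
  const std::size_t idx =
      (t.write_life_supported && write_hint < kNumHints) ? write_hint : 0;
  return mode == blk_access_mode_t::BUFFERED ? t.buffered[idx] : t.direct[idx];
}
```

Routing every descriptor lookup through one function keeps the bounds check in a single place.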
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
| |
| |
| |
| | |
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
|
| |
| |
| |
| | |
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
|
|/
|
|
| |
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
|
|
|
|
| |
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clang on FreeBSD reports:
```
Building CXX object src/global/CMakeFiles/libglobal_objs.dir/pidfile.cc.o
../src/global/pidfile.cc:170:5: warning: ISO C++ requires field designators to be specified in declaration order; field 'l_whence' will be initialized after field 'l_start' [-Wreorder-init-list]
.l_start = 0,
^~~~~~~~~~~~
../src/global/pidfile.cc:169:17: note: previous initialization for field 'l_whence' is here
.l_whence = SEEK_SET,
^~~~~~~~
```
Linux and BSD also have different field orders in their `struct flock`.
This change also prevents the wrong initialisation on FreeBSD.
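One portable way to sidestep the ordering problem is to value-initialize the struct and then assign the fields by name. This is a sketch of that general technique, not necessarily the change made in pidfile.cc, and `make_whole_file_write_lock` is a hypothetical helper name.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Builds a write lock covering the whole file without depending on the
// declaration order of the fields in struct flock.
struct flock make_whole_file_write_lock() {
  struct flock l{};        // zero every field first
  l.l_type = F_WRLCK;      // exclusive (write) lock
  l.l_whence = SEEK_SET;   // offsets are relative to the start of the file
  l.l_start = 0;
  l.l_len = 0;             // 0 means "until end of file"
  return l;
}
```

The result can be passed to `fcntl(fd, F_SETLK, &l)` on either platform.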
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
|
|
|
|
|
| |
if we don't build libblk, or are just not able to, let's skip it.
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|
|
| |
Signed-off-by: wangyunqing <wangyunqing@inspur.com>
|
|\
| |
| |
| |
| | |
blk/BlockDevice: Remove reap_ioc logic
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
queue_reap_ioc and reap_ioc logic was necessary for BlueFS's _close_writer() method.
At one point it did not perform aio_wait(), and there was a race condition between access to and deletion of the IOContext object.
Now we can simply delete it after a successful aio_wait().
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
|
|/
|
|
|
|
| |
as blk does not use pmem::pmemobj, librbd does.
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
there are chances that the user or a build script sets `BUILD_SHARED_LIBS`,
so these convenience libraries (using the autotools' terminology)
are built and linked but never get installed.
Fixes: https://tracker.ceph.com/issues/38611
Fixes: https://tracker.ceph.com/issues/49080
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|\
| |
| |
| |
| |
| | |
blk: avoid temporary bptrs on aio paths; use ptr_node instead.
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a slight optimization for the `HAVE_LIBAIO` paths
of the kernel-based `BlockDevice` implementation.
The overall idea is to eliminate temporary, short-lived
instances of `ceph::bufferptr`, as `ceph::bufferlist`
actually aggregates `ptr_node` (a `bufferptr` with an extra
`next` pointer field to form a list). A `ptr_node` can be created
directly, and this commit switches to exactly this behavior.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
|/
|
|
|
| |
Fixes: https://tracker.ceph.com/issues/48872
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
|
|
|
|
|
|
| |
so the number of pending io does not overflow when being passed to submit_batch().
Signed-off-by: hzwuhongsong <hzwuhongsong@corp.netease.com>
|
|
|
|
| |
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
|
|\
| |
| |
| |
| | |
blk: log is_valid_io() parameters when unsuccessful.
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The parameter list used when uring is available needs
to be equal to the one used when no uring is available.
fixes: https://github.com/ceph/ceph/pull/38257
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
|
|\ \
| | |
| | |
| | |
| | | |
blk/kernel: expose IORING_SETUP_{IOPOLL,SQPOLL} as options
Reviewed-by: Kefu Chai <kchai@redhat.com>
|
| |/
| |
| |
| | |
Signed-off-by: JiangYu <lnsyyj@hotmail.com>
|
|/
|
|
|
|
| |
Signed-off-by: JiangYu <lnsyyj@hotmail.com>
Let the liburing library ensure support for the CPU instruction set of the io_uring system call back-end.
|
|
|
|
|
|
|
|
|
|
| |
* use functions exposed by liburing instead of using syscalls
* v0.7 is the latest release at the time of writing; as liburing is under
active development, it'd be better to use a newer release.
* also use https://git.kernel.dk/liburing instead of
http://git.kernel.dk/liburing.
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|\
| |
| |
| |
| |
| | |
rpm,cmake: s/WITH_LIBZBD/WITH_ZBD/ and enable ZBD on demand
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-By: Neha Ojha <nojha@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
`acconfig.h` is generated using
configure_file(
${CMAKE_SOURCE_DIR}/src/include/config-h.in.cmake
${CMAKE_BINARY_DIR}/include/acconfig.h
)
in `config-h.in.cmake`, the cmake variable of `HAVE_LIBZBD` is checked.
so we need to ensure that this variable is visible from this
`configure_file()` statement.
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
fix the regression introduced by
d53638630631cc6596a1238228332e7579318415
Signed-off-by: Kefu Chai <kchai@redhat.com>
|
|/
|
|
|
|
|
|
| |
Adds build option and implements init functionality for SSD cache
Signed-off-by: Lisa Li <xiaoyan.li@intel.com>
Signed-off-by: Mahati Chamarthy <mahati.chamarthy@intel.com>
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
|