os/bluestore: Fix BlueFS::truncate()
In `struct bluefs_fnode_t` there is a vector `extents` and
a vector `extents_index` that serves as a log2 seek cache.
Until truncate() was modified, we never removed extents from files.
The modified truncate() did not update extents_index.
For example, a 10-extent file truncated to 0 will have:
0 extents, 10 extents_index entries.
After writing some data to the file:
1 extent, 11 extents_index entries.
Now `bluefs_fnode_t::seek` will binary search extents_index;
let's say it locates the seek at item #3.
It will then jump from extent #0 (which exists) to extent #3, which
does not exist.
The worst part is that the code does not notice, as #3 != extents.end().
There are 3 parts of the fix:
1) assert in `bluefs_fnode_t::seek` to protect against
jumping outside extents
2) code in BlueFS::truncate to sync up `extents_index` with `extents`
3) relaxing an assert in _replay to provide a way out of cases
where an incorrect "offset 12345" (12345 is the file size) instead of
"offset 20000" (the space occupied by allocations) was written to the log.
Fixes: https://tracker.ceph.com/issues/69481
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
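The invariant behind parts 1 and 2 can be sketched as follows. This is a minimal model under stated assumptions: `ExtentSketch`, `FnodeSketch`, and `truncate_extents` are hypothetical names, not the real `bluefs_fnode_t` interface, and the index is assumed to store the logical start offset of each extent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a BlueFS extent (the real bluefs_extent_t
// also records which device the extent lives on).
struct ExtentSketch {
  uint64_t offset;  // physical offset on the device
  uint64_t length;  // bytes
};

// Hypothetical, simplified model of bluefs_fnode_t. extents_index[i] is the
// logical file offset where extents[i] starts, so seek() can binary search
// instead of walking the extents one by one.
struct FnodeSketch {
  std::vector<ExtentSketch> extents;
  std::vector<uint64_t> extents_index;
  uint64_t allocated = 0;  // sum of all extent lengths

  void append_extent(const ExtentSketch& e) {
    extents_index.push_back(allocated);
    extents.push_back(e);
    allocated += e.length;
  }

  // Returns the index of the extent holding logical offset `off` and the
  // offset inside that extent via *x_off.
  std::size_t seek(uint64_t off, uint64_t* x_off) const {
    assert(off < allocated);
    // Guard in the spirit of fix part 1: a stale index longer than
    // `extents` would let the search land on a nonexistent extent.
    assert(extents_index.size() == extents.size());
    auto it = std::upper_bound(extents_index.begin(), extents_index.end(), off);
    assert(it != extents_index.begin());
    --it;
    std::size_t idx = static_cast<std::size_t>(it - extents_index.begin());
    assert(idx < extents.size());
    *x_off = off - *it;
    return idx;
  }

  // In the spirit of fix part 2: when truncation drops whole extents,
  // shrink extents_index together with extents. Keeps just enough extents
  // to cover `new_size` bytes.
  void truncate_extents(uint64_t new_size) {
    std::size_t keep = 0;
    uint64_t covered = 0;
    while (keep < extents.size() && covered < new_size) {
      covered += extents[keep].length;
      ++keep;
    }
    extents.resize(keep);
    extents_index.resize(keep);  // the step that was missing
    allocated = covered;
  }
};
```

Without the `extents_index.resize(keep)` line, the 10-extent example above leaves 10 stale index entries behind, and the size assertion in `seek()` fires instead of silently returning a bogus extent.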
os/bluestore: record omapiter init latency
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
If an object has many `internal keys` at the beginning of its omap,
seeking to the first `user key` when initializing an omapiter may be
very slow. This may stall the OSD in build_push_op, which seeks the
recovering object's first omap key again and again.
Signed-off-by: imtzw <tongzhiwei_yewu@cmss.chinamobile.com>
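A minimal sketch of why initialization can be slow and how its latency could be captured. It assumes a sorted vector in which deleted entries remain in place, standing in for the key-value store; `RawEntry`, `InitLatency`, and `init_iterator` are hypothetical, and the commit's actual reporting mechanism is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sorted key space in which deleted ("internal") entries are
// still physically present and must be stepped over by the iterator.
struct RawEntry {
  std::string key;
  bool deleted;  // true for an internal entry, e.g. a deletion marker
};

// Hypothetical latency record; real code would feed a perf counter.
struct InitLatency {
  std::chrono::nanoseconds elapsed{0};
  uint64_t internal_skipped = 0;
};

// Positions an iterator at the first live key >= start. The binary search is
// cheap; the cost comes from stepping over every internal entry that sits in
// front of the first user key. The whole initialization is timed.
std::size_t init_iterator(const std::vector<RawEntry>& raw, const std::string& start,
                          InitLatency* lat) {
  auto t0 = std::chrono::steady_clock::now();
  auto it = std::lower_bound(
      raw.begin(), raw.end(), start,
      [](const RawEntry& e, const std::string& k) { return e.key < k; });
  uint64_t skipped = 0;
  while (it != raw.end() && it->deleted) {
    ++skipped;
    ++it;
  }
  lat->elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - t0);
  lat->internal_skipped = skipped;
  return static_cast<std::size_t>(it - raw.begin());  // raw.size(): no user key
}
```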
os, osd: bring the lightweight OMAP iteration
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
Reviewed-by: Mark Kogan <mkogan@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
```
- 63.07% _ZN12PrimaryLogPG19prepare_transactionEPNS_9OpContextE
- 63.06% _ZN12PrimaryLogPG10do_osd_opsEPNS_9OpContextERSt6vectorI5OSDOpSaIS3_EE
- 20.19% _ZN9BlueStore16OmapIteratorImpl4nextEv
- 12.21% _ZN14CFIteratorImpl4nextEv
+ 10.56% _ZN7rocksdb6DBIter4NextEv
1.02% _ZN7rocksdb18ArenaWrappedDBIter4NextEv
+ 3.11% clock_gettime@@GLIBC_2.17
+ 2.44% _ZN9BlueStore11log_latencyEPKciRKNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEEdS1_i
0.78% pthread_rwlock_rdlock@plt
0.69% pthread_rwlock_unlock@plt
- 14.28% _ZN9BlueStore16OmapIteratorImpl5valueEv
- 11.60% _ZN14CFIteratorImpl5valueEv
- 11.41% _ZL13to_bufferlistN7rocksdb5SliceE
- 10.50% _ZN4ceph6buffer7v15_2_03ptrC1EPKcj
- _ZN4ceph6buffer7v15_2_04copyEPKcj
- 10.01% _ZN4ceph6buffer7v15_2_014create_alignedEjj
- _ZN4ceph6buffer7v15_2_025create_aligned_in_mempoolEjji
5.27% _ZN7mempool6pool_t12adjust_countEll
+ 3.72% tc_posix_memalign
0.54% _ZN4ceph6buffer7v15_2_04list6appendEONS1_3ptrE
1.25% pthread_rwlock_rdlock@plt
0.90% pthread_rwlock_unlock@plt
```
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
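The profile shows per-entry cost in `next()`/`value()` going to latency logging and to copying every value into a freshly allocated buffer. A minimal sketch of a callback-style alternative, assuming an in-memory `std::map` stands in for the store; `omap_iterate_sketch` and `IterRet` are hypothetical names, not the interface this series adds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical return code telling the store whether to keep going.
enum class IterRet { NEXT, STOP };

// Callback-style ("push") omap iteration over an in-memory stand-in for the
// store. Keys and values reach the visitor as std::string_view into the
// store's own memory: no per-entry buffer allocation or copy, and no
// per-step latency bookkeeping. Returns the number of entries visited.
std::size_t omap_iterate_sketch(
    const std::map<std::string, std::string>& omap, std::string_view start_after,
    const std::function<IterRet(std::string_view, std::string_view)>& visit) {
  std::size_t visited = 0;
  for (auto it = omap.upper_bound(std::string(start_after)); it != omap.end(); ++it) {
    ++visited;
    if (visit(it->first, it->second) == IterRet::STOP) break;
  }
  return visited;
}
```

The trade-off is that a view is only valid for the duration of the callback; a visitor that needs to keep a value must copy it explicitly.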
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
tool/ceph-bluestore-tool: fix wrong keyword for 'free-fragmentation' …
Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
os/bluestore: log txc details in slow op notification on committed_kv
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
kv_committed.
This might be helpful to troubleshoot issues with slow ops caused by
bulky client transactions.
Related-to: https://tracker.ceph.com/issues/67339
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
os: remove unused btrfs_ioctl.h and tests
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
remove unused header whose GPL license was potentially problematic
Fixes: https://tracker.ceph.com/issues/68083
Signed-off-by: Casey Bodley <cbodley@redhat.com>
os/bluestore/ceph-bluestore-tool: Modify show-label for many devs
It was possible to give multiple devices to ceph-bluestore-tool (cbt):
> ceph-bluestore-tool show-label --dev /dev/sda --dev /dev/sdb
But if any of the devices could not provide a valid label, nothing was printed.
Now results are always printed. Unreadable labels are output as empty dictionaries.
Exit code:
- 0 if any label properly read
- 1 if all labels failed
Fixes: https://tracker.ceph.com/issues/68505
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
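The output and exit-code rules above can be sketched as follows, assuming labels are modeled as string maps and that `read_label` returns `std::nullopt` for an unreadable label. `Label`, `ShowLabelResult`, and `show_labels` are hypothetical names, not ceph-bluestore-tool internals.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical label representation: key/value pairs. An empty map plays the
// role of the empty dictionary printed for an unreadable label.
using Label = std::map<std::string, std::string>;

struct ShowLabelResult {
  std::map<std::string, Label> per_device;  // one entry per --dev argument
  int exit_code;                            // 0 if any label was read, else 1
};

// `read_label` is a hypothetical callback returning std::nullopt on failure.
template <typename ReadFn>
ShowLabelResult show_labels(const std::vector<std::string>& devs, ReadFn read_label) {
  ShowLabelResult r{{}, 1};
  for (const auto& dev : devs) {
    std::optional<Label> l = read_label(dev);
    if (l) {
      r.per_device[dev] = *l;
      r.exit_code = 0;  // at least one label was read successfully
    } else {
      r.per_device[dev] = Label{};  // still reported, as an empty dictionary
    }
  }
  return r;
}
```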
os/bluestore: Fix repair of multilabel when collides with BlueFS
The problem was that BDEV_FIRST_LABEL_POSITION was removed from the
bdev_label_valid_locations set.
Now, if the label at BDEV_FIRST_LABEL_POSITION is valid, it is kept in the set.
Fixes: https://tracker.ceph.com/issues/68528
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
os/bluestore: Fix ceph-bluestore-tool allocmap command
BlueStore::read_allocation_from_drive_for_bluestore_tool was
not aware that multiple bdev labels can exist and reserve space.
The comparison of real vs. recovered allocations was failing.
Fixes: https://tracker.ceph.com/issues/67596
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
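A sketch of the missing step, assuming a one-bool-per-allocation-unit map. `AllocMap`, `mark_used`, `recover_allocations`, the label positions, and `label_size` are hypothetical and chosen for illustration; they are not values or functions taken from BlueStore.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical allocation map: one bool per allocation unit (true = used).
using AllocMap = std::vector<bool>;

// Marks every allocation unit overlapping [off, off + len) as used.
void mark_used(AllocMap& m, uint64_t au_size, uint64_t off, uint64_t len) {
  if (len == 0) return;
  uint64_t first = off / au_size;
  uint64_t last = (off + len - 1) / au_size;
  for (uint64_t au = first; au <= last && au < m.size(); ++au) m[au] = true;
}

// Rebuilds an allocation map from object extents plus the areas reserved for
// bdev label copies. Skipping the label loop reproduces the kind of mismatch
// described above: the real allocator has those units reserved, while the
// recovered map does not.
AllocMap recover_allocations(uint64_t dev_size, uint64_t au_size,
                             const std::vector<std::pair<uint64_t, uint64_t>>& object_extents,
                             const std::vector<uint64_t>& label_positions,
                             uint64_t label_size) {
  AllocMap m((dev_size + au_size - 1) / au_size, false);
  for (const auto& [off, len] : object_extents) mark_used(m, au_size, off, len);
  for (uint64_t pos : label_positions)
    if (pos < dev_size) mark_used(m, au_size, pos, label_size);
  return m;
}
```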
Now truncate() drops unused allocations.
Close() in BlueRocksEnv was modified to unconditionally call truncate().
Fixes: https://tracker.ceph.com/issues/68385
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
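A minimal sketch of the combined behavior, assuming a file whose allocation is a list of chunk lengths that are multiples of a hypothetical allocation unit `au`. `FileSketch` and its members are illustrative only; the real BlueFS code operates on bluefs extents and returns them to the allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified BlueFS-like file: data size plus a list of
// allocated chunk lengths, each a multiple of `au`.
struct FileSketch {
  uint64_t size = 0;
  std::vector<uint64_t> chunks;
  uint64_t au = 0x10000;

  uint64_t allocated() const {
    uint64_t s = 0;
    for (uint64_t c : chunks) s += c;
    return s;
  }

  // Drops allocation not needed to hold `new_size` bytes, keeping the
  // allocation rounded up to the allocation unit.
  void truncate(uint64_t new_size) {
    size = new_size;
    uint64_t need = (new_size + au - 1) / au * au;
    uint64_t kept = 0;
    std::vector<uint64_t> out;
    for (uint64_t c : chunks) {
      if (kept >= need) break;
      uint64_t take = std::min(c, need - kept);
      out.push_back(take);
      kept += take;
    }
    chunks = out;
  }

  // Mirrors the behavior described above: closing always truncates to the
  // current size, so preallocated but unused space is released.
  void close() { truncate(size); }
};
```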
blk/kerneldevice: add perfcounter for block async discard
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Yite Gu <yitegu0@gmail.com>
os/bluestore: Fix BlueFS allocating bdev label reserved location.
Reservation (alloc->init_rm_free) happened after reopening the DB in r/w mode.
This was a problem: as soon as the DB is in r/w mode, it can flush SSTs or compact,
which makes allocations.
Fixes: https://tracker.ceph.com/issues/67911
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
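The ordering problem can be sketched with a toy allocator. Assumptions: `AllocatorSketch`, `reserve`, `open_rw_and_flush`, `mount_buggy`, and `mount_fixed` are hypothetical; `reserve` plays the role the message assigns to `alloc->init_rm_free`, and a flush is modeled as a single allocation of the lowest free unit.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical allocator over allocation-unit indexes.
struct AllocatorSketch {
  std::set<uint64_t> free_units;

  void add_free(uint64_t first, uint64_t count) {
    for (uint64_t u = first; u < first + count; ++u) free_units.insert(u);
  }
  // Removes a unit from the free set (the role alloc->init_rm_free plays).
  void reserve(uint64_t unit) { free_units.erase(unit); }
  // Hands out the lowest free unit; returns false when nothing is free.
  bool allocate(uint64_t* unit) {
    if (free_units.empty()) return false;
    *unit = *free_units.begin();
    free_units.erase(free_units.begin());
    return true;
  }
};

// Stand-in for reopening the DB read/write: it may flush an SST at once,
// which allocates space. Returns the unit that allocation received.
uint64_t open_rw_and_flush(AllocatorSketch& a) {
  uint64_t unit = UINT64_MAX;
  a.allocate(&unit);
  return unit;
}

// Buggy ordering: the DB can grab a label location before it is reserved.
uint64_t mount_buggy(AllocatorSketch& a, const std::vector<uint64_t>& labels) {
  uint64_t flushed = open_rw_and_flush(a);
  for (uint64_t u : labels) a.reserve(u);
  return flushed;
}

// Fixed ordering: reserve label locations first, then allow writes.
uint64_t mount_fixed(AllocatorSketch& a, const std::vector<uint64_t>& labels) {
  for (uint64_t u : labels) a.reserve(u);
  return open_rw_and_flush(a);
}
```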
ceph-bluestore-tool: Fixes for multiple bdev labels
Make zapping precisely target block device labels.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
ceph-volume needs to query the devices for `ceph-volume raw list`.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Fixes: https://tracker.ceph.com/issues/67926
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
co-author: Jrchyang Yu <yuzhiqiang_yewu@cmss.chinamobile.com>
Signed-off-by: Wang Linke <wanglinke_yewu@cmss.chinamobile.com>
os/bluestore: Recompression, part 2. New write path.
4) remove Writer::shared_changed and use txc::shared_blobs directly
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
1) moved stats and blobs update to Writer::do_write
2) preallocate space in Writer::_split_data
3) fixed Writer::_write_expand_l, which could check one extent too many
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added conf.bluestore_write_v2_random. This is useful only for testing.
If set, it overrides the value of bluestore_write_v2 with a random
true/false selection.
It is intended for v1 / v2 compatibility testing.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
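A sketch of the option semantics, assuming a plain struct in place of Ceph's config system (`WriteConfSketch` and `choose_write_v2` are hypothetical). The message does not say whether the random choice is made once or per operation; this sketch draws a new value on each call.

```cpp
#include <cassert>
#include <random>

// Hypothetical view of the two options described above.
struct WriteConfSketch {
  bool bluestore_write_v2 = false;
  bool bluestore_write_v2_random = false;
};

// Picks the write path: when the testing-only random option is set, it
// overrides bluestore_write_v2 with a random true/false choice.
bool choose_write_v2(const WriteConfSketch& conf, std::mt19937& rng) {
  if (conf.bluestore_write_v2_random) {
    std::bernoulli_distribution coin(0.5);
    return coin(rng);
  }
  return conf.bluestore_write_v2;
}
```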
For write_v2, fall back to write_v1 if compression is selected.
This is temporary until compression designed to benefit from v2 is
merged.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
1) Algorithm assumed that blob->blob_start() is aligned to csum size.
It is true for blobs created by write_v2, but write_v1 can generate
blob like: begin = 0x9000, size = 0x6000, csum = 0x2000.
2) Blobs with `unused` were selected even if they needed to be expanded.
This is illegal, since `unused` cannot be expanded.
Fixed blob selection algorithm.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
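The offsets in item 1 can be checked with a small sketch. It assumes checksum chunks are laid out from the blob start, which is what makes an unaligned blob start matter; `BlobSketch`, `aligned`, `csum_chunk_begin`, and `csum_chunk_begin_naive` are hypothetical helpers, not BlueStore code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified blob: logical start, length, checksum chunk size.
struct BlobSketch {
  uint64_t start;       // logical offset where the blob begins
  uint64_t length;      // bytes
  uint64_t csum_chunk;  // power of two
};

// True if x is a multiple of a (a must be a power of two).
constexpr bool aligned(uint64_t x, uint64_t a) { return (x & (a - 1)) == 0; }

// Logical offset of the checksum chunk covering `logical_off`, computed
// relative to the blob start.
uint64_t csum_chunk_begin(const BlobSketch& b, uint64_t logical_off) {
  assert(logical_off >= b.start && logical_off < b.start + b.length);
  uint64_t rel = logical_off - b.start;
  return b.start + (rel & ~(b.csum_chunk - 1));
}

// Naive variant: rounds on the global grid, which is only correct when the
// blob start itself is aligned to csum_chunk (the assumption item 1 removes).
uint64_t csum_chunk_begin_naive(const BlobSketch& b, uint64_t logical_off) {
  return logical_off & ~(b.csum_chunk - 1);
}
```

With the blob from the message (start 0x9000, size 0x6000, csum chunk 0x2000), offset 0xA800 lies in the chunk starting at 0x9000, while rounding on the global grid yields 0xA000, which is not a chunk boundary of this blob.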
More diligent calculation of need_size.
It takes into account front and back alignment.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
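A sketch of a size calculation that accounts for both ends, assuming power-of-two alignment. `p2align`, `p2roundup`, and `need_size` here are local helpers written for illustration; the actual Writer code is not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Round x down / up to a multiple of a (a must be a power of two).
constexpr uint64_t p2align(uint64_t x, uint64_t a) { return x & ~(a - 1); }
constexpr uint64_t p2roundup(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

// Space needed to hold [off, off + len) when the front is expanded down and
// the back is expanded up to `align` boundaries. Rounding only the length up
// is not enough for an unaligned offset: the range can straddle one more
// aligned unit.
constexpr uint64_t need_size(uint64_t off, uint64_t len, uint64_t align) {
  return p2roundup(off + len, align) - p2align(off, align);
}
```

For a 0x1000-byte write at 0x1800 with 0x1000 alignment, rounding only the length gives 0x1000, while the aligned span is [0x1000, 0x3000), i.e. 0x2000 bytes.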
Clang fails at _construct_at().
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Usually the data we put to disk is AU aligned.
In weird cases like AU=16K we put less data than we allocated.
_crop_allocs_to_io trims allocated extents into disk block extents
to reflect real IO.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
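A minimal sketch of the cropping idea, assuming extents are (offset, length) pairs and the IO length is rounded up to a device block size. `PExtent` and `crop_allocs_to_io` are hypothetical names, not the signature of `_crop_allocs_to_io`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical physical extent.
struct PExtent {
  uint64_t offset;
  uint64_t length;
};

// Trims allocated extents so that together they cover exactly `io_len`
// bytes rounded up to `block_size`, i.e. what is really written.
// Extents past the IO are dropped and the last one used is shortened.
std::vector<PExtent> crop_allocs_to_io(const std::vector<PExtent>& allocs,
                                       uint64_t io_len, uint64_t block_size) {
  uint64_t remaining = (io_len + block_size - 1) / block_size * block_size;
  std::vector<PExtent> out;
  for (const PExtent& e : allocs) {
    if (remaining == 0) break;
    uint64_t take = e.length < remaining ? e.length : remaining;
    out.push_back({e.offset, take});
    remaining -= take;
  }
  return out;
}
```

With AU=16K (0x4000), a 0x1800-byte write on a device with 0x1000-byte blocks crops the 0x4000-byte allocation at 0x40000 down to 0x2000 bytes.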
BufferSpace is now with Onode, not Blob.
Modify code to adapt to this change.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>