summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/extents.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Fix assertion pop in bch2_ptr_swab()Kent Overstreet2024-11-121-1/+4
| | | | | | | | This runs on extents that haven't yet been validated, so we don't want to assert that we have a valid entry type. Reported-by: syzbot+4f29c3f12f864d8a8d17@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't keep tons of cached pointers aroundKent Overstreet2024-10-291-16/+70
| | | | | | | | | | | | | | | | | | | | | | | | We had a bug report where the data update path was creating an extent that failed to validate because it had too many pointers; almost all of them were cached. To fix this, we have: - want_cached_ptr(), a new helper that checks if we even want a cached pointer (is on appropriate target, device is readable). - bch2_extent_set_ptr_cached() now only sets a pointer cached if we want it. - bch2_extent_normalize_by_opts() now ensures that we only have a single cached pointer that we want. While working on this, it was noticed that this doesn't work well with reflinked data and per-file options. Another patch series is coming that plumbs through additional io path options through bch_extent_rebalance, with improved option handling. Reported-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't drop devices with stripe pointersKent Overstreet2024-09-211-4/+13
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: EIO errcode cleanupKent Overstreet2024-09-211-7/+4
| | | | | | | We want to be using private errcodes whenever possible, for better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_rcu_noerror()Kent Overstreet2024-09-211-2/+3
| | | | | | bch2_dev_rcu() now properly errors if the device is invalid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: More BCH_SB_MEMBER_INVALID supportKent Overstreet2024-09-091-0/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify bch2_bkey_drop_ptrs()Kent Overstreet2024-09-091-16/+5
| | | | | | | | bch2_bkey_drop_ptrs() had a some complicated machinery for avoiding O(n^2) when dropping multiple pointers - but when n is only going to be ~4, it's not worth it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_extents_match() false positiveKent Overstreet2024-08-271-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was caught as a very rare nonce inconsistency, on systems with encryption and replication (and tiering, or some form of rebalance operation running): [Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path [Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd [Wed Jul 17 13:30:03 2024] k: u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd [Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0 [Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only bch2_extents_match() was reporting true for extents that did not actually point to the same data. bch2_extent_match() iterates over pairs of pointers, looking for pointers that point to the same location on disk (with matching generation numbers). However one or both extents may have been trimmed (or merged) and they might not have the same disk offset: it corrects for this by subtracting the key offset and the checksum entry offset. However, this failed when an extent was immediately partially overwritten, and the new overwrite was allocated the next adjacent disk space. Normally, with compression off, this would never cause a bug, since the new extent would have to be immediately after the old extent for the pointer offsets to match, and the rebalance index update path is not looking for an extent outside the range of the extent it moved. However with compression enabled, extents take up less space on disk than they do in the btree index space - and spuriously matching after partial overwrite is possible. To fix this, add a secondary check, that strictly checks that the regions pointed to on disk overlap. https://github.com/koverstreet/bcachefs/issues/717 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix rebalance_work accountingKent Overstreet2024-08-241-0/+39
| | | | | | | | | | rebalance_work was keying off of the presence of rebelance_opts in the extent - but that was incorrect, we keep those around after rebalance for indirect extents since the inode's options are not directly available Fixes: 20ac515a9cc7 ("bcachefs: bch_acct_rebalance_work") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Extra debug for data move pathKent Overstreet2024-08-201-0/+2
| | | | | | | | | | | We don't have sufficient information to debug: https://github.com/koverstreet/bcachefs/issues/726 - print out durability of extent ptrs, when non default - print the number of replicas we need in data_update_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()Kent Overstreet2024-08-141-72/+72
| | | | | | | | | | | | | | | | | | bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Self healing on read IO errorKent Overstreet2024-07-151-4/+4
| | | | | | | | | This repurposes the promote path, which already knows how to call data_update() after a read: we now automatically rewrite bad data when we get a read error and then successfully retry from a different replica. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_extent_crc_unpacked_to_text()Kent Overstreet2024-07-151-7/+14
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()Kent Overstreet2024-06-101-3/+6
| | | | | | | Turn more asserts into proper recoverable error paths. Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet2024-05-091-7/+7
| | | | | | | We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Invalid devices are now checked for by fsck, not .invalid methodsKent Overstreet2024-05-091-14/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill bch2_dev_bkey_exists() in bkey_pick_read_device()Kent Overstreet2024-05-081-15/+22
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_extent_normalize() -> bch2_dev_rcu()Kent Overstreet2024-05-081-1/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_bkey_has_target() -> bch2_dev_rcu()Kent Overstreet2024-05-081-3/+10
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: extent_ptr_invalid() -> bch2_dev_rcu()Kent Overstreet2024-05-081-11/+19
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: ptr_stale() -> dev_ptr_stale()Kent Overstreet2024-05-081-4/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: extent_ptr_durability() -> bch2_dev_rcu()Kent Overstreet2024-05-081-4/+8
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_extent_merge() -> bch2_dev_rcu()Kent Overstreet2024-05-081-3/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_safe() -> bch2_dev_rcu()Kent Overstreet2024-05-081-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_bkey_drop_ptrs() declares loop iterKent Overstreet2024-05-081-4/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for writing btree_ptr_v2.sectors_written == 0Kent Overstreet2024-05-081-0/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: member helper cleanupsKent Overstreet2024-05-081-14/+15
| | | | | | | | | | | | | | Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bucket_valid()Kent Overstreet2024-05-081-3/+1
| | | | | | cut out a branch from doing it the obvious way Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet2024-04-141-3/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix unsafety in bch2_extent_ptr_to_text()Kent Overstreet2024-04-141-1/+3
| | | | | | Need to check if we have a valid bucket before checking if ptr is stale Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Repair pass for scanning for btree nodesKent Overstreet2024-04-031-24/+28
| | | | | | | | | | | | | | | If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check btree ptr min_key in .invalidKent Overstreet2024-04-011-2/+7
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet2024-01-211-1/+3
| | | | | | | | | The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-211-0/+1
| | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_prt_compression_type()Kent Overstreet2024-01-211-3/+3
| | | | | | bounds checking helper, since compression types are extensible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_for_each_ptr() now declares loop iterKent Overstreet2024-01-011-4/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance shouldn't attempt to compress unwritten extentsDaniel Hill2023-12-061-1/+2
| | | | | | | | This fixes a bug where rebalance would loop repeatedly on the same extents. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix ec + durability calculationKent Overstreet2023-11-261-18/+12
| | | | | | | | Durability of an erasure coded pointer doesn't add the device durability; durability is the same for any extent in that stripe so the calculation only comes from the stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-021-130/+106
| | | | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance_workKent Overstreet2023-11-021-13/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trivial extents.c refactoringKent Overstreet2023-10-311-11/+11
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for too-large encoded extentsKent Overstreet2023-10-311-0/+8
| | | | | | We don't yet repair (split) them, just check. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix 'pointer to invalid device' checkKent Overstreet2023-10-221-2/+11
| | | | | | | | | This fixes the device removal tests, which have been failing at random due to the fact that when we're running the .key_invalid checks in the write path the key may actually no longer exist - we might be racing with the keys being deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix assorted checkpatch nitsKent Overstreet2023-10-221-6/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Change check for invalid key typesKent Overstreet2023-10-221-4/+8
| | | | | | | | | | | As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version. This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixesKent Overstreet2023-10-221-4/+4
| | | | | | | | | - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: struct bch_extent_rebalanceKent Overstreet2023-10-221-0/+6
| | | | | | | | | | This adds the extent entry for extents that rebalance needs to do something with. We're adding this ahead of the main rebalance_work patchset, because adding new extent entries can't be done in a forwards-compatible way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_extent_ptr_desired_durability()Kent Overstreet2023-10-221-7/+21
| | | | | | | | | | | This adds a new helper for getting a pointer's durability irrespective of the device state, and uses it in the the data update path. This fixes a bug where we do a data update but request 0 replicas to be allocated, because the replica being rewritten is on a device marked as failed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_ops.min_val_sizeKent Overstreet2023-10-221-14/+0
| | | | | | | | | | | | | This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end. To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Delete obsolete btree ptr checkKent Overstreet2023-10-221-7/+0
| | | | | | | | This patch deletes a .key_invalid check for btree pointers that only applies to _very_ old on disk format versions, and potentially complicates the upgrade process. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>