summaryrefslogtreecommitdiffstats
path: root/drivers/block (unfollow)
Commit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: Dump journal state when we get stuckKent Overstreet1-1/+10
We had a bug reported where the journal is failing to allocate a journal write - this should help figure out what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a 64 bit divide on 32 bitKent Overstreet1-2/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use inode btree key cache in fsck codeKent Overstreet3-10/+26
We had a cache coherency bug with the btree key cache in the fsck code - this fixes fsck to be consistent about not using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't call into journal reclaim when we're not supposed toKent Overstreet1-1/+2
This was causing a deadlock when btree_update_nodes_writtes() invokes journal reclaim because of the btree cache being too dirty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Create allocator threads when allocating filesystemKent Overstreet2-1/+25
We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEMKent Overstreet1-4/+10
bch2_btree_node_get_noiter() isn't used from the btree iterator code, which retries with the btree node cache cannibalize lock held on -ENOMEM, so we should do it ourself if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add error message for some allocation failuresKent Overstreet6-15/+53
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Extents may now cross btree node boundariesKent Overstreet7-237/+87
When snapshots arrive, we won't necessarily be able to arbitrarily split existis - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: iter->real_posKent Overstreet3-92/+120
We need to differentiate between the search position of a btree iterator, vs. what it actually points at (what we found). This matters for extents, where iter->pos will typically be the start of the key we found and iter->real_pos will be the end of the key we found (which soon won't necessarily be in the same btree node!) and it will also matter for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure btree iterators are traversed in bch2_trans_commit()Kent Overstreet1-1/+6
The upcoming patch to allow extents to span btree nodes will require this... and this assertion seems to be popping, and it's not a very good assertion anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop invalid stripe ptrs in fsckKent Overstreet3-21/+56
More repair code, now that we can repair extents during initial gc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix unnecessary read amplificaiton when allocating ec stripesRobbie Litchfield1-63/+92
When allocating an erasure coding stripe, bcachefs will always reuse any partial stripes before reserving a new stripe. This causes unnecessary read amplification when preparing a stripe for writing. This patch changes bcachefs to always reserve new stripes first, only relying on stripe reuse when copygc needs more time to empty buckets from existing stripes. Signed-off-by: Robbie Litchfield <blam.kiwi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fsck fixesKent Overstreet1-4/+11
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a shift greater than type sizeKent Overstreet1-1/+1
Found by UBSAN Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assert that we're not trying to flush journal seq in the futureKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_btree_iter_peek_prev()Kent Overstreet2-23/+35
This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev() consistent with peek() and next(), w.r.t. iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_iter_advance_pos()Kent Overstreet1-24/+17
This adds a new common helper for advancing past the last key returned by peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_btree_iter_set_pos_same_leaf()Kent Overstreet3-43/+3
The only reason we were keeping this around was for BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances to the next leaf node, it'll drop the lock on the node that we just inserted to. But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents btree, just the inodes btree, and if we do need it for the extents btree in the future we can do it more cleanly by cloning the iterator - this lets us delete some special cases in the btree iterator code, which is complicated enough as it is. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Simplify btree_iter_(next|prev)_leaf()Kent Overstreet1-18/+9
There's no good reason for these functions to not be using bch2_btree_iter_set_pos(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for hash_redo_key() in fsckKent Overstreet1-1/+1
It's possible we're calling hash_redo_key() because of a duplicate key - easiest fix for that is to just not use BCH_HASH_SET_MUST_CREATE. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add flushed_seq_ondisk to journal_debug_to_text()Kent Overstreet1-2/+5
Also, make the wait in bch2_journal_flush_seq() interruptible, not just killable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Redo checks for sufficient devicesKent Overstreet7-110/+51
When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't setKent Overstreet1-0/+7
We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev usage - we don't have the code for reconstructing this from buckets anymore, so we need to run fsck if it's not set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fixes/improvements for journal entry reservationsKent Overstreet4-14/+16
This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Include device in btree IO error messagesKent Overstreet3-37/+44
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Journal updates to dev usageKent Overstreet10-75/+219
This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Persist 64 bit io clocksKent Overstreet19-310/+141
Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_alloc_v2Kent Overstreet9-299/+403
This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add missing call to bch2_replicas_entry_sort()Kent Overstreet3-6/+9
This fixes a bug introduced by "bcachefs: Improve diagnostics when journal entries are missing" - devices in a replicas entry are supposed to be sorted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an assertion to check for journal writes to same locationKent Overstreet2-0/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option for metadata_targetKent Overstreet4-3/+23
Also, make journal writes obey foreground_target and metadata_target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Repair bad data pointersKent Overstreet1-36/+102
Now that we can repair metadata during GC, we can handle bad pointers that would trigger errors being marked, when they need to just be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add (partial) support for fixing btree topologyKent Overstreet5-44/+156
When we walk the btrees during recovery, part of that is checking that btree topology is correct: for every interior btree node, its child nodes should exactly span the range the parent node covers. Previously, we had checks for this, but not repair code. Now that we have the ability to do btree updates during initial GC, this patch adds that repair code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add support for doing btree updates prior to journal replayKent Overstreet7-81/+176
Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add BTREE_PTR_RANGE_UPDATEDKent Overstreet4-8/+11
This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor checking of btree topologyKent Overstreet1-35/+48
Still a lot of work to be done here: we can't yet repair btree topology issues, but this patch refactors things so that we have better access to what we need in the topology checks. Next up will be figuring out a way to do btree updates during gc, before journal replay is done. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve diagnostics when journal entries are missingKent Overstreet3-28/+96
There's an outstanding bug with journal entries being missing in journal replay. This patch adds code to print out where the journal entries were physically located that were around the entry(ies) being missing, which should make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix BCH_REPLICAS_MAX checkKent Overstreet1-4/+4
Ideally, this limit will be going away in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix build in userspaceKent Overstreet1-3/+2
The userspace bch_err() macro doesn't use the filesystem argument. Could also be fixed with a better macro. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an assertionKent Overstreet1-1/+2
If we're invalidating a bucket that has cached data in it, data_type won't be 0 - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark superblocks transactionallyKent Overstreet6-47/+211
More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_invalidate_bucket()Kent Overstreet3-58/+14
This patch is working towards eventually getting rid of the in memory struct bucket, and relying only on the btree representation. Since bch2_invalidate_bucket() was only used for incrementing gens, not invalidating cached data, no other counters were being changed as a side effect - meaning it's safe for the allocator code to increment the bucket gen directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor dev usageKent Overstreet9-129/+91
This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill metadata only gcKent Overstreet3-61/+27
This was useful before we had transactional updates to interior btree nodes - but now, it's just extra unneeded complexity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure __bch2_trans_commit() always calls bch2_trans_reset()Kent Overstreet1-3/+3
This was leading to a very strange bug in bch2_bucket_io_time_reset(), where we'd retry without clearing out the list of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a faulty assertionKent Overstreet1-0/+1
If journal replay hasn't finished, the journal can't be empty - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Switch replicas.c allocations to GFP_KERNELKent Overstreet1-9/+9
We're transitioning to memalloc_nofs_save/restore instead of GFP flags with the rest of the kernel, and GFP_NOIO was excessively strict and causing unnnecessary allocation failures - these allocations are done with btree locks dropped. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix loopback in dio modeKent Overstreet1-4/+26
We had a deadlock on page_lock, because buffered reads signal completion by unlocking the page, but the dio read path normally dirties the pages it's reading to with set_page_dirty_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Clean up bch2_extent_can_insertKent Overstreet1-10/+5
It was using an internal btree node iterator interface, when bch2_btree_iter_peek_slot() sufficed. We were hitting a null ptr deref that looked like it was from the iterator not being uptodate - this will also fix that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an assertion popKent Overstreet3-22/+1
There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>