summaryrefslogtreecommitdiffstats
path: root/fs/xfs/libxfs/xfs_btree.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: return a 64-bit block count from xfs_btree_count_blocksDarrick J. Wong2024-12-131-1/+1
| | | | | | | | | | | | | | | | | | With the nrext64 feature enabled, it's possible for a data fork to have 2^48 extent mappings. Even with a 64k fsblock size, that maps out to a bmbt containing more than 2^32 blocks. Therefore, this predicate must return a u64 count to avoid an integer wraparound that will cause scrub to do the wrong thing. It's unlikely that any such filesystem currently exists, because the incore bmbt would consume more than 64GB of kernel memory on its own, and so far nobody except me has driven a filesystem that far, judging from the lack of complaints. Cc: <stable@vger.kernel.org> # v5.19 Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: add a generic group pointer to the btree cursorChristoph Hellwig2024-11-051-2/+1
| | | | | | | | | Replace the pag pointers in the type specific union with a generic xfs_group pointer. This prepares for adding realtime group support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: Avoid races with cnt_btree lastrec updatesZizhi Wo2024-07-041-15/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A concurrent file creation and little writing could unexpectedly return -ENOSPC error since there is a race window that the allocator could get the wrong agf->agf_longest. Write file process steps: 1) Find the entry that best meets the conditions, then calculate the start address and length of the remaining part of the entry after allocation. 2) Delete this entry and update the -current- agf->agf_longest. 3) Insert the remaining unused parts of this entry based on the calculations in 1), and update the agf->agf_longest again if necessary. Create file process steps: 1) Check whether there are free inodes in the inode chunk. 2) If there is no free inode, check whether there has space for creating inode chunks, perform the no-lock judgment first. 3) If the judgment succeeds, the judgment is performed again with agf lock held. Otherwire, an error is returned directly. If the write process is in step 2) but not go to 3) yet, the create file process goes to 2) at this time, it may be mistaken for no space, resulting in the file system still has space but the file creation fails. We have sent two different commits to the community in order to fix this problem[1][2]. Unfortunately, both solutions have flaws. In [2], I discussed with Dave and Darrick, realized that a better solution to this problem requires the "last cnt record tracking" to be ripped out of the generic btree code. And surprisingly, Dave directly provided his fix code. This patch includes appropriate modifications based on his tmp-code to address this issue. The entire fix can be roughly divided into two parts: 1) Delete the code related to lastrec-update in the generic btree code. 2) Place the process of updating longest freespace with cntbt separately to the end of the cntbt modifications. Move the cursor to the rightmost firstly, and update the longest free extent based on the record. Note that we can not update the longest with xfs_alloc_get_rec() after find the longest record, as xfs_verify_agbno() may not pass because pag->block_count is updated on the outside. Therefore, use xfs_btree_get_rec() as a replacement. [1] https://lore.kernel.org/all/20240419061848.1032366-2-yebin10@huawei.com [2] https://lore.kernel.org/all/20240604071121.3981686-1-wozizhi@huawei.com Reported by: Ye Bin <yebin10@huawei.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* xfs: support in-memory btreesDarrick J. Wong2024-02-221-0/+7
| | | | | | | | | | | | | | Adapt the generic btree cursor code to be able to create a btree whose buffers come from a (presumably in-memory) buftarg with a header block that's specific to in-memory btrees. We'll connect this to other parts of online scrub in the next patches. Note that in-memory btrees always have a block size matching the system memory page size for efficiency reasons. There are also a few things we need to do to finalize a btree update; that's covered in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: move and rename xfs_btree_read_buflChristoph Hellwig2024-02-221-13/+0
| | | | | | | | | | | Despite its name, xfs_btree_read_bufl doesn't contain any btree-related functionaliy and isn't used by the btree code. Move it to xfs_bmap.c, hard code the refval and ops arguments and rename it to xfs_bmap_read_buf. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: remove xfs_btree_reada_bufsChristoph Hellwig2024-02-221-12/+0
| | | | | | | | | | xfs_btree_reada_bufl just wraps xfs_btree_readahead and a agblock to daddr conversion. Just open code it's three callsites in the two callers (One of which isn't even btree related). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: remove xfs_btree_reada_buflChristoph Hellwig2024-02-221-11/+0
| | | | | | | | | | xfs_btree_reada_bufl just wraps xfs_btree_readahead and a fsblock to daddr conversion. Just open code it's two callsites in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: rename btree helpers that depends on the block number representationChristoph Hellwig2024-02-221-8/+8
| | | | | | | | | | | All these helpers hardcode fsblocks or agblocks and not just the pointer size. Rename them so that the names are still fitting when we add the long format in-memory blocks and adjust the checks when calling them to check the btree types and not just pointer length. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: consolidate btree block verificationChristoph Hellwig2024-02-221-8/+1
| | | | | | | | | | | Add a __xfs_btree_check_block helper that can be called by the scrub code to validate a btree block of any form, and move the duplicate error handling code from xfs_btree_check_sblock and xfs_btree_check_lblock into xfs_btree_check_block and thus remove these two helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: consolidate btree ptr checkingChristoph Hellwig2024-02-221-18/+3
| | | | | | | | | | Merge xfs_btree_check_sptr and xfs_btree_check_lptr into a single __xfs_btree_check_ptr that can be shared between xfs_btree_check_ptr and the scrub code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: remove xfs_btnum_tChristoph Hellwig2024-02-221-11/+0
| | | | | | | | | | | The last checks for bc_btnum can be replaced with helpers that check the btree ops. This allows adding new btrees to XFS without having to update a global enum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: complete the ops predicates] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: add a sick_mask to struct xfs_btree_opsChristoph Hellwig2024-02-221-0/+3
| | | | | | | | | Clean up xfs_btree_mark_sick by adding a sick_mask to the btree-ops for all AG-root btrees. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: add a name field to struct xfs_btree_opsChristoph Hellwig2024-02-221-0/+2
| | | | | | | | | | The btnum in struct xfs_btree_ops is often used for printing a symbolic name for the btree. Add a name field to the ops structure and use that directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: add a xfs_btree_init_ptr_from_curChristoph Hellwig2024-02-221-0/+2
| | | | | | | | | | | | Inode-rooted btrees don't need to initialize the root pointer in the ->init_ptr_from_cur method as the root is found by the xfs_btree_get_iroot method later. Make ->init_ptr_from_cur option for inode rooted btrees by providing a helper that does the right thing for the given btree type and also documents the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: create predicate to determine if cursor is at inode root levelDarrick J. Wong2024-02-221-0/+10
| | | | | | | | | Create a predicate to decide if the given cursor and level point to the root block in the inode immediate area instead of a disk block, and get rid of the open-coded logic everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: split the per-btree union in struct xfs_btree_curChristoph Hellwig2024-02-221-32/+23
| | | | | | | | | | | | | | | | | Split up the union that encodes btree-specific fields in struct xfs_btree_cur. Most fields in there are specific to the btree type encoded in xfs_btree_ops.type, and we can use the obviously named union for that. But one field is specific to the bmapbt and two are shared by the refcount and rtrefcountbt. Move those to a separate union to make the usage clear and not need a separate struct for the refcount-related fields. This will also make unnecessary some very awkward btree cursor refc/rtrefc switching logic in the rtrefcount patchset. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: split out a btree type from the btree ops geometry flagsChristoph Hellwig2024-02-221-4/+11
| | | | | | | | | | | | | | | | | | | | | Two of the btree cursor flags are always used together and encode the fundamental btree type. There currently are two such types: 1) an on-disk AG-rooted btree with 32-bit pointers 2) an on-disk inode-rooted btree with 64-bit pointers and we're about to add: 3) an in-memory btree with 64-bit pointers Introduce a new enum and a new type field in struct xfs_btree_geom to encode this type directly instead of using flags and change most code to switch on this enum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: make the pointer lengths explicit] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: store the btree pointer length in struct xfs_btree_opsDarrick J. Wong2024-02-221-10/+16
| | | | | | | | | | Make the pointer length an explicit field in the btree operations structure so that the next patch (which introduces an explicit btree type enum) doesn't have to play a bunch of awkward games with inferring the pointer length from the enumeration. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: move the btree stats offset into struct btree_opsChristoph Hellwig2024-02-221-3/+7
| | | | | | | | | The statistics offset is completely static, move it into the btree_ops structure instead of the cursor. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: move lru refs to the btree ops structureDarrick J. Wong2024-02-221-0/+3
| | | | | | | | | | Move the btree buffer LRU refcount to the btree ops structure so that we can eliminate the last bc_btnum switch in the generic btree code. We're about to create repair-specific btree types, and we don't want that stuff cluttering up libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: remove the unnecessary daddr paramter to _init_blockDarrick J. Wong2024-02-221-1/+1
| | | | | | | | Now that all of the callers pass XFS_BUF_DADDR_NULL as the daddr parameter, we can elide that too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: rename btree block/buffer init functionsDarrick J. Wong2024-02-221-2/+2
| | | | | | | | | Rename xfs_btree_init_block_int to xfs_btree_init_block, and xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the type that caller are supposed to pass in. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: initialize btree blocks using btree_ops structureDarrick J. Wong2024-02-221-20/+8
| | | | | | | | | | Notice now that the btree ops structure encodes btree geometry flags and the magic number through the buffer ops. Refactor the btree block initialization functions to use the btree ops so that we no longer have to open code all that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: turn the allocbt cursor active field into a btree flagChristoph Hellwig2024-02-221-3/+3
| | | | | | | | Add a new XFS_BTREE_ALLOCBT_ACTIVE flag to replace the active field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: remove bc_ino.flagsChristoph Hellwig2024-02-221-6/+6
| | | | | | | | Just move the two flags into bc_flags where there is plenty of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: encode the btree geometry flags in the btree ops structureDarrick J. Wong2024-02-221-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | Certain btree flags never change for the life of a btree cursor because they describe the geometry of the btree itself. Encode these in the btree ops structure and reduce the amount of code required in each btree type's init_cursor functions. This also frees up most of the bits in bc_flags. A previous version of this patch also converted the open-coded flags logic to helpers. This was removed due to the pending refactoring (that follows this patch) to eliminate most of the state flags. Conversion script: sed \ -e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \ -e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \ -e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \ -e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \ -e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \ -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch]) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: drop XFS_BTREE_CRC_BLOCKSDarrick J. Wong2024-02-221-1/+0
| | | | | | | | | | All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a V5 filesystem. All currently proposed btree types are V5 only and use the richer XFS_BTREE_CRC_BLOCKS format. Therefore, we can drop this flag and change the conditional to xfs_has_crc. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: set the btree cursor bc_ops in xfs_btree_alloc_cursorDarrick J. Wong2024-02-221-0/+2
| | | | | | | This is a precursor to putting more static data in the btree ops structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: use __GFP_NOLOCKDEP instead of GFP_NOFSDave Chinner2024-02-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the past we've had problems with lockdep false positives stemming from inode locking occurring in memory reclaim contexts (e.g. from superblock shrinkers). Lockdep doesn't know that inodes access from above memory reclaim cannot be accessed from below memory reclaim (and vice versa) but there has never been a good solution to solving this problem with lockdep annotations. This situation isn't unique to inode locks - buffers are also locked above and below memory reclaim, and we have to maintain lock ordering for them - and against inodes - appropriately. IOWs, the same code paths and locks are taken both above and below memory reclaim and so we always need to make sure the lock orders are consistent. We are spared the lockdep problems this might cause by the fact that semaphores and bit locks aren't covered by lockdep. In general, this sort of lockdep false positive detection is cause by code that runs GFP_KERNEL memory allocation with an actively referenced inode locked. When it is run from a transaction, memory allocation is automatically GFP_NOFS, so we don't have reclaim recursion issues. So in the places where we do memory allocation with inodes locked outside of a transaction, we have explicitly set them to use GFP_NOFS allocations to prevent lockdep false positives from being reported if the allocation dips into direct memory reclaim. More recently, __GFP_NOLOCKDEP was added to the memory allocation flags to tell lockdep not to track that particular allocation for the purposes of reclaim recursion detection. This is a much better way of preventing false positives - it allows us to use GFP_KERNEL context outside of transactions, and allows direct memory reclaim to proceed normally without throwing out false positive deadlock warnings. The obvious places that lock inodes and do memory allocation are the lookup paths and inode extent list initialisation. These occur in non-transactional GFP_KERNEL contexts, and so can run direct reclaim and lock inodes. This patch makes a first path through all the explicit GFP_NOFS allocations in XFS and converts the obvious ones to GFP_KERNEL | __GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS allocations from the XFS code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* xfs: repair refcount btreesDarrick J. Wong2023-12-151-0/+2
| | | | | | | | | Reconstruct the refcount data from the rmap btree. Link: https://docs.kernel.org/filesystems/xfs-online-fsck-design.html#case-study-rebuilding-the-space-reference-counts Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: read leaf blocks when computing keys for bulkloading into node blocksDarrick J. Wong2023-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | When constructing a new btree, xfs_btree_bload_node needs to read the btree blocks for level N to compute the keyptrs for the blocks that will be loaded into level N+1. The level N blocks must be formatted at that point. A subsequent patch will change the btree bulkloader to write new btree blocks in 256K chunks to moderate memory consumption if the new btree is very large. As a consequence of that, it's possible that the buffers for lower level blocks might have been reclaimed by the time the node builder comes back to the block. Therefore, change xfs_btree_bload_node to read the lower level blocks to handle the reclaimed buffer case. As a side effect, the read will increase the LRU refs, which will bias towards keeping new btree buffers in memory after the new btree commits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
* overflow: Add struct_size_t() helperKees Cook2023-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While struct_size() is normally used in situations where the structure type already has a pointer instance, there are places where no variable is available. In the past, this has been worked around by using a typed NULL first argument, but this is a bit ugly. Add a helper to do this, and replace the handful of instances of the code pattern with it. Instances were found with this Coccinelle script: @struct_size_t@ identifier STRUCT, MEMBER; expression COUNT; @@ - struct_size((struct STRUCT *)\(0\|NULL\), + struct_size_t(struct STRUCT, MEMBER, COUNT) Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: James Smart <james.smart@broadcom.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: HighPoint Linux Team <linux@highpoint-tech.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Cc: Don Brace <don.brace@microchip.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Guo Xuenan <guoxuenan@huawei.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: kernel test robot <lkp@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: megaraidlinux.pdl@broadcom.com Cc: storagedev@microchip.com Cc: linux-xfs@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230522211810.never.421-kees@kernel.org
* xfs: implement masked btree key comparisons for _has_records scansDarrick J. Wong2023-04-121-6/+44
| | | | | | | | | | | | | | | | | | For keyspace fullness scans, we want to be able to mask off the parts of the key that we don't care about. For most btree types we /do/ want the full keyspace, but for checking that a given space usage also has a full complement of rmapbt records (even if different/multiple owners) we need this masking so that we only track sparseness of rm_startblock, not the whole keyspace (which is extremely sparse). Augment the ->diff_two_keys and ->keys_contiguous helpers to take a third union xfs_btree_key argument, and wire up xfs_rmap_has_records to pass this through. This third "mask" argument should contain a nonzero value in each structure field that should be used in the key comparisons done during the scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: replace xfs_btree_has_record with a general keyspace scannerDarrick J. Wong2023-04-121-2/+42
| | | | | | | | | | | | | | The current implementation of xfs_btree_has_record returns true if it finds /any/ record within the given range. Unfortunately, that's not sufficient for scrub. We want to be able to tell if a range of keyspace for a btree is devoid of records, is totally mapped to records, or is somewhere in between. By forcing this to be a boolean, we conflated sparseness and fullness, which caused scrub to return incorrect results. Fix the API so that we can tell the caller which of those three is the current state. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: refactor ->diff_two_keys callsitesDarrick J. Wong2023-04-121-0/+55
| | | | | | | | | | | | Create wrapper functions around ->diff_two_keys so that we don't have to remember what the return values mean, and adjust some of the code comments to reflect the longtime code behavior. We're going to introduce more uses of ->diff_two_keys in the next patch, so reduce the cognitive load for readers by doing this refactoring now. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: get rid of assert from xfs_btree_islastblockGuo Xuenan2022-12-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_btree_check_block contains debugging knobs. With XFS_DEBUG setting up, turn on the debugging knob can trigger the assert of xfs_btree_islastblock, test script as follows: while true do mount $disk $mountpoint fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null echo 1 > /sys/fs/xfs/sda/errortag/btree_chk_sblk sleep 10 umount $mountpoint done Kick off fsstress and only *then* turn on the debugging knob. If it happens that the knob gets turned on after the cntbt lookup succeeds but before the call to xfs_btree_islastblock, then we *can* end up in the situation where a previously checked btree block suddenly starts returning EFSCORRUPTED from xfs_btree_check_block. Kaboom. Darrick give a very detailed explanation as follows: Looking back at commit 27d9ee577dcce, I think the point of all this was to make sure that the cursor has actually performed a lookup, and that the btree block at whatever level we're asking about is ok. If the caller hasn't ever done a lookup, the bc_levels array will be empty, so cur->bc_levels[level].bp pointer will be NULL. The call to xfs_btree_get_block will crash anyway, so the "ASSERT(block);" part is pointless. If the caller did a lookup but the lookup failed due to block corruption, the corresponding cur->bc_levels[level].bp pointer will also be NULL, and we'll still crash. The "ASSERT(xfs_btree_check_block);" logic is also unnecessary. If the cursor level points to an inode root, the block buffer will be incore, so it had better always be consistent. If the caller ignores a failed lookup after a successful one and calls this function, the cursor state is garbage and the assert wouldn't have tripped anyway. So get rid of the assert. Fixes: 27d9ee577dcc ("xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock") Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* xfs: convert btree buffer log flags to unsigned.Dave Chinner2022-04-211-13/+13
| | | | | | | | | | | | | | 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. We also pass the fields to log to xfs_btree_offsets() as a uint32_t all cases now. I have no idea why we made that parameter a int64_t in the first place, but while we are fixing this up change it to a uint32_t field, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: remove kmem_zone typedefDarrick J. Wong2021-10-231-2/+2
| | | | | | | Remove these typedefs by referencing kmem_cache directly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
* xfs: use separate btree cursor cache for each btree typeDarrick J. Wong2021-10-191-12/+8
| | | | | | | | | | | Now that we have the infrastructure to track the max possible height of each btree type, we can create a separate slab cache for cursors of each type of btree. For smaller indices like the free space btrees, this means that we can pack more cursors into a slab page, improving slab utilization. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: kill XFS_BTREE_MAXLEVELSDarrick J. Wong2021-10-191-2/+0
| | | | | | | | Nobody uses this symbol anymore, so kill it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: compute the maximum height of the rmap btree when reflink enabledDarrick J. Wong2021-10-191-0/+2
| | | | | | | | | | | Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big enough to handle the maximally tall rmap btree when all blocks are in use and maximally shared, let's compute the maximum height assuming the rmapbt consumes as many blocks as possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: clean up xfs_btree_{calc_size,compute_maxlevels}Darrick J. Wong2021-10-191-2/+4
| | | | | | | | | | | | | | | | | | | During review of the next patch, Dave remarked that he found these two btree geometry calculation functions lacking in documentation and that they performed more work than was really necessary. These functions take the same parameters and have nearly the same logic; the only real difference is in the return values. Reword the function comment to make it clearer what each function does, and move them to be adjacent to reinforce their relation. Clean up both of them to stop opencoding the howmany functions, stop using the uint typedefs, and make them both support computations for more than 2^32 leaf records, since we're going to need all of the above for files with large data forks and large rmap btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: dynamically allocate cursors based on maxlevelsDarrick J. Wong2021-10-191-2/+11
| | | | | | | | | | | | | To support future btree code, we need to be able to size btree cursors dynamically for very large btrees. Switch the maxlevels computation to use the precomputed values in the superblock, and create cursors that can handle a certain height. For now, we retain the btree cursor cache that can handle up to 9-level btrees, though a subsequent patch introduces separate caches for each btree type, where each cache's objects will be exactly tall enough to handle the specific btree type. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: encode the max btree height in the cursorDarrick J. Wong2021-10-191-0/+2
| | | | | | | | Encode the maximum btree height in the cursor, since we're soon going to allow smaller cursors for AG btrees and larger cursors for file btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: refactor btree cursor allocation functionDarrick J. Wong2021-10-191-0/+16
| | | | | | | | | Refactor btree allocation to a common helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: rearrange xfs_btree_cur fields for better packingDarrick J. Wong2021-10-191-4/+4
| | | | | | | | | Reduce the size of the btree cursor structure some more by rearranging fields to eliminate unused space. While we're at it, fix the ragged indentation and a spelling error. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: prepare xfs_btree_cur for dynamic cursor heightsDarrick J. Wong2021-10-191-6/+27
| | | | | | | | | | | | | | | | | Split out the btree level information into a separate struct and put it at the end of the cursor structure as a VLA. Files with huge data forks (and in the future, the realtime rmap btree) will require the ability to support many more levels than a per-AG btree cursor, which means that we're going to create per-btree type cursor caches to conserve memory for the more common case. Note that a subsequent patch actually introduces dynamic cursor heights. This one merely rearranges the structure to prepare for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: reduce the size of nr_ops for refcount btree cursorsDarrick J. Wong2021-10-191-4/+4
| | | | | | | | | We're never going to run more than 4 billion btree operations on a refcount cursor, so shrink the field to an unsigned int to reduce the structure size. Fix whitespace alignment too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove xfs_btree_cur.bc_blocklogDarrick J. Wong2021-10-191-1/+0
| | | | | | | This field isn't used by anyone, so get rid of it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove xfs_btree_cur_t typedefDarrick J. Wong2021-10-141-6/+6
| | | | | | | | Get rid of this old typedef before we start changing other things. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>