summaryrefslogtreecommitdiffstats
path: root/mm/vma.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mm: make mmap_region() internalLorenzo Stoakes4 days1-1/+1
| | | | | | | | | | | | | | | | | | | | Now that we have removed the one user of mmap_region() outside of mm, make it internal and add it to vma.c so it can be userland tested. This ensures that all external memory mappings are performed using the appropriate interfaces and allows us to modify memory mapping logic as we see fit. Additionally expand test stubs to allow for the mmap_region() code to compile and be userland testable. Link: https://lkml.kernel.org/r/de5a3c574d35c26237edf20a1d8652d7305709c9.1735819274.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: enforce __must_check on VMA merge and splitLorenzo Stoakes2025-01-141-11/+15
| | | | | | | | | | | | | | It is of critical importance to check the return results on VMA merge (and split), failure to do so can result in use-after-free's. This bug has recurred, so have the compiler enforce this check to prevent any future repetition. Link: https://lkml.kernel.org/r/20241206225036.273103-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: move __vm_munmap() to mm/vma.cLorenzo Stoakes2025-01-141-0/+2
| | | | | | | | | | | | | | | | | This was arbitrarily left in mmap.c it makes no sense being there, move it to vma.c to render it testable. Link: https://lkml.kernel.org/r/5e5e81807c54dfbe363edb2d431eb3d7a37fcdba.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: move stack expansion logic to mm/vma.cLorenzo Stoakes2025-01-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | We build on previous work making expand_downwards() an entirely internal function. This logic is subtle and so it is highly useful to get it into vma.c so we can then userland unit test. We must additionally move acct_stack_growth() to vma.c as it is a helper function used by both expand_downwards() and expand_upwards(). We are also then able to mark anon_vma_interval_tree_pre_update_vma() and anon_vma_interval_tree_post_update_vma() static as these are no longer used by anything else. Link: https://lkml.kernel.org/r/0feb104eff85922019d4fb29280f3afb130c5204.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: move unmapped_area() internals to mm/vma.cLorenzo Stoakes2025-01-141-0/+3
| | | | | | | | | | | | | | | | | | | | We want to be able to unit test the unmapped area logic, so move it to mm/vma.c. The wrappers which invoke this remain in place in mm/mmap.c. In addition, naturally, update the existing test code to enable this to be compiled in userland. Link: https://lkml.kernel.org/r/53a57a52a64ea54e9d129d2e2abca3a538022379.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: move brk() internals to mm/vma.cLorenzo Stoakes2025-01-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/vma: make more mmap logic userland testable". This series carries on the work started in previous series and continued in commit 52956b0d7fb9 ("mm: isolate mmap internal logic to mm/vma.c"), moving the remainder of memory mapping implementation details logic into mm/vma.c allowing the bulk of the mapping logic to be unit tested. It is highly useful to do so, as this means we can both fundamentally test this core logic, and introduce regression tests to ensure any issues previously resolved do not recur. Vitally, this includes the do_brk_flags() function, meaning we have both core means of userland mapping memory now testable. Performance testing was performed after this change given the brk() system call's sensitivity to change, and no performance regression was observed. The stack expansion logic is also moved into mm/vma.c, which necessitates a change in the API exposed to the exec code, removing the invocation of the expand_downwards() function used in get_arg_page() and instead adding mmap_read_lock_maybe_expand() to wrap this. This patch (of 5): Now we have moved mmap_region() internals to mm/vma.c, making it available to userland testing, it makes sense to do the same with brk(). This continues the pattern of VMA heavy lifting being done in mm/vma.c in an environment where it can be subject to straightforward unit and regression testing, with other VMA-adjacent files becoming wrappers around this functionality. [lorenzo.stoakes@oracle.com: add missing personality header import] Link: https://lkml.kernel.org/r/2a717265-985f-45eb-9257-8b2857088ed4@lucifer.local Link: https://lkml.kernel.org/r/cover.1733248985.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3d24b9e67bb0261539ca921d1188a10a1b4d4357.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: isolate mmap internal logic to mm/vma.cLorenzo Stoakes2024-11-071-93/+4
| | | | | | | | | | | | | | | | | | | | | | | In previous commits we effected improvements to the mmap() logic in mmap_region() and its newly introduced internal implementation function __mmap_region(). However as these changes are intended to be backported, we kept the delta as small as is possible and made as few changes as possible to the newly introduced mm/vma.* files. Take the opportunity to move this logic to mm/vma.c which not only isolates it, but also makes it available for later userland testing which can help us catch such logic errors far earlier. Link: https://lkml.kernel.org/r/93fc2c3aa37dd30590b7e4ee067dfd832007bf7e.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: refactor map_deny_write_exec()Lorenzo Stoakes2024-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: unconditionally close VMAs on errorLorenzo Stoakes2024-11-061-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform. With regards to the important vm_ops->close() callback We have gone to great lengths to try to track whether or not we ought to close VMAs. Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator. We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not. This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent. Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: add expand-only VMA merge mode and optimise do_brk_flags()Lorenzo Stoakes2024-10-291-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "introduce VMA merge mode to improve brk() performance". A ~5% performance regression was discovered on the aim9.brk_test.ops_per_sec by the linux kernel test bot [0]. In the past to satisfy brk() performance we duplicated VMA expansion code and special-cased do_brk_flags(). This is however horrid and undoes work to abstract this logic, so in resolving the issue I have endeavoured to avoid this. Investigating further I was able to observe that the use of a vma_iter_next_range() and vma_prev() pair, causing an unnecessary maple tree walk. In addition there is work that we do that is simply unnecessary for brk(). Therefore, add a special VMA merge mode VMG_FLAG_JUST_EXPAND to avoid doing any of this - it assumes the VMA iterator is pointing at the previous VMA and which skips logic that brk() does not require. This mostly eliminates the performance regression reducing it to ~2% which is in the realm of noise. In addition, the will-it-scale test brk2, written to be more representative of real-world brk() usage, shows a modest performance improvement - which gives me confidence that we are not meaningfully regressing real workloads here. This series includes a test asserting that the 'just expand' mode works as expected. With many thanks to Oliver Sang for helping with performance testing of candidate patch sets! [0]:https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com This patch (of 2): We know in advance that do_brk_flags() wants only to perform a VMA expansion (if the prior VMA is compatible), and that we assume no mergeable VMA follows it. These are the semantics of this function prior to the recent rewrite of the VMA merging logic, however we are now doing more work than necessary - positioning the VMA iterator at the prior VMA and performing tasks that are not required. Add a new field to the vmg struct to permit merge flags and add a new merge flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have do_brk_flags() use this. This fixes a reported performance regression in a brk() benchmarking suite. Link: https://lkml.kernel.org/r/cover.1729174352.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/4e65d4395e5841c5acf8470dbcb714016364fd39.1729174352.git.lorenzo.stoakes@oracle.com Fixes: cacded5e42b9 ("mm: avoid using vma_merge() for new VMAs") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: mark mas allocation in vms_abort_munmap_vmas as __GFP_NOFAILJann Horn2024-10-291-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs have already been torn down halfway (in a way we can't undo) but are still present in the maple tree. At this point, we *must* remove the VMAs from the VMA tree, otherwise we get UAF. Because removing VMA tree nodes can require memory allocation, the existing code has an error path which tries to handle this by reattaching the VMAs; but that can't be done safely. A nicer way to fix it would probably be to preallocate enough maple tree nodes for the removal before the point of no return, or something like that; but for now, fix it the easy and kinda ugly way, by marking this allocation __GFP_NOFAIL. Link: https://lkml.kernel.org/r/20241016-fix-munmap-abort-v1-1-601c94b2240d@google.com Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the gap on failure") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: make vma_prepare() and friends static and internal to vma.cLorenzo Stoakes2024-09-041-25/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we have abstracted merge behaviour for new VMA ranges, we are able to render vma_prepare(), init_vma_prep(), vma_complete(), can_vma_merge_before() and can_vma_merge_after() static and internal to vma.c. These are internal implementation details of kernel VMA manipulation and merging mechanisms and thus should not be exposed. This also renders the functions userland testable. Link: https://lkml.kernel.org/r/7f7f1c34ce10405a6aab2714c505af3cf41b7851.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: avoid using vma_merge() for new VMAsLorenzo Stoakes2024-09-041-4/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Abstract vma_merge_new_vma() to use vma_merge_struct and rename the resultant function vma_merge_new_range() to be clear what the purpose of this function is - a new VMA is desired in the specified range, and we wish to see if it is possible to 'merge' surrounding VMAs into this range rather than having to allocate a new VMA. Note that this function uses vma_extend() exclusively, so adopts its requirement that the iterator point at or before the gap. We add an assert to this effect. This is as opposed to vma_merge_existing_range(), which will be introduced in a subsequent commit, and provide the same functionality for cases in which we are modifying an existing VMA. In mmap_region() and do_brk_flags() we open code scenarios where we prefer to use vma_expand() rather than invoke a full vma_merge() operation. Abstract this logic and eliminate all of the open-coding, and also use the same logic for all cases where we add new VMAs to, rather than ultimately use vma_merge(), rather use vma_expand(). Doing so removes duplication and simplifies VMA merging in all such cases, laying the ground for us to eliminate the merging of new VMAs in vma_merge() altogether. Also add the ability for the vmg to track state, and able to report errors, allowing for us to differentiate a failed merge from an inability to allocate memory in callers. This makes it far easier to understand what is happening in these cases avoiding confusion, bugs and allowing for future optimisation. Also introduce vma_iter_next_rewind() to allow for retrieval of the next, and (optionally) the prev VMA, rewinding to the start of the previous gap. Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma comparison for the case of merging on both sides where the anon_vma of the VMA being merged maybe compatible with prev and next, but prev and next's anon_vma's may not be compatible with each other. Finally also introduce can_vma_merge_left() / can_vma_merge_right() to check adjacent VMA compatibility and that they are indeed adjacent. Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Mark Brown <broonie@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: abstract vma_expand() to use vma_merge_structLorenzo Stoakes2024-09-041-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of the vmg is to thread merge state through functions and avoid egregious parameter lists. We expand this to vma_expand(), which is used for a number of merge cases. Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to use a vmg. An added purpose of this change is the ability in a future commit to perform all new VMA range merging using vma_expand(). Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()Lorenzo Stoakes2024-09-041-51/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than passing around huge numbers of parameters to numerous helper functions, abstract them into a single struct that we thread through the operation, the vma_merge_struct ('vmg'). Adjust vma_merge() and vma_modify() to accept this parameter, as well as predicate functions can_vma_merge_before(), can_vma_merge_after(), and the vma_modify_...() helper functions. Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for easy vmg declaration. We additionally remove the requirement that vma_merge() is passed a VMA object representing the candidate new VMA. Previously it used this to obtain the mm_struct, file and anon_vma properties of the proposed range (a rather confusing state of affairs), which are now provided by the vmg directly. We also remove the pgoff calculation previously performed vma_modify(), and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset() helper. Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma.h: optimise vma_munmap_structLiam R. Howlett2024-09-041-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vma_munmap_struct has a hole of 4 bytes and pushes the struct to three cachelines. Relocating the three booleans upwards allows for the struct to only use two cachelines (as reported by pahole on amd64). Before: struct vma_munmap_struct { struct vma_iterator * vmi; /* 0 8 */ struct vm_area_struct * vma; /* 8 8 */ struct vm_area_struct * prev; /* 16 8 */ struct vm_area_struct * next; /* 24 8 */ struct list_head * uf; /* 32 8 */ long unsigned int start; /* 40 8 */ long unsigned int end; /* 48 8 */ long unsigned int unmap_start; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ long unsigned int unmap_end; /* 64 8 */ int vma_count; /* 72 4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int nr_pages; /* 80 8 */ long unsigned int locked_vm; /* 88 8 */ long unsigned int nr_accounted; /* 96 8 */ long unsigned int exec_vm; /* 104 8 */ long unsigned int stack_vm; /* 112 8 */ long unsigned int data_vm; /* 120 8 */ /* --- cacheline 2 boundary (128 bytes) --- */ bool unlock; /* 128 1 */ bool clear_ptes; /* 129 1 */ bool closed_vm_ops; /* 130 1 */ /* size: 136, cachelines: 3, members: 19 */ /* sum members: 127, holes: 1, sum holes: 4 */ /* padding: 5 */ /* last cacheline: 8 bytes */ }; After: struct vma_munmap_struct { struct vma_iterator * vmi; /* 0 8 */ struct vm_area_struct * vma; /* 8 8 */ struct vm_area_struct * prev; /* 16 8 */ struct vm_area_struct * next; /* 24 8 */ struct list_head * uf; /* 32 8 */ long unsigned int start; /* 40 8 */ long unsigned int end; /* 48 8 */ long unsigned int unmap_start; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ long unsigned int unmap_end; /* 64 8 */ int vma_count; /* 72 4 */ bool unlock; /* 76 1 */ bool clear_ptes; /* 77 1 */ bool closed_vm_ops; /* 78 1 */ /* XXX 1 byte hole, try to pack */ long unsigned int nr_pages; /* 80 8 */ long unsigned int locked_vm; /* 88 8 */ long unsigned int nr_accounted; /* 96 8 */ long unsigned int exec_vm; /* 104 8 */ long unsigned int stack_vm; /* 112 8 */ long unsigned int data_vm; /* 120 8 */ /* size: 128, cachelines: 2, members: 19 */ /* sum members: 127, holes: 1, sum holes: 1 */ }; Link: https://lkml.kernel.org/r/20240830040101.822209-22-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: move may_expand_vm() check in mmap_region()Liam R. Howlett2024-09-041-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The may_expand_vm() check requires the count of the pages within the munmap range. Since this is needed for accounting and obtained later, the reodering of ma_expand_vm() to later in the call stack, after the vma munmap struct (vms) is initialised and the gather stage is potentially run, will allow for a single loop over the vmas. The gather sage does not commit any work and so everything can be undone in the case of a failure. The MAP_FIXED page count is available after the vms_gather_munmap_vmas() call, so use it instead of looping over the vmas twice. Link: https://lkml.kernel.org/r/20240830040101.822209-20-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* ipc/shm, mm: drop do_vma_munmap()Liam R. Howlett2024-09-041-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The do_vma_munmap() wrapper existed for callers that didn't have a vma iterator and needed to check the vma mseal status prior to calling the underlying munmap(). All callers now use a vma iterator and since the mseal check has been moved to do_vmi_align_munmap() and the vmas are aligned, this function can just be called instead. do_vmi_align_munmap() can no longer be static as ipc/shm is using it and it is exported via the mm.h header. Link: https://lkml.kernel.org/r/20240830040101.822209-19-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: change failure of MAP_FIXED to restoring the gap on failureLiam R. Howlett2024-09-041-22/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to call_mmap(), the vmas that will be replaced need to clear the way for what may happen in the call_mmap(). This clean up work includes clearing the ptes and calling the close() vm_ops. Some users do more setup than can be restored by calling the vm_ops open() function. It is safer to store the gap in the vma tree in these cases. That is to say that the failure scenario that existed before the MAP_FIXED gap exposure is restored as it is safer than trying to undo a partial mapping. Since abort_munmap_vmas() is only reattaching vmas with this change, the function is renamed to reattach_vmas(). There is also a secondary failure that may occur if there is not enough memory to store the gap. In this case, the vmas are reattached and resources freed. If the system cannot complete the call_mmap() and fails to allocate with GFP_KERNEL, then the system will print a warning about the failure. [lorenzo.stoakes@oracle.com: fix off-by-one error in vms_abort_munmap_vmas()] Link: https://lkml.kernel.org/r/52ee7eb3-955c-4ade-b5f0-28fed8ba3d0b@lucifer.local Link: https://lkml.kernel.org/r/20240830040101.822209-16-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmap: avoid zeroing vma tree in mmap_region()Liam R. Howlett2024-09-041-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of zeroing the vma tree and then overwriting the area, let the area be overwritten and then clean up the gathered vmas using vms_complete_munmap_vmas(). To ensure locking is downgraded correctly, the mm is set regardless of MAP_FIXED or not (NULL vma). If a driver is mapping over an existing vma, then clear the ptes before the call_mmap() invocation. This is done using the vms_clean_up_area() helper. If there is a close vm_ops, that must also be called to ensure any cleanup is done before mapping over the area. This also means that calling open has been added to the abort of an unmap operation, for now. Since vm_ops->open() and vm_ops->close() are not always undo each other (state cleanup may exist in ->close() that is lost forever), the code cannot be left in this way, but that change has been isolated to another commit to make this point very obvious for traceability. Temporarily keep track of the number of pages that will be removed and reduce the charged amount. This also drops the validate_mm() call in the vma_expand() function. It is necessary to drop the validate as it would fail since the mm map_count would be incorrect during a vma expansion, prior to the cleanup from vms_complete_munmap_vmas(). Clean up the error handing of the vms_gather_munmap_vmas() by calling the verification within the function. Link: https://lkml.kernel.org/r/20240830040101.822209-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: clean up unmap_region() argument listLiam R. Howlett2024-09-041-4/+2
| | | | | | | | | | | | | | | | | | | | | | With the only caller to unmap_region() being the error path of mmap_region(), the argument list can be significantly reduced. Link: https://lkml.kernel.org/r/20240830040101.822209-14-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: track start and end for munmap in vma_munmap_structLiam R. Howlett2024-09-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Set the start and end address for munmap when the prev and next are gathered. This is needed to avoid incorrect addresses being used during the vms_complete_munmap_vmas() function if the prev/next vma are expanded. Add a new helper vms_complete_pte_clear(), which is needed later and will avoid growing the argument list to unmap_region() beyond the 9 it already has. Link: https://lkml.kernel.org/r/20240830040101.822209-13-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmap: reposition vma iterator in mmap_region()Liam R. Howlett2024-09-041-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of moving (or leaving) the vma iterator pointing at the previous vma, leave it pointing at the insert location. Pointing the vma iterator at the insert location allows for a cleaner walk of the vma tree for MAP_FIXED and the no expansion cases. The vma_prev() call in the case of merging the previous vma is equivalent to vma_iter_prev_range(), since the vma iterator will be pointing to the location just before the previous vma. This change needs to export abort_munmap_vmas() from mm/vma. Link: https://lkml.kernel.org/r/20240830040101.822209-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: support vma == NULL in init_vma_munmap()Liam R. Howlett2024-09-041-3/+8
| | | | | | | | | | | | | | | | | | | | | | | Adding support for a NULL vma means the init_vma_munmap() can be initialized for a less error-prone process when calling vms_complete_munmap_vmas() later on. Link: https://lkml.kernel.org/r/20240830040101.822209-11-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: expand mmap_region() munmap callLiam R. Howlett2024-09-041-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | Open code the do_vmi_align_munmap() call so that it can be broken up later in the series. This requires exposing a few more vma operations. Link: https://lkml.kernel.org/r/20240830040101.822209-10-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: change munmap to use vma_munmap_struct() for accounting and ↵Liam R. Howlett2024-09-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | surrounding vmas Clean up the code by changing the munmap operation to use a structure for the accounting and munmap variables. Since remove_mt() is only called in one location and the contents will be reduced to almost nothing. The remains of the function can be added to vms_complete_munmap_vmas(). Link: https://lkml.kernel.org/r/20240830040101.822209-7-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vma: introduce vma_munmap_struct for use in munmap operationsLiam R. Howlett2024-09-041-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Use a structure to pass along all the necessary information and counters involved in removing vmas from the mm_struct. Update vmi_ function names to vms_ to indicate the first argument type change. Link: https://lkml.kernel.org/r/20240830040101.822209-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mseal: replace can_modify_mm_madv with a vma variantPedro Falcato2024-09-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Replace can_modify_mm_madv() with a single vma variant, and associated checks in madvise. While we're at it, also invert the order of checks in: if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)) Checking if we can modify the vma itself (through vm_flags) is certainly cheaper than is_ro_anon() due to arch_vma_access_permitted() looking at e.g pkeys registers (with extra branches) in some architectures. This patch allows for partial madvise success when finding a sealed VMA, which historically has been allowed in Linux. Link: https://lkml.kernel.org/r/20240817-mseal-depessimize-v3-5-d8d2e037df30@gmail.com Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: move can_modify_vma to mm/vma.hPedro Falcato2024-09-041-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm: Optimize mseal checks", v3. Optimize mseal checks by removing the separate can_modify_mm() step, and just doing checks on the individual vmas, when various operations are themselves iterating through the tree. This provides a nice speedup and restores performance parity with pre-mseal[3]. will-it-scale mmap1_process[1] -t 1 results: commit 3450fe2b574b4345e4296ccae395149e1a357fee: min:277605 max:277605 total:277605 min:281784 max:281784 total:281784 min:277238 max:277238 total:277238 min:281761 max:281761 total:281761 min:274279 max:274279 total:274279 min:254854 max:254854 total:254854 measurement min:269143 max:269143 total:269143 min:270454 max:270454 total:270454 min:243523 max:243523 total:243523 min:251148 max:251148 total:251148 min:209669 max:209669 total:209669 min:190426 max:190426 total:190426 min:231219 max:231219 total:231219 min:275364 max:275364 total:275364 min:266540 max:266540 total:266540 min:242572 max:242572 total:242572 min:284469 max:284469 total:284469 min:278882 max:278882 total:278882 min:283269 max:283269 total:283269 min:281204 max:281204 total:281204 After this patch set: min:280580 max:280580 total:280580 min:290514 max:290514 total:290514 min:291006 max:291006 total:291006 min:290352 max:290352 total:290352 min:294582 max:294582 total:294582 min:293075 max:293075 total:293075 measurement min:295613 max:295613 total:295613 min:294070 max:294070 total:294070 min:293193 max:293193 total:293193 min:291631 max:291631 total:291631 min:295278 max:295278 total:295278 min:293782 max:293782 total:293782 min:290361 max:290361 total:290361 min:294517 max:294517 total:294517 min:293750 max:293750 total:293750 min:293572 max:293572 total:293572 min:295239 max:295239 total:295239 min:292932 max:292932 total:292932 min:293319 max:293319 total:293319 min:294954 max:294954 total:294954 This was a Completely Unscientific test but seems to show there were around 5-10% gains on ops per second. Oliver performed his own tests and showed[3] a similar ~5% gain in them. [1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases. [2]: https://lore.kernel.org/all/20240807124103.85644-1-mpe@ellerman.id.au/ [3]: https://lore.kernel.org/all/ZrMMJfe9aXSWxJz6@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/ This patch (of 7): Move can_modify_vma to vma.h so it can be inlined properly (with the intent to remove can_modify_mm callsites). Link: https://lkml.kernel.org/r/20240817-mseal-depessimize-v3-1-d8d2e037df30@gmail.com Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: move internal core VMA manipulation functions to own fileLorenzo Stoakes2024-09-021-0/+364
This patch introduces vma.c and moves internal core VMA manipulation functions to this file from mmap.c. This allows us to isolate VMA functionality in a single place such that we can create userspace testing code that invokes this functionality in an environment where we can implement simple unit tests of core functionality. This patch ensures that core VMA functionality is explicitly marked as such by its presence in mm/vma.h. It also places the header includes required by vma.c in vma_internal.h, which is simply imported by vma.c. This makes the VMA functionality testable, as userland testing code can simply stub out functionality as required. Link: https://lkml.kernel.org/r/c77a6aafb4c42aaadb8e7271a853658cbdca2e22.1722251717.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Gow <davidgow@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Kees Cook <kees@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rae Moar <rmoar@google.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>