author Nadav Amit <namit@vmware.com> 2022-11-08 18:46:46 +0100
committer Andrew Morton <akpm@linux-foundation.org> 2022-12-01 00:58:48 +0100
commit d84887739d5c982afa50b155aad628bb8ff206c5
tree 4219fd142e6b07eaa8f38010274b929276ee224a /mm/mprotect.c
parent tools/vm/page_owner: ignore page_owner_sort binary
mm/mprotect: allow clean exclusive anon pages to be writable

Patch series "mm/autonuma: replace savedwrite infrastructure", v2.

As discussed in my talk at LPC, we can reuse the same mechanism for
deciding whether to map a pte writable when upgrading permissions via
mprotect() -- e.g., PROT_READ -> PROT_READ|PROT_WRITE -- to replace the
savedwrite infrastructure used for NUMA hinting faults (e.g., PROT_NONE
-> PROT_READ|PROT_WRITE).

Instead of maintaining previous write permissions for a pte/pmd, we
re-determine if the pte/pmd can be writable. The big benefit is that we
have a common logic for deciding whether we can map a pte/pmd writable on
protection changes.

For private mappings, there should be no difference -- from what I
understand, that is what autonuma benchmarks care about.

I ran autonumabench for v1 on a system with 2 NUMA nodes, 96 GiB each via:
	perf stat --null --repeat 10
The numa01 benchmark is quite noisy in my environment and I failed to
reduce the noise so far.

numa01:
	mm-unstable:   146.88 +- 6.54 seconds time elapsed ( +- 4.45% )
	mm-unstable++: 147.45 +- 13.39 seconds time elapsed ( +- 9.08% )

numa02:
	mm-unstable:   16.0300 +- 0.0624 seconds time elapsed ( +- 0.39% )
	mm-unstable++: 16.1281 +- 0.0945 seconds time elapsed ( +- 0.59% )

It is worth noting that for shared writable mappings that require
writenotify, we will only avoid write faults if the pte/pmd is dirty
(inherited from the older mprotect logic). If we ever care about
optimizing that further, we'd need a different mechanism to identify
whether the FS still needs to get notified on the next write access.

In any case, such an optimization will then not be autonuma-specific, but
mprotect() permission upgrades would similarly benefit from it.

This patch (of 7):

Anonymous pages might have the dirty bit clear, but this should not
prevent mprotect from making them writable if they are exclusive.
Therefore, skip the test whether the page is dirty in this case.

Note that there are already other ways to get a writable PTE mapping an
anonymous page that is clean: for example, via MADV_FREE. In an ideal
world, we'd have a different indication from the FS whether writenotify
is still required.

[david@redhat.com: return directly; update description]
Link: https://lkml.kernel.org/r/20221108174652.198904-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221108174652.198904-2-david@redhat.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

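
For readers who want to see the user-space side of the permission upgrade
described above (PROT_READ -> PROT_READ|PROT_WRITE), here is a minimal
sketch. It is not part of the patch. The helper name upgrade_and_write()
is hypothetical, and the program behaves the same with or without this
change: the patch only influences whether the kernel can map the pte
writable during mprotect() instead of taking a write fault later, which
this sketch does not measure.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch: exercise a
 * PROT_READ -> PROT_READ|PROT_WRITE upgrade on one private anonymous page.
 * Returns 0 on success, -1 on any failure.
 */
int upgrade_and_write(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t len;
	char *p;
	int ret = -1;

	if (page_size <= 0)
		return -1;
	len = (size_t)page_size;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* The first write populates the mapping with an anonymous page. */
	p[0] = 'a';

	/* Downgrade to read-only, then upgrade again via mprotect(). */
	if (mprotect(p, len, PROT_READ) != 0)
		goto out;
	if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
		goto out;

	/* This write must succeed whether or not it takes a write fault. */
	p[0] = 'b';
	if (p[0] == 'b')
		ret = 0;
out:
	munmap(p, len);
	return ret;
}
```

Calling upgrade_and_write() from a test program and checking for a return
value of 0 is enough to confirm the upgrade path works; observing the
fault-avoidance itself would need something like perf stat on minor
faults, which is outside this sketch.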
Diffstat (limited to 'mm/mprotect.c')
-rw-r--r--  mm/mprotect.c  7
1 files changed, 3 insertions, 4 deletions
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8d770855b591..86a28c0e190f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -46,7 +46,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 
 	VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
 
-	if (pte_protnone(pte) || !pte_dirty(pte))
+	if (pte_protnone(pte))
 		return false;
 
 	/* Do we need write faults for softdirty tracking? */
@@ -65,11 +65,10 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 		 * the PT lock.
 		 */
 		page = vm_normal_page(vma, addr, pte);
-		if (!page || !PageAnon(page) || !PageAnonExclusive(page))
-			return false;
+		return page && PageAnon(page) && PageAnonExclusive(page);
 	}
 
-	return true;
+	return pte_dirty(pte);
 }
 
 static unsigned long change_pte_range(struct mmu_gather *tlb,
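
The control flow after the two hunks above can be summarised with a small
user-space model. This is a sketch under stated assumptions, not kernel
code: struct pte_model and model_can_change_pte_writable() are
hypothetical names, the private/shared split is inferred from the commit
message because the branch itself lies outside the hunks shown, and any
checks elided between the hunks other than the softdirty one are not
modelled.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state the real function inspects. */
struct pte_model {
	bool protnone;              /* pte_protnone(pte) */
	bool dirty;                 /* pte_dirty(pte) */
	bool needs_softdirty_fault; /* softdirty tracking wants a write fault */
	bool shared_mapping;        /* assumption: branch not in the hunks */
	bool anon_exclusive;        /* page && PageAnon && PageAnonExclusive */
};

/* Model of the post-patch decision; returns true if the pte may be
 * mapped writable right away. */
bool model_can_change_pte_writable(const struct pte_model *m)
{
	/* The dirty bit is no longer tested up front. */
	if (m->protnone)
		return false;

	if (m->needs_softdirty_fault)
		return false;

	/* Private mapping: exclusivity alone decides, clean or dirty. */
	if (!m->shared_mapping)
		return m->anon_exclusive;

	/* Shared mapping: still only dirty ptes avoid the write fault. */
	return m->dirty;
}
```

In this model a clean, exclusive anonymous page in a private mapping now
yields true, whereas the removed !pte_dirty(pte) test made the pre-patch
function return false for it. Shared mappings keep the dirty requirement,
matching the writenotify remark in the commit message.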