diff options
author | Qi Zheng <zhengqi.arch@bytedance.com> | 2024-12-04 12:09:41 +0100 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2025-01-14 07:40:46 +0100 |
commit | 6c18ec9af86ce96245324dd0cd5067dcfb026508 (patch) | |
tree | f4b32aacbff66117a4578c7a0cd88da322e1a04f /Documentation/mm/process_addrs.rst | |
parent | mm/hugetlb: don't map folios writable without VM_WRITE when copying during fo... (diff) | |
download | linux-6c18ec9af86ce96245324dd0cd5067dcfb026508.tar.xz linux-6c18ec9af86ce96245324dd0cd5067dcfb026508.zip |
mm: khugepaged: recheck pmd state in retract_page_tables()
Patch series "synchronously scan and reclaim empty user PTE pages", v4.
Previously, we tried to use a completely asynchronous method to reclaim
empty user PTE pages [1]. After discussing with David Hildenbrand, we
decided to implement synchronous reclaimation in the case of
madvise(MADV_DONTNEED) as the first step.
So this series aims to synchronously free the empty PTE pages in
madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases
other than madvise(MADV_DONTNEED).
In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and
page freeing operations. Therefore, if we want to free the empty PTE page
in this path, the most natural way is to add it to mmu_gather as well.
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free
page table pages by semi RCU:
- batch table freeing: asynchronous free by RCU
- single table freeing: IPI + synchronous free
But this is not enough to free the empty PTE page table pages in paths
other that munmap and exit_mmap path, because IPI cannot be synchronized
with rcu_read_lock() in pte_offset_map{_lock}(). So we should let single
table also be freed by RCU like batch table freeing.
As a first step, we supported this feature on x86_64 and selectd the newly
introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.
For other cases such as madvise(MADV_FREE), consider scanning and freeing
empty PTE pages asynchronously in the future.
Note: issues related to TLB flushing are not new to this series and are tracked
in the separate RFC patch [3]. And more context please refer to this
thread [4].
[1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
[3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
[4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/
This patch (of 11):
In retract_page_tables(), the lock of new_folio is still held, we will be
blocked in the page fault path, which prevents the pte entries from being
set again. So even though the old empty PTE page may be concurrently
freed and a new PTE page is filled into the pmd entry, it is still empty
and can be removed.
So just refactor the retract_page_tables() a little bit and recheck the
pmd state after holding the pmd lock.
Link: https://lkml.kernel.org/r/cover.1733305182.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/70a51804cd19d44ccaf031825d9fb6eaf92f2bad.1733305182.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'Documentation/mm/process_addrs.rst')
-rw-r--r-- | Documentation/mm/process_addrs.rst | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst index 1d416658d7f5..81417fa2ed20 100644 --- a/Documentation/mm/process_addrs.rst +++ b/Documentation/mm/process_addrs.rst @@ -531,6 +531,10 @@ are extra requirements for accessing them: new page table has been installed in the same location and filled with entries. Writers normally need to take the PTE lock and revalidate that the PMD entry still refers to the same PTE-level page table. + If the writer does not care whether it is the same PTE-level page table, it + can take the PMD lock and revalidate that the contents of pmd entry still meet + the requirements. In particular, this also happens in :c:func:`!retract_page_tables` + when handling :c:macro:`!MADV_COLLAPSE`. To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or :c:func:`!pte_offset_map` can be used depending on stability requirements. |