summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide/mm
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/mm')
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst46
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst10
-rw-r--r--Documentation/admin-guide/mm/index.rst2
-rw-r--r--Documentation/admin-guide/mm/pagemap.rst25
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst85
5 files changed, 120 insertions, 48 deletions
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index 7aa0071ff1c3..054010a7f3d8 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -34,18 +34,56 @@ detail) of DAMON, you should ensure :doc:`sysfs </filesystems/sysfs>` is
mounted.
+Snapshot Data Access Patterns
+=============================
+
+The commands below show the memory access pattern of a program at the moment of
+the execution. ::
+
+ $ git clone https://github.com/sjp38/masim; cd masim; make
+ $ sudo damo start "./masim ./configs/stairs.cfg --quiet"
+ $ sudo ./damo show
+ 0 addr [85.541 TiB , 85.541 TiB ) (57.707 MiB ) access 0 % age 10.400 s
+ 1 addr [85.541 TiB , 85.542 TiB ) (413.285 MiB) access 0 % age 11.400 s
+ 2 addr [127.649 TiB , 127.649 TiB) (57.500 MiB ) access 0 % age 1.600 s
+ 3 addr [127.649 TiB , 127.649 TiB) (32.500 MiB ) access 0 % age 500 ms
+ 4 addr [127.649 TiB , 127.649 TiB) (9.535 MiB ) access 100 % age 300 ms
+ 5 addr [127.649 TiB , 127.649 TiB) (8.000 KiB ) access 60 % age 0 ns
+ 6 addr [127.649 TiB , 127.649 TiB) (6.926 MiB ) access 0 % age 1 s
+ 7 addr [127.998 TiB , 127.998 TiB) (120.000 KiB) access 0 % age 11.100 s
+ 8 addr [127.998 TiB , 127.998 TiB) (8.000 KiB ) access 40 % age 100 ms
+ 9 addr [127.998 TiB , 127.998 TiB) (4.000 KiB ) access 0 % age 11 s
+ total size: 577.590 MiB
+ $ sudo ./damo stop
+
+The first command of the above example downloads and builds an artificial
+memory access generator program called ``masim``. The second command asks DAMO
+to execute the artificial generator process start via the given command and
+make DAMON monitors the generator process. The third command retrieves the
+current snapshot of the monitored access pattern of the process from DAMON and
+shows the pattern in a human readable format.
+
+Each line of the output shows which virtual address range (``addr [XX, XX)``)
+of the process is how frequently (``access XX %``) accessed for how long time
+(``age XX``). For example, the fifth region of ~9 MiB size is being most
+frequently accessed for last 300 milliseconds. Finally, the fourth command
+stops DAMON.
+
+Note that DAMON can monitor not only virtual address spaces but multiple types
+of address spaces including the physical address space.
+
+
Recording Data Access Patterns
==============================
The commands below record the memory access patterns of a program and save the
monitoring results to a file. ::
- $ git clone https://github.com/sjp38/masim
- $ cd masim; make; ./masim ./configs/zigzag.cfg &
+ $ ./masim ./configs/zigzag.cfg &
$ sudo damo record -o damon.data $(pidof masim)
-The first two lines of the commands download an artificial memory access
-generator program and run it in the background. The generator will repeatedly
+The line of the commands run the artificial memory access
+generator program again. The generator will repeatedly
access two 100 MiB sized memory regions one by one. You can substitute this
with your real workload. The last line asks ``damo`` to record the access
pattern in the ``damon.data`` file.
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index e58ceb89ea2a..26df6cfa4441 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -78,7 +78,7 @@ comma (",").
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ :ref:`schemes <sysfs_schemes>`/nr_schemes
- │ │ │ │ │ │ :ref:`0 <sysfs_scheme>`/action,apply_interval_us
+ │ │ │ │ │ │ :ref:`0 <sysfs_scheme>`/action,target_nid,apply_interval_us
│ │ │ │ │ │ │ :ref:`access_pattern <sysfs_access_pattern>`/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
@@ -289,14 +289,18 @@ schemes/<N>/
------------
In each scheme directory, five directories (``access_pattern``, ``quotas``,
-``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and two files
-(``action`` and ``apply_interval``) exist.
+``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and three files
+(``action``, ``target_nid`` and ``apply_interval``) exist.
The ``action`` file is for setting and getting the scheme's :ref:`action
<damon_design_damos_action>`. The keywords that can be written to and read
from the file and their meaning are same to those of the list on
:ref:`design doc <damon_design_damos_action>`.
+The ``target_nid`` file is for setting the migration target node, which is
+only meaningful when the ``action`` is either ``migrate_hot`` or
+``migrate_cold``.
+
The ``apply_interval_us`` file is for setting and getting the scheme's
:ref:`apply_interval <damon_design_damos>` in microseconds.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 1f883abf3f00..8b35795b664b 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -10,7 +10,7 @@ processes address space and many other cool things.
Linux memory management is a complex system with many configurable
settings. Most of these settings are available via ``/proc``
-filesystem and can be quired and adjusted using ``sysctl``. These APIs
+filesystem and can be queried and adjusted using ``sysctl``. These APIs
are described in Documentation/admin-guide/sysctl/vm.rst and in `man 5 proc`_.
.. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index f5f065c67615..caba0f52dd36 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -118,7 +118,7 @@ Short descriptions to the page flags
21 - KSM
Identical memory pages dynamically shared between one or more processes.
22 - THP
- Contiguous pages which construct transparent hugepages.
+ Contiguous pages which construct THP of any size and mapped by any granularity.
23 - OFFLINE
The page is logically offline.
24 - ZERO_PAGE
@@ -173,27 +173,6 @@ LRU related page flags
The page-types tool in the tools/mm directory can be used to query the
above flags.
-Using pagemap to do something useful
-====================================
-
-The general procedure for using pagemap to find out about a process' memory
-usage goes like this:
-
- 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
- mapped to what.
- 2. Select the maps you are interested in -- all of them, or a particular
- library, or the stack or the heap, etc.
- 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
- 4. Read a u64 for each page from pagemap.
- 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``. For each PFN you
- just read, seek to that entry in the file, and read the data you want.
-
-For example, to find the "unique set size" (USS), which is the amount of
-memory that a process is using that is not shared with any other process,
-you can go through every map in the process, find the PFNs, look those up
-in kpagecount, and tally up the number of pages that are only referenced
-once.
-
Exceptions for Shared Memory
============================
@@ -252,7 +231,7 @@ Following flags about pages are currently supported:
- ``PAGE_IS_PRESENT`` - Page is present in the memory
- ``PAGE_IS_SWAPPED`` - Page is in swapped
- ``PAGE_IS_PFNZERO`` - Page has zero PFN
-- ``PAGE_IS_HUGE`` - Page is THP or Hugetlb backed
+- ``PAGE_IS_HUGE`` - Page is PMD-mapped THP or Hugetlb backed
- ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty
The ``struct pm_scan_arg`` is used as the argument of the IOCTL.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index d414d3f5592a..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -202,12 +202,11 @@ PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
-khugepaged will be automatically started when one or more hugepage
-sizes are enabled (either by directly setting "always" or "madvise",
-or by setting "inherit" while the top-level enabled is set to "always"
-or "madvise"), and it'll be automatically shutdown when the last
-hugepage size is disabled (either by directly setting "never", or by
-setting "inherit" while the top-level enabled is set to "never").
+khugepaged will be automatically started when PMD-sized THP is enabled
+(either of the per-size anon control or the top-level control are set
+to "always" or "madvise"), and it'll be automatically shutdown when
+PMD-sized THP is disabled (when both the per-size anon control and the
+top-level control are "never")
Khugepaged controls
-------------------
@@ -332,6 +331,31 @@ deny
force
Force the huge option on for all - very useful for testing;
+Shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob to
+control mTHP allocation:
+'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled',
+and its value for each mTHP is essentially consistent with the global
+setting. An 'inherit' option is added to ensure compatibility with these
+global settings. Conversely, the options 'force' and 'deny' are dropped,
+which are rather testing artifacts from the old ages.
+
+always
+ Attempt to allocate <size> huge pages every time we need a new page;
+
+inherit
+ Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
+ have enabled="inherit" and all other hugepage sizes have enabled="never";
+
+never
+ Do not allocate <size> huge pages;
+
+within_size
+ Only allocate <size> huge page if it will be fully within i_size.
+ Also respect fadvise()/madvise() hints;
+
+advise
+ Only allocate <size> huge pages if requested with fadvise()/madvise();
+
Need of application restart
===========================
@@ -344,10 +368,6 @@ also applies to the regions registered in khugepaged.
Monitoring usage
================
-.. note::
- Currently the below counters only record events relating to
- PMD-sized THP. Events relating to other THP sizes are not included.
-
The number of PMD-sized anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in ``/proc/meminfo``.
To identify what applications are using PMD-sized anonymous transparent huge
@@ -392,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -476,6 +499,34 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
+ allocated.
+
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages.
+
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
+ falls back to using small pages even though the allocation was
+ successful.
+
+split
+ is incremented every time a huge page is successfully split into
+ smaller orders. This can happen for a variety of reasons but a
+ common reason is that a huge page is old and is being reclaimed.
+
+split_failed
+ is incremented if kernel fails to split huge
+ page. This can happen if the page was pinned by somebody.
+
+split_deferred
+ is incremented when a huge page is put onto split queue.
+ This happens when a huge page is partially unmapped and splitting
+ it would free up some memory. Pages on split queue are going to
+ be split under memory pressure, if splitting is possible.
+
As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help