From ae9277398840bf8576ea5a8c4a2ba4e23f8b9613 Mon Sep 17 00:00:00 2001
From: Dhairya Parmar
Date: Fri, 22 Nov 2024 03:40:33 +0530
Subject: doc/cephfs: document purge queue and its perf counters

Fixes: https://tracker.ceph.com/issues/68571
Signed-off-by: Dhairya Parmar
---
 doc/cephfs/index.rst       |   1 +
 doc/cephfs/purge-queue.rst | 106 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)
 create mode 100644 doc/cephfs/purge-queue.rst

diff --git a/doc/cephfs/index.rst b/doc/cephfs/index.rst
index 57ea336c00b..a8a991b01b0 100644
--- a/doc/cephfs/index.rst
+++ b/doc/cephfs/index.rst
@@ -93,6 +93,7 @@ Administration
     CephFS Top Utility <cephfs-top>
     Scheduled Snapshots <snap-schedule>
     CephFS Snapshot Mirroring <cephfs-mirroring>
+    Purge Queue <purge-queue>
 
 .. raw:: html
 
diff --git a/doc/cephfs/purge-queue.rst b/doc/cephfs/purge-queue.rst
new file mode 100644
index 00000000000..d7a68e7fa55
--- /dev/null
+++ b/doc/cephfs/purge-queue.rst
@@ -0,0 +1,106 @@
+============
+Purge Queue
+============
+
+The MDS maintains a data structure known as the **Purge Queue**, which is
+responsible for managing and executing the parallel deletion of files.
+There is one purge queue for every MDS rank. A purge queue consists of purge
+items, each of which holds only the nominal information needed from an inode,
+such as its size and layout (all other metadata is discarded, which makes the
+purge queue independent of the metadata structures).
+
+Deletion process
+================
+
+When a client requests deletion of a directory (for example ``rm -rf``):
+
+- The MDS records the files and subdirectories to be removed as purge items
+  in the purge queue journal.
+- The MDS processes the deletion of the corresponding inodes in the
+  background, in small, manageable chunks.
+- The MDS instructs the underlying OSDs to remove the associated objects
+  from the data pool.
+- The MDS updates the purge queue journal as purge items are completed.
+
+.. note:: If clients delete files faster than the purge queue can process
+          them, data pool usage might increase substantially over time. In
+          extreme scenarios the purge queue backlog can become so large that
+          capacity reclamation is noticeably slowed, and the Linux ``du``
+          command run on a CephFS mount might report usage inconsistent with
+          that of the CephFS data pool.
+
+There are a few tunable configuration options that the MDS uses internally
+to throttle purge queue processing:
+
+.. confval:: filer_max_purge_ops
+.. confval:: mds_max_purge_files
+.. confval:: mds_max_purge_ops
+.. confval:: mds_max_purge_ops_per_pg
+
+Generally, the defaults are adequate for most clusters. However, on very
+large clusters, if ``pq_item_in_journal`` (the count of items pending
+deletion) grows to an enormous figure, these options can be tuned to 4-5
+times their default values as a starting point; further increases should be
+driven by observed need.
+
+Start with the simplest option, ``filer_max_purge_ops``, which should help
+reclaim space more quickly::
+
+    $ ceph config set mds filer_max_purge_ops 40
+
+Increasing ``filer_max_purge_ops`` is usually sufficient, but if it is not,
+continue by tuning the other options::
+
+    $ ceph config set mds mds_max_purge_files 256
+    $ ceph config set mds mds_max_purge_ops 32768
+    $ ceph config set mds mds_max_purge_ops_per_pg 2
+
+.. note:: Setting these values will not immediately break anything, but
+          because they control how many delete operations the MDS issues to
+          the underlying RADOS cluster, excessively high values can consume
+          a noticeable amount of cluster performance.
+
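+Before raising these limits, it can help to check the values currently in
+effect. The commands below are only a sketch of one way to do this with the
+generic ``ceph config`` interface; ``mds.<name>`` is a placeholder for an
+actual MDS daemon name::
+
+    # centrally configured value for all MDS daemons
+    $ ceph config get mds mds_max_purge_ops
+    # value in use by a specific running daemon
+    $ ceph config show mds.<name> mds_max_purge_ops
+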
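+Whether tuning is needed at all, and whether it is having the desired effect,
+is best judged by watching the purge queue counters described in the next
+section. One possible way to fetch them, assuming access to the admin socket
+on the host running the daemon (``mds.<name>`` is again a placeholder)::
+
+    $ ceph daemon mds.<name> perf dump purge_queue
+
+On recent releases the same counters can usually also be retrieved remotely
+with ``ceph tell mds.<name> perf dump``.
+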
+.. note:: The purge queue does not automatically tune its work limits in
+          response to the amount of work outstanding. Make a deliberate
+          decision when tuning these options, based on the cluster size and
+          workload.
+
+Examining purge queue perf counters
+===================================
+
+When analysing an MDS perf dump, the purge queue statistics look like this::
+
+    "purge_queue": {
+        "pq_executing_ops": 56655,
+        "pq_executing_ops_high_water": 65350,
+        "pq_executing": 1,
+        "pq_executing_high_water": 3,
+        "pq_executed": 25,
+        "pq_item_in_journal": 6567004
+    }
+
+The counters have the following meanings:
+
+.. list-table::
+   :widths: 50 50
+   :header-rows: 1
+
+   * - Name
+     - Description
+   * - pq_executing_ops
+     - Purge queue operations currently in flight
+   * - pq_executing_ops_high_water
+     - Maximum number of in-flight purge operations recorded
+   * - pq_executing
+     - Purge queue files currently being deleted
+   * - pq_executing_high_water
+     - Maximum number of concurrently executing file purges recorded
+   * - pq_executed
+     - Purge queue files that have been deleted
+   * - pq_item_in_journal
+     - Purge items (files) remaining in the journal
+
+.. note:: ``pq_executing`` and ``pq_executing_ops`` might look similar, but
+          there is a subtle difference: ``pq_executing`` tracks the number of
+          files currently being purged, while ``pq_executing_ops`` counts the
+          underlying RADOS object operations for all of those files.
--
cgit v1.2.3