author | Zac Dover <zac.dover@proton.me> | 2024-04-22 08:59:15 +0200 |
---|---|---|
committer | Zac Dover <zac.dover@proton.me> | 2024-04-22 09:10:06 +0200 |
commit | 3c2e8d35a9ab3f78619bfbe32b2017cd47ffb3ff (patch) | |
tree | 312dc9e1b7fcc50c035b9ed94aabd97b2e25ecff | |
parent | Merge pull request #56353 from myoungwon/wip-apply-shallow-copy-rbm-overwrite (diff) | |
doc/rados: remove redundant pg repair commands
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all
links to the old location to the new location.
Signed-off-by: Zac Dover <zac.dover@proton.me>
-rw-r--r-- | doc/rados/operations/health-checks.rst | 4 |
-rw-r--r-- | doc/rados/operations/index.rst | 1 |
-rw-r--r-- | doc/rados/operations/pg-repair.rst | 118 |
-rw-r--r-- | doc/rados/troubleshooting/troubleshooting-pg.rst | 60 |
4 files changed, 62 insertions, 121 deletions
diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 54bfd427967..6f3ca281bbf 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -962,7 +962,7 @@ or ``snaptrim_error`` flag set, which indicates that an earlier data scrub
 operation found a problem, or (2) have the *repair* flag set, which means that
 a repair for such an inconsistency is currently in progress.
 
-For more information, see :doc:`pg-repair`.
+For more information, see :doc:`../troubleshooting/troubleshooting-pg`.
 
 OSD_SCRUB_ERRORS
 ________________
@@ -970,7 +970,7 @@ ________________
 Recent OSD scrubs have discovered inconsistencies. This alert is generally
 paired with *PG_DAMAGED* (see above).
 
-For more information, see :doc:`pg-repair`.
+For more information, see :doc:`../troubleshooting/troubleshooting-pg`.
 
 OSD_TOO_MANY_REPAIRS
 ____________________
diff --git a/doc/rados/operations/index.rst b/doc/rados/operations/index.rst
index 91301382da4..e8166acbd4b 100644
--- a/doc/rados/operations/index.rst
+++ b/doc/rados/operations/index.rst
@@ -20,7 +20,6 @@ and, monitoring an operating cluster.
     monitoring
     monitoring-osd-pg
     user-management
-    pg-repair
     pgcalc/index
 
 .. raw:: html
diff --git a/doc/rados/operations/pg-repair.rst b/doc/rados/operations/pg-repair.rst
deleted file mode 100644
index 609318fca5c..00000000000
--- a/doc/rados/operations/pg-repair.rst
+++ /dev/null
@@ -1,118 +0,0 @@
-============================
-Repairing PG Inconsistencies
-============================
-Sometimes a Placement Group (PG) might become ``inconsistent``. To return the PG
-to an ``active+clean`` state, you must first determine which of the PGs has become
-inconsistent and then run the ``pg repair`` command on it. This page contains
-commands for diagnosing PGs and the command for repairing PGs that have become
-inconsistent.
-
-.. highlight:: console
-
-Commands for Diagnosing PG Problems
-===================================
-The commands in this section provide various ways of diagnosing broken PGs.
-
-To see a high-level (low-detail) overview of Ceph cluster health, run the
-following command:
-
-.. prompt:: bash #
-
-   ceph health detail
-
-To see more detail on the status of the PGs, run the following command:
-
-.. prompt:: bash #
-
-   ceph pg dump --format=json-pretty
-
-To see a list of inconsistent PGs, run the following command:
-
-.. prompt:: bash #
-
-   rados list-inconsistent-pg {pool}
-
-To see a list of inconsistent RADOS objects, run the following command:
-
-.. prompt:: bash #
-
-   rados list-inconsistent-obj {pgid}
-
-To see a list of inconsistent snapsets in a specific PG, run the following
-command:
-
-.. prompt:: bash #
-
-   rados list-inconsistent-snapset {pgid}
-
-
-Commands for Repairing PGs
-==========================
-The form of the command to repair a broken PG is as follows:
-
-.. prompt:: bash #
-
-   ceph pg repair {pgid}
-
-Here ``{pgid}`` represents the id of the affected PG.
-
-For example:
-
-.. prompt:: bash #
-
-   ceph pg repair 1.4
-
-.. note:: PG IDs have the form ``N.xxxxx``, where ``N`` is the number of the
-   pool that contains the PG. The command ``ceph osd listpools`` and the
-   command ``ceph osd dump | grep pool`` return a list of pool numbers.
-
-More Information on PG Repair
-=============================
-Ceph stores and updates the checksums of objects stored in the cluster. When a
-scrub is performed on a PG, the OSD attempts to choose an authoritative copy
-from among its replicas. Only one of the possible cases is consistent. After
-performing a deep scrub, Ceph calculates the checksum of an object that is read
-from disk and compares it to the checksum that was previously recorded. If the
-current checksum and the previously recorded checksum do not match, that
-mismatch is considered to be an inconsistency. In the case of replicated pools,
-any mismatch between the checksum of any replica of an object and the checksum
-of the authoritative copy means that there is an inconsistency. The discovery
-of these inconsistencies cause a PG's state to be set to ``inconsistent``.
-
-The ``pg repair`` command attempts to fix inconsistencies of various kinds. If
-``pg repair`` finds an inconsistent PG, it attempts to overwrite the digest of
-the inconsistent copy with the digest of the authoritative copy. If ``pg
-repair`` finds an inconsistent replicated pool, it marks the inconsistent copy
-as missing. In the case of replicated pools, recovery is beyond the scope of
-``pg repair``.
-
-In the case of erasure-coded and BlueStore pools, Ceph will automatically
-perform repairs if ``osd_scrub_auto_repair`` (default ``false``) is set to
-``true`` and if no more than ``osd_scrub_auto_repair_num_errors`` (default
-``5``) errors are found.
-
-The ``pg repair`` command will not solve every problem. Ceph does not
-automatically repair PGs when they are found to contain inconsistencies.
-
-The checksum of a RADOS object or an omap is not always available. Checksums
-are calculated incrementally. If a replicated object is updated
-non-sequentially, the write operation involved in the update changes the object
-and invalidates its checksum. The whole object is not read while the checksum
-is recalculated. The ``pg repair`` command is able to make repairs even when
-checksums are not available to it, as in the case of Filestore. Users working
-with replicated Filestore pools might prefer manual repair to ``ceph pg
-repair``.
-
-This material is relevant for Filestore, but not for BlueStore, which has its
-own internal checksums. The matched-record checksum and the calculated checksum
-cannot prove that any specific copy is in fact authoritative. If there is no
-checksum available, ``pg repair`` favors the data on the primary, but this
-might not be the uncorrupted replica. Because of this uncertainty, human
-intervention is necessary when an inconsistency is discovered. This
-intervention sometimes involves use of ``ceph-objectstore-tool``.
-
-External Links
-==============
-https://ceph.io/geen-categorie/ceph-manually-repair-object/ - This page
-contains a walkthrough of the repair of a PG. It is recommended reading if you
-want to repair a PG but have never done so.
diff --git a/doc/rados/troubleshooting/troubleshooting-pg.rst b/doc/rados/troubleshooting/troubleshooting-pg.rst
index 74d04bd9ffe..9b204ef1e9b 100644
--- a/doc/rados/troubleshooting/troubleshooting-pg.rst
+++ b/doc/rados/troubleshooting/troubleshooting-pg.rst
@@ -544,6 +544,12 @@ form:
 .. prompt:: bash
 
    ceph pg repair {placement-group-ID}
+
+For example:
+
+.. prompt:: bash #
+
+   ceph pg repair 1.4
 
 .. warning: This command overwrites the "bad" copies with "authoritative"
    copies. In most cases, Ceph is able to choose authoritative copies from all
@@ -553,6 +559,10 @@ form:
    ignored when Ceph chooses the authoritative copies. Be aware of this, and
    use the above command with caution.
 
+.. note:: PG IDs have the form ``N.xxxxx``, where ``N`` is the number of the
+   pool that contains the PG. The command ``ceph osd listpools`` and the
+   command ``ceph osd dump | grep pool`` return a list of pool numbers.
+
 
 If you receive ``active + clean + inconsistent`` states periodically due to
 clock skew, consider configuring the `NTP
@@ -560,6 +570,56 @@ clock skew, consider configuring the `NTP
 hosts to act as peers. See `The Network Time Protocol <http://www.ntp.org>`_
 and Ceph :ref:`Clock Settings <mon-config-ref-clock>` for more information.
 
+More Information on PG Repair
+-----------------------------
+Ceph stores and updates the checksums of objects stored in the cluster. When a
+scrub is performed on a PG, the OSD attempts to choose an authoritative copy
+from among its replicas. Only one of the possible cases is consistent. After
+performing a deep scrub, Ceph calculates the checksum of an object that is read
+from disk and compares it to the checksum that was previously recorded. If the
+current checksum and the previously recorded checksum do not match, that
+mismatch is considered to be an inconsistency. In the case of replicated pools,
+any mismatch between the checksum of any replica of an object and the checksum
+of the authoritative copy means that there is an inconsistency. The discovery
+of these inconsistencies cause a PG's state to be set to ``inconsistent``.
+
+The ``pg repair`` command attempts to fix inconsistencies of various kinds. If
+``pg repair`` finds an inconsistent PG, it attempts to overwrite the digest of
+the inconsistent copy with the digest of the authoritative copy. If ``pg
+repair`` finds an inconsistent replicated pool, it marks the inconsistent copy
+as missing. In the case of replicated pools, recovery is beyond the scope of
+``pg repair``.
+
+In the case of erasure-coded and BlueStore pools, Ceph will automatically
+perform repairs if ``osd_scrub_auto_repair`` (default ``false``) is set to
+``true`` and if no more than ``osd_scrub_auto_repair_num_errors`` (default
+``5``) errors are found.
+
+The ``pg repair`` command will not solve every problem. Ceph does not
+automatically repair PGs when they are found to contain inconsistencies.
+
+The checksum of a RADOS object or an omap is not always available. Checksums
+are calculated incrementally. If a replicated object is updated
+non-sequentially, the write operation involved in the update changes the object
+and invalidates its checksum. The whole object is not read while the checksum
+is recalculated. The ``pg repair`` command is able to make repairs even when
+checksums are not available to it, as in the case of Filestore. Users working
+with replicated Filestore pools might prefer manual repair to ``ceph pg
+repair``.
+
+This material is relevant for Filestore, but not for BlueStore, which has its
+own internal checksums. The matched-record checksum and the calculated checksum
+cannot prove that any specific copy is in fact authoritative. If there is no
+checksum available, ``pg repair`` favors the data on the primary, but this
+might not be the uncorrupted replica. Because of this uncertainty, human
+intervention is necessary when an inconsistency is discovered. This
+intervention sometimes involves use of ``ceph-objectstore-tool``.
+
+PG Repair Walkthrough
+---------------------
+https://ceph.io/geen-categorie/ceph-manually-repair-object/ - This page
+contains a walkthrough of the repair of a PG. It is recommended reading if you
+want to repair a PG but have never done so.
 
 Erasure Coded PGs are not active+clean
 ======================================
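
The documentation moved by this commit describes a two-step workflow: list the inconsistent PGs in a pool, then run ``ceph pg repair`` on each PG ID. The following is a minimal Python sketch of that workflow, not part of the commit. It assumes that ``rados list-inconsistent-pg {pool}`` prints a JSON array of PG ID strings; the helper names ``pool_of_pgid`` and ``repair_commands`` are hypothetical. The sketch only builds the command lines and never executes them, because the documentation warns that ``ceph pg repair`` should be used with caution.

```python
import json
import shlex
from typing import List


def pool_of_pgid(pgid: str) -> int:
    """Return the pool number N from a PG ID of the form N.xxxxx."""
    pool, sep, rest = pgid.partition(".")
    if not sep or not pool.isdigit() or not rest:
        raise ValueError(f"not a PG ID of the form N.xxxxx: {pgid!r}")
    return int(pool)


def repair_commands(list_inconsistent_pg_output: str) -> List[str]:
    """Build one `ceph pg repair {pgid}` command line per inconsistent PG.

    Assumption: the input is the output of `rados list-inconsistent-pg {pool}`
    and is a JSON array of PG ID strings, for example ["1.4"].
    """
    pgids = json.loads(list_inconsistent_pg_output)
    commands = []
    for pgid in pgids:
        pool_of_pgid(pgid)  # reject anything that is not shaped like N.xxxxx
        commands.append(shlex.join(["ceph", "pg", "repair", pgid]))
    return commands


if __name__ == "__main__":
    # Hypothetical sample output; a real cluster would supply this text.
    sample_output = '["1.4", "1.1f"]'
    for command in repair_commands(sample_output):
        print(command)
```

Printing the commands instead of running them leaves room for the human review that the "More Information on PG Repair" text calls for, since Ceph cannot always prove which copy is authoritative.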