authorZac Dover <zac.dover@proton.me>2024-04-22 08:59:15 +0200
committerZac Dover <zac.dover@proton.me>2024-04-22 09:10:06 +0200
commit3c2e8d35a9ab3f78619bfbe32b2017cd47ffb3ff (patch)
tree312dc9e1b7fcc50c035b9ed94aabd97b2e25ecff
parentMerge pull request #56353 from myoungwon/wip-apply-shallow-copy-rbm-overwrite (diff)
doc/rados: remove redundant pg repair commands
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all links to
the old location to the new location.

Signed-off-by: Zac Dover <zac.dover@proton.me>
-rw-r--r--doc/rados/operations/health-checks.rst4
-rw-r--r--doc/rados/operations/index.rst1
-rw-r--r--doc/rados/operations/pg-repair.rst118
-rw-r--r--doc/rados/troubleshooting/troubleshooting-pg.rst60
4 files changed, 62 insertions, 121 deletions
diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 54bfd427967..6f3ca281bbf 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -962,7 +962,7 @@ or ``snaptrim_error`` flag set, which indicates that an earlier data scrub
operation found a problem, or (2) have the *repair* flag set, which means that
a repair for such an inconsistency is currently in progress.
-For more information, see :doc:`pg-repair`.
+For more information, see :doc:`../troubleshooting/troubleshooting-pg`.
OSD_SCRUB_ERRORS
________________
@@ -970,7 +970,7 @@ ________________
Recent OSD scrubs have discovered inconsistencies. This alert is generally
paired with *PG_DAMAGED* (see above).
-For more information, see :doc:`pg-repair`.
+For more information, see :doc:`../troubleshooting/troubleshooting-pg`.
OSD_TOO_MANY_REPAIRS
____________________
diff --git a/doc/rados/operations/index.rst b/doc/rados/operations/index.rst
index 91301382da4..e8166acbd4b 100644
--- a/doc/rados/operations/index.rst
+++ b/doc/rados/operations/index.rst
@@ -20,7 +20,6 @@ and, monitoring an operating cluster.
monitoring
monitoring-osd-pg
user-management
- pg-repair
pgcalc/index
.. raw:: html
diff --git a/doc/rados/operations/pg-repair.rst b/doc/rados/operations/pg-repair.rst
deleted file mode 100644
index 609318fca5c..00000000000
--- a/doc/rados/operations/pg-repair.rst
+++ /dev/null
@@ -1,118 +0,0 @@
-============================
-Repairing PG Inconsistencies
-============================
-Sometimes a Placement Group (PG) might become ``inconsistent``. To return the PG
-to an ``active+clean`` state, you must first determine which of the PGs has become
-inconsistent and then run the ``pg repair`` command on it. This page contains
-commands for diagnosing PGs and the command for repairing PGs that have become
-inconsistent.
-
-.. highlight:: console
-
-Commands for Diagnosing PG Problems
-===================================
-The commands in this section provide various ways of diagnosing broken PGs.
-
-To see a high-level (low-detail) overview of Ceph cluster health, run the
-following command:
-
-.. prompt:: bash #
-
- ceph health detail
-
-To see more detail on the status of the PGs, run the following command:
-
-.. prompt:: bash #
-
- ceph pg dump --format=json-pretty
-
-To see a list of inconsistent PGs, run the following command:
-
-.. prompt:: bash #
-
- rados list-inconsistent-pg {pool}
-
-To see a list of inconsistent RADOS objects, run the following command:
-
-.. prompt:: bash #
-
- rados list-inconsistent-obj {pgid}
-
-To see a list of inconsistent snapsets in a specific PG, run the following
-command:
-
-.. prompt:: bash #
-
- rados list-inconsistent-snapset {pgid}
-
-
-Commands for Repairing PGs
-==========================
-The form of the command to repair a broken PG is as follows:
-
-.. prompt:: bash #
-
- ceph pg repair {pgid}
-
-Here ``{pgid}`` represents the id of the affected PG.
-
-For example:
-
-.. prompt:: bash #
-
- ceph pg repair 1.4
-
-.. note:: PG IDs have the form ``N.xxxxx``, where ``N`` is the number of the
- pool that contains the PG. The command ``ceph osd listpools`` and the
- command ``ceph osd dump | grep pool`` return a list of pool numbers.
-
-More Information on PG Repair
-=============================
-Ceph stores and updates the checksums of objects stored in the cluster. When a
-scrub is performed on a PG, the OSD attempts to choose an authoritative copy
-from among its replicas. Only one of the possible cases is consistent. After
-performing a deep scrub, Ceph calculates the checksum of an object that is read
-from disk and compares it to the checksum that was previously recorded. If the
-current checksum and the previously recorded checksum do not match, that
-mismatch is considered to be an inconsistency. In the case of replicated pools,
-any mismatch between the checksum of any replica of an object and the checksum
-of the authoritative copy means that there is an inconsistency. The discovery
-of these inconsistencies cause a PG's state to be set to ``inconsistent``.
-
-The ``pg repair`` command attempts to fix inconsistencies of various kinds. If
-``pg repair`` finds an inconsistent PG, it attempts to overwrite the digest of
-the inconsistent copy with the digest of the authoritative copy. If ``pg
-repair`` finds an inconsistent replicated pool, it marks the inconsistent copy
-as missing. In the case of replicated pools, recovery is beyond the scope of
-``pg repair``.
-
-In the case of erasure-coded and BlueStore pools, Ceph will automatically
-perform repairs if ``osd_scrub_auto_repair`` (default ``false``) is set to
-``true`` and if no more than ``osd_scrub_auto_repair_num_errors`` (default
-``5``) errors are found.
-
-The ``pg repair`` command will not solve every problem. Ceph does not
-automatically repair PGs when they are found to contain inconsistencies.
-
-The checksum of a RADOS object or an omap is not always available. Checksums
-are calculated incrementally. If a replicated object is updated
-non-sequentially, the write operation involved in the update changes the object
-and invalidates its checksum. The whole object is not read while the checksum
-is recalculated. The ``pg repair`` command is able to make repairs even when
-checksums are not available to it, as in the case of Filestore. Users working
-with replicated Filestore pools might prefer manual repair to ``ceph pg
-repair``.
-
-This material is relevant for Filestore, but not for BlueStore, which has its
-own internal checksums. The matched-record checksum and the calculated checksum
-cannot prove that any specific copy is in fact authoritative. If there is no
-checksum available, ``pg repair`` favors the data on the primary, but this
-might not be the uncorrupted replica. Because of this uncertainty, human
-intervention is necessary when an inconsistency is discovered. This
-intervention sometimes involves use of ``ceph-objectstore-tool``.
-
-External Links
-==============
-https://ceph.io/geen-categorie/ceph-manually-repair-object/ - This page
-contains a walkthrough of the repair of a PG. It is recommended reading if you
-want to repair a PG but have never done so.
diff --git a/doc/rados/troubleshooting/troubleshooting-pg.rst b/doc/rados/troubleshooting/troubleshooting-pg.rst
index 74d04bd9ffe..9b204ef1e9b 100644
--- a/doc/rados/troubleshooting/troubleshooting-pg.rst
+++ b/doc/rados/troubleshooting/troubleshooting-pg.rst
@@ -544,6 +544,12 @@ form:
.. prompt:: bash
ceph pg repair {placement-group-ID}
+
+For example:
+
+.. prompt:: bash #
+
+ ceph pg repair 1.4
.. warning:: This command overwrites the "bad" copies with "authoritative"
copies. In most cases, Ceph is able to choose authoritative copies from all
@@ -553,6 +559,10 @@ form:
ignored when Ceph chooses the authoritative copies. Be aware of this, and
use the above command with caution.
+.. note:: PG IDs have the form ``N.xxxxx``, where ``N`` is the number of the
+ pool that contains the PG. The command ``ceph osd listpools`` and the
+ command ``ceph osd dump | grep pool`` return a list of pool numbers.
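As a worked example, the sequence below ties the diagnostic and repair
commands together. It is a sketch only: the pool name ``data`` and the PG ID
``1.4`` are illustrative, and the commands must be run against a live cluster
with the appropriate keyring.

```shell
# Confirm that the cluster reports damage and which PG is affected.
ceph health detail

# List the inconsistent PG(s) in the affected pool (pool name is illustrative).
rados list-inconsistent-pg data

# Inspect the damaged objects in the reported PG before repairing anything.
rados list-inconsistent-obj 1.4 --format=json-pretty

# Ask the primary OSD of the PG to repair it.
ceph pg repair 1.4
```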
+
If you receive ``active + clean + inconsistent`` states periodically due to
clock skew, consider configuring the `NTP
@@ -560,6 +570,56 @@ clock skew, consider configuring the `NTP
hosts to act as peers. See `The Network Time Protocol <http://www.ntp.org>`_
and Ceph :ref:`Clock Settings <mon-config-ref-clock>` for more information.
+More Information on PG Repair
+-----------------------------
+Ceph stores and updates the checksums of objects stored in the cluster. When a
+scrub is performed on a PG, the OSD attempts to choose an authoritative copy
+from among its replicas; only one of the candidate copies can be treated as
+the consistent, authoritative one. After
+performing a deep scrub, Ceph calculates the checksum of an object that is read
+from disk and compares it to the checksum that was previously recorded. If the
+current checksum and the previously recorded checksum do not match, that
+mismatch is considered to be an inconsistency. In the case of replicated pools,
+any mismatch between the checksum of any replica of an object and the checksum
+of the authoritative copy means that there is an inconsistency. The discovery
+of these inconsistencies causes a PG's state to be set to ``inconsistent``.
+
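The comparison that a deep scrub performs can be sketched in plain shell. This
illustrates only the principle, not Ceph's implementation: the file under
``/tmp`` stands in for a stored object, and ``sha256sum`` stands in for Ceph's
internal checksums.

```shell
# Record a checksum when the "object" is written, as Ceph does.
printf 'object payload' > /tmp/obj
recorded=$(sha256sum /tmp/obj | awk '{print $1}')

# Later, simulate silent corruption of the stored copy.
printf 'object paylaod' > /tmp/obj

# A deep scrub re-reads the object and compares the current checksum
# against the previously recorded one; a mismatch is an inconsistency.
current=$(sha256sum /tmp/obj | awk '{print $1}')
if [ "$recorded" != "$current" ]; then
    echo "inconsistent"
fi
```

Run as-is, this prints ``inconsistent``, the analogue of the PG being marked
``inconsistent``.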
+The ``pg repair`` command attempts to fix inconsistencies of various kinds. If
+``pg repair`` finds an inconsistent PG, it attempts to overwrite the digest of
+the inconsistent copy with the digest of the authoritative copy. If ``pg
+repair`` finds an inconsistent copy of an object in a replicated pool, it
+marks that copy as missing; the normal recovery process then restores the
+copy. That recovery is beyond the scope of ``pg repair`` itself.
+
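The digest comparison that drives repair can be illustrated with a toy shell
sketch. The replica files, the choice of replica ``a`` as authoritative, and
the use of ``sha256sum`` are all stand-ins for illustration, not Ceph
internals.

```shell
# Three toy "replicas" of one object; replica b is silently corrupt.
printf 'payload' > /tmp/replica-a
printf 'paylaod' > /tmp/replica-b
printf 'payload' > /tmp/replica-c

# Digest of the copy chosen as authoritative (replica a, by assumption).
authoritative=$(sha256sum /tmp/replica-a | awk '{print $1}')

# Any replica whose digest differs from the authoritative digest is the
# copy that a repair would overwrite (or mark missing for recovery).
for r in /tmp/replica-a /tmp/replica-b /tmp/replica-c; do
    if [ "$(sha256sum "$r" | awk '{print $1}')" != "$authoritative" ]; then
        echo "inconsistent copy: $r"
    fi
done
```

Run as-is, this reports only ``/tmp/replica-b`` as inconsistent.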
+In the case of erasure-coded and BlueStore pools, Ceph will automatically
+perform repairs if ``osd_scrub_auto_repair`` (default ``false``) is set to
+``true`` and if no more than ``osd_scrub_auto_repair_num_errors`` (default
+``5``) errors are found.
+
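For clusters where automatic repair is acceptable, the options above can be
set with ``ceph config set``. This is a sketch only; whether auto-repair is
appropriate depends on the cluster, and the values shown are the defaults
described above.

```shell
# Allow scrubs to repair errors automatically (default: false).
ceph config set osd osd_scrub_auto_repair true

# Cap the number of errors that auto-repair will handle (default: 5).
ceph config set osd osd_scrub_auto_repair_num_errors 5
```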
+The ``pg repair`` command will not solve every problem. Ceph does not
+automatically repair PGs when they are found to contain inconsistencies.
+
+The checksum of a RADOS object or an omap is not always available. Checksums
+are calculated incrementally: because the whole object is not re-read when
+its checksum is updated, a non-sequential write to a replicated object
+changes the object and invalidates its stored checksum rather than
+recalculating it. The ``pg repair`` command is able to make repairs even when
+checksums are not available to it, as in the case of Filestore. Users working
+with replicated Filestore pools might prefer manual repair to ``ceph pg
+repair``.
+
+These caveats apply to Filestore, but not to BlueStore, which maintains its
+own internal checksums. Even a match between the recorded checksum and the
+calculated checksum cannot prove that any specific copy is in fact
+authoritative. If there is no
+checksum available, ``pg repair`` favors the data on the primary, but this
+might not be the uncorrupted replica. Because of this uncertainty, human
+intervention is necessary when an inconsistency is discovered. This
+intervention sometimes involves use of ``ceph-objectstore-tool``.
+
+PG Repair Walkthrough
+---------------------
+https://ceph.io/geen-categorie/ceph-manually-repair-object/ - This page
+contains a walkthrough of the repair of a PG. It is recommended reading if you
+want to repair a PG but have never done so.
Erasure Coded PGs are not active+clean
======================================