Diffstat (limited to 'doc/dev/crimson/pipeline.rst')
-rw-r--r--  doc/dev/crimson/pipeline.rst  124
1 file changed, 31 insertions(+), 93 deletions(-)
diff --git a/doc/dev/crimson/pipeline.rst b/doc/dev/crimson/pipeline.rst
index e9115c6d7c3..6e47b79d80b 100644
--- a/doc/dev/crimson/pipeline.rst
+++ b/doc/dev/crimson/pipeline.rst
@@ -2,96 +2,34 @@
The ``ClientRequest`` pipeline
==============================
-In crimson, exactly like in the classical OSD, a client request has data and
-ordering dependencies which must be satisfied before its processing (actually,
-a particular phase of it) can begin. As one of the goals behind crimson is to
-preserve compatibility with the existing OSD incarnation, the same semantics
-must be assured. An obvious example of such a data dependency is the fact that
-an OSD needs to have a version of the OSDMap that matches the one used by the
-client (``Message::get_min_epoch()``).
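-
-As a minimal illustration (not crimson's actual code; in Ceph ``epoch_t``
-is a 32-bit unsigned integer), this dependency boils down to an epoch
-comparison:
-
-.. code-block:: cpp
-
-   #include <cstdint>
-
-   using epoch_t = std::uint32_t;
-
-   // The OSD may start processing the message only once its own map
-   // epoch has caught up with the minimum epoch the client encoded in
-   // the message (Message::get_min_epoch()).
-   bool map_dependency_satisfied(epoch_t osdmap_epoch, epoch_t min_epoch) {
-     return osdmap_epoch >= min_epoch;
-   }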
-
-If a dependency is not satisfied, processing stops. It is crucial to note that
-the same must happen to all other requests that are sequenced after it (due to
-their ordering requirements).
-
-There are a few cases in which a client request can block.
-
-
-``ClientRequest::ConnectionPipeline::await_map``
-  wait until a particular version of the OSDMap is available at the OSD level
-``ClientRequest::ConnectionPipeline::get_pg``
-  wait until a particular PG becomes available on the OSD
-``ClientRequest::PGPipeline::await_map``
-  wait for the PG to be advanced to a particular epoch
-``ClientRequest::PGPipeline::wait_for_active``
-  wait for the PG to become *active* (i.e. have ``is_active()`` asserted)
-``ClientRequest::PGPipeline::recover_missing``
-  wait for an object to be recovered (i.e. to leave the ``missing`` set)
-``ClientRequest::PGPipeline::get_obc``
-  wait for an object to become available for locking; the ``obc`` will be
-  locked before this operation is allowed to continue
-``ClientRequest::PGPipeline::process``
-  wait while any other ``MOSDOp`` message is handled against this PG
-
-At any moment, a ``ClientRequest`` being served should be in one and only one
-of the phases described above. Similarly, an object denoting a particular phase
-can host no more than a single ``ClientRequest`` at the same time. At a low
-level this is achieved with a combination of a barrier and an exclusive lock.
-Together they implement the semantics of a semaphore with a single slot for
-these exclusive phases.
-
-As the execution advances, a request enters the next phase and leaves the
-current one, freeing it for another ``ClientRequest`` instance. All these
-phases form a pipeline which ensures the order is preserved.
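-
-The following is a minimal sketch of such a single-slot phase. It uses
-blocking primitives purely for exposition; crimson itself builds its
-phases on Seastar futures, and the class name here is made up:
-
-.. code-block:: cpp
-
-   #include <condition_variable>
-   #include <mutex>
-
-   // One slot per phase: a request must wait until the previous
-   // occupant has advanced to the next phase.
-   class PhaseSlot {
-     std::mutex m;
-     std::condition_variable cv;
-     bool occupied = false;
-   public:
-     void enter() {
-       std::unique_lock l{m};
-       cv.wait(l, [this] { return !occupied; });
-       occupied = true;
-     }
-     void exit() {
-       std::lock_guard l{m};
-       occupied = false;
-       cv.notify_one();
-     }
-   };
-
-Note the order of operations when advancing: a request first ``enter()``-s
-the next phase and only then ``exit()``-s the current one; releasing the
-current slot first would allow a later request to overtake it.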
-
-These pipeline phases are divided into two ordering domains: ``ConnectionPipeline``
-and ``PGPipeline``. The former ensures order across a client connection while
-the latter does so across a PG. That is, requests originating from the same
-connection are executed in the same order as they were sent by the client.
-The same applies to the PG domain: when requests from multiple connections reach
-a PG, they are executed in the same order as they entered the first blocking
-phase of the ``PGPipeline``.
-
-Comparison with the classical OSD
-----------------------------------
-As the audience of this document is Ceph developers, it seems reasonable to
-match the phases of crimson's ``ClientRequest`` pipeline with the blocking
-stages in the classical OSD. The names in the right column are the names of
-the containers (lists and maps) used to implement these stages; they are
-also documented in the ``PG.h`` header.
-
-+----------------------------------------+--------------------------------------+
-| crimson | ceph-osd waiting list |
-+========================================+======================================+
-|``ConnectionPipeline::await_map`` | ``OSDShardPGSlot::waiting`` and |
-|``ConnectionPipeline::get_pg`` | ``OSDShardPGSlot::waiting_peering`` |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::await_map`` | ``PG::waiting_for_map`` |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::wait_for_active`` | ``PG::waiting_for_peered`` |
-| +--------------------------------------+
-| | ``PG::waiting_for_flush`` |
-| +--------------------------------------+
-| | ``PG::waiting_for_active`` |
-+----------------------------------------+--------------------------------------+
-|To be done (``PG_STATE_LAGGY``) | ``PG::waiting_for_readable`` |
-+----------------------------------------+--------------------------------------+
-|To be done | ``PG::waiting_for_scrub`` |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::recover_missing`` | ``PG::waiting_for_unreadable_object``|
-| +--------------------------------------+
-| | ``PG::waiting_for_degraded_object`` |
-+----------------------------------------+--------------------------------------+
-|To be done (proxying) | ``PG::waiting_for_blocked_object`` |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::get_obc`` | *obc rwlocks* |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::process`` | ``PG::lock`` (roughly) |
-+----------------------------------------+--------------------------------------+
-
-
-As a final word, it is worth emphasizing that the ordering implementations in
-both the classical OSD and crimson are stricter than the theoretical minimum
-required by the RADOS protocol. For instance, we could parallelize read
-operations targeting the same object at the price of extra complexity, but we
-don't -- simplicity has won.
+RADOS requires writes on each object to be ordered. If a client
+submits a sequence of concurrent writes (that is, it does not wait for
+the prior write to complete before submitting the next), that client
+may rely on the writes being completed in the order in which they were
+submitted.
+
+As a result, the client->osd communication and queueing mechanisms on
+both sides must take care to ensure that writes on a (connection,
+object) pair remain ordered along the entire path.
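+
+For illustration, a client might rely on that guarantee like this via
+librados (the object name is arbitrary; cluster setup and error
+handling are omitted):
+
+.. code-block:: cpp
+
+   #include <rados/librados.hpp>
+
+   void ordered_overwrites(librados::IoCtx& ioctx) {
+     librados::bufferlist a, b;
+     a.append("first");
+     b.append("second");
+     librados::AioCompletion *c1 = librados::Rados::aio_create_completion();
+     librados::AioCompletion *c2 = librados::Rados::aio_create_completion();
+     // Both writes are in flight at once; the client never waits for
+     // the first before submitting the second. Ordering guarantees the
+     // object ends up containing "second".
+     ioctx.aio_write("obj", c1, a, a.length(), 0);
+     ioctx.aio_write("obj", c2, b, b.length(), 0);
+     c1->wait_for_complete();
+     c2->wait_for_complete();
+     c1->release();
+     c2->release();
+   }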
+
+crimson-osd enforces this ordering via Pipelines and Stages
+(crimson/osd/osd_operation.h). Upon arrival at the OSD, messages
+enter the ConnectionPipeline::AwaitActive stage and proceed
+through a sequence of pipeline stages:
+
+* ConnectionPipeline: per-connection stages representing the message handling
+  path prior to the hand-off to the target PG
+* PerShardPipeline: intermediate pipeline representing the hand-off from the
+  receiving shard to the shard with the target PG
+* CommonPGPipeline: processing on the target PG prior to obtaining the
+  ObjectContext for the target of the operation
+* CommonOBCPipeline: the actual processing of the IO on the target object
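+
+A simplified sketch of the traversal (stage names other than AwaitActive
+are illustrative; the real machinery in crimson/osd/osd_operation.h is
+built on Seastar futures and admits one request per stage at a time):
+
+.. code-block:: cpp
+
+   #include <iostream>
+
+   // Stubbed single-slot stage: enter() would block until the previous
+   // occupant has moved on; exit() releases the slot.
+   struct Stage {
+     const char *name;
+     void enter() { std::cout << "enter " << name << '\n'; }
+     void exit()  { std::cout << "exit  " << name << '\n'; }
+   };
+
+   int main() {
+     Stage await_active{"ConnectionPipeline::AwaitActive"};
+     Stage get_pg{"ConnectionPipeline::GetPG"};
+     Stage obc_process{"CommonOBCPipeline::Process"};
+     // A request enters the next stage before leaving the current one,
+     // so requests exit every stage in the order they entered it.
+     await_active.enter();
+     get_pg.enter();
+     await_active.exit();
+     // ... hand-off via PerShardPipeline and CommonPGPipeline elided ...
+     obc_process.enter();
+     get_pg.exit();
+     // ... perform the IO ...
+     obc_process.exit();
+   }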
+
+Because CommonOBCPipeline is per-object rather than per-connection or
+per-PG, multiple requests on different objects may be in the same
+CommonOBCPipeline stage concurrently. This allows us to serve
+multiple reads in the same PG concurrently. We can also process
+writes on multiple objects concurrently up to the point at which the
+write is actually submitted.
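+
+As a sketch of why that is safe (the types here are hypothetical), each
+object maps to its own CommonOBCPipeline instance, so stage slots are
+contended only by requests targeting the same object:
+
+.. code-block:: cpp
+
+   #include <map>
+   #include <string>
+
+   struct ObcPipeline {};  // one chain of single-slot stages per object
+
+   // Requests for different oids obtain different pipelines and
+   // therefore never block one another in the OBC stages.
+   ObcPipeline& get_obc_pipeline(std::map<std::string, ObcPipeline>& reg,
+                                 const std::string& oid) {
+     return reg[oid];  // created on first use
+   }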
+
+See crimson/osd/osd_operations/client_request.(h|cc) for details.