author | Samuel Just <sjust@redhat.com> | 2024-12-13 02:37:28 +0100 |
---|---|---|
committer | Samuel Just <sjust@redhat.com> | 2024-12-13 21:32:26 +0100 |
commit | dbb129cc63359390f32c3dfbf519b02a4e2adf55 (patch) | |
tree | d14644c9956d4b3ad00e6aeea21ba00014f521a2 | |
parent | crimson: remove now unused pipeline stages (diff) | |
download | ceph-dbb129cc63359390f32c3dfbf519b02a4e2adf55.tar.xz ceph-dbb129cc63359390f32c3dfbf519b02a4e2adf55.zip |
doc/dev/crimson/pipeline.rst: simplify and update to reflect new stages
This commit updates pipeline.rst to include some basic information about
how the pipeline stages now work. I've removed the explicit listing of
the different stages as I'd rather readers refer to the actual
implementation for those details to avoid them getting out of date.
I also removed the comparison to classic as the approach has now diverged
quite a bit and I feel that the ordering part is more important to focus
on than the points at which processing might block.
Signed-off-by: Samuel Just <sjust@redhat.com>
-rw-r--r-- | doc/dev/crimson/pipeline.rst | 124 |
1 file changed, 31 insertions, 93 deletions
diff --git a/doc/dev/crimson/pipeline.rst b/doc/dev/crimson/pipeline.rst
index e9115c6d7c3..6e47b79d80b 100644
--- a/doc/dev/crimson/pipeline.rst
+++ b/doc/dev/crimson/pipeline.rst
@@ -2,96 +2,34 @@
 The ``ClientRequest`` pipeline
 ==============================
 
-In crimson, exactly like in the classical OSD, a client request has data and
-ordering dependencies which must be satisfied before processing (actually
-a particular phase of) can begin. As one of the goals behind crimson is to
-preserve the compatibility with the existing OSD incarnation, the same semantic
-must be assured. An obvious example of such data dependency is the fact that
-an OSD needs to have a version of OSDMap that matches the one used by the client
-(``Message::get_min_epoch()``).
-
-If a dependency is not satisfied, the processing stops. It is crucial to note
-the same must happen to all other requests that are sequenced-after (due to
-their ordering requirements).
-
-There are a few cases when the blocking of a client request can happen.
-
-
-``ClientRequest::ConnectionPipeline::await_map``
-  wait for particular OSDMap version is available at the OSD level
-``ClientRequest::ConnectionPipeline::get_pg``
-  wait a particular PG becomes available on OSD
-``ClientRequest::PGPipeline::await_map``
-  wait on a PG being advanced to particular epoch
-``ClientRequest::PGPipeline::wait_for_active``
-  wait for a PG to become *active* (i.e. have ``is_active()`` asserted)
-``ClientRequest::PGPipeline::recover_missing``
-  wait on an object to be recovered (i.e. leaving the ``missing`` set)
-``ClientRequest::PGPipeline::get_obc``
-  wait on an object to be available for locking. The ``obc`` will be locked
-  before this operation is allowed to continue
-``ClientRequest::PGPipeline::process``
-  wait if any other ``MOSDOp`` message is handled against this PG
-
-At any moment, a ``ClientRequest`` being served should be in one and only one
-of the phases described above. Similarly, an object denoting particular phase
-can host not more than a single ``ClientRequest`` the same time. At low-level
-this is achieved with a combination of a barrier and an exclusive lock.
-They implement the semantic of a semaphore with a single slot for these exclusive
-phases.
-
-As the execution advances, request enters next phase and leaves the current one
-freeing it for another ``ClientRequest`` instance. All these phases form a pipeline
-which assures the order is preserved.
-
-These pipeline phases are divided into two ordering domains: ``ConnectionPipeline``
-and ``PGPipeline``. The former ensures order across a client connection while
-the latter does that across a PG. That is, requests originating from the same
-connection are executed in the same order as they were sent by the client.
-The same applies to the PG domain: when requests from multiple connections reach
-a PG, they are executed in the same order as they entered a first blocking phase
-of the ``PGPipeline``.
-
-Comparison with the classical OSD
-----------------------------------
-As the audience of this document are Ceph Developers, it seems reasonable to
-match the phases of crimson's ``ClientRequest`` pipeline with the blocking
-stages in the classical OSD. The names in the right column are names of
-containers (lists and maps) used to implement these stages. They are also
-already documented in the ``PG.h`` header.
-
-+----------------------------------------+--------------------------------------+
-| crimson                                | ceph-osd waiting list                |
-+========================================+======================================+
-|``ConnectionPipeline::await_map``       | ``OSDShardPGSlot::waiting`` and      |
-|``ConnectionPipeline::get_pg``          | ``OSDShardPGSlot::waiting_peering``  |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::await_map``               | ``PG::waiting_for_map``              |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::wait_for_active``         | ``PG::waiting_for_peered``           |
-|                                        +--------------------------------------+
-|                                        | ``PG::waiting_for_flush``            |
-|                                        +--------------------------------------+
-|                                        | ``PG::waiting_for_active``           |
-+----------------------------------------+--------------------------------------+
-|To be done (``PG_STATE_LAGGY``)         | ``PG::waiting_for_readable``         |
-+----------------------------------------+--------------------------------------+
-|To be done                              | ``PG::waiting_for_scrub``            |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::recover_missing``         | ``PG::waiting_for_unreadable_object``|
-|                                        +--------------------------------------+
-|                                        | ``PG::waiting_for_degraded_object``  |
-+----------------------------------------+--------------------------------------+
-|To be done (proxying)                   | ``PG::waiting_for_blocked_object``   |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::get_obc``                 | *obc rwlocks*                        |
-+----------------------------------------+--------------------------------------+
-|``PGPipeline::process``                 | ``PG::lock`` (roughly)               |
-+----------------------------------------+--------------------------------------+
-
-
-As the last word it might be worth to emphasize that the ordering implementations
-in both classical OSD and in crimson are stricter than a theoretical minimum one
-required by the RADOS protocol. For instance, we could parallelize read operations
-targeting the same object at the price of extra complexity but we don't -- the
-simplicity has won.
+RADOS requires writes on each object to be ordered. If a client
+submits a sequence of concurrent writes (doesn't wait for the prior to
+complete before submitting the next), that client may rely on the
+writes being completed in the order in which they are submitted.
+
+As a result, the client->osd communication and queueing mechanisms on
+both sides must take care to ensure that writes on a (connection,
+object) pair remain ordered for the entire process.
+
+crimson-osd enforces this ordering via Pipelines and Stages
+(crimson/osd/osd_operation.h). Upon arrival at the OSD, messages
+enter the ConnectionPipeline::AwaitActive stage and proceed
+through a sequence of pipeline stages:
+
+* ConnectionPipeline: per-connection stages representing the message handling
+  path prior to being handed off to the target PG
+* PerShardPipeline: intermediate Pipeline representing the hand off from the
+  receiving shard to the shard with the target PG.
+* CommonPGPipeline: represents processing on the target PG prior to obtaining
+  the ObjectContext for the target of the operation.
+* CommonOBCPipeline: represents the actual processing of the IO on the target
+  object
+
+Because CommonOBCPipeline is per-object rather than per-connection or
+per-pg, multiple requests on different objects may be in the same
+CommonOBCPipeline stage concurrently. This allows us to serve
+multiple reads in the same PG concurrently. We can also process
+writes on multiple objects concurrently up to the point at which the
+write is actually submitted.
+
+See crimson/osd/osd_operations/client_request.(h|cc) for details.
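
The stage semantics the removed text spelled out and the new text assumes (at most one request inside an exclusive stage, requests admitted in arrival order, a stage freed when its occupant enters the next one) can be pictured with a small standalone sketch. This is hypothetical illustration, not crimson's API: crimson builds its stages on seastar futures (see crimson/osd/osd_operation.h), while `ExclusiveStage`, `reserve()`, `enter()`, `leave()`, and `run_request()` below are invented names using plain threads.

```cpp
// Sketch only: one exclusive, FIFO-ordered stage. A request releases its
// current stage only after being admitted to the next one, so two requests
// can never swap positions anywhere along the pipeline.
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class ExclusiveStage {
  std::mutex m;
  std::condition_variable cv;
  uint64_t tail = 0;   // next ticket to hand out
  uint64_t head = 0;   // ticket currently allowed in
  bool busy = false;   // whether a request is inside the stage

public:
  // Reserve a place in line; callers do this in arrival order.
  uint64_t reserve() {
    std::lock_guard<std::mutex> lk(m);
    return tail++;
  }
  // Block until it is this ticket's turn and the stage is empty.
  void enter(uint64_t ticket) {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return ticket == head && !busy; });
    busy = true;
  }
  // Leave the stage and admit the next ticket in line.
  void leave() {
    std::lock_guard<std::mutex> lk(m);
    busy = false;
    ++head;
    cv.notify_all();
  }
};

// Walk one request through every stage in order. Entering stage s+1 *before*
// leaving stage s is the handoff that preserves ordering between stages.
void run_request(int id, std::vector<ExclusiveStage>& stages,
                 std::vector<uint64_t> tickets) {
  for (std::size_t s = 0; s < stages.size(); ++s) {
    stages[s].enter(tickets[s]);
    if (s > 0)
      stages[s - 1].leave();
    std::printf("request %d in stage %zu\n", id, s);
  }
  stages.back().leave();
}

int main() {
  std::vector<ExclusiveStage> stages(3);  // stand-ins for real pipeline stages
  std::vector<std::thread> requests;
  for (int id = 0; id < 4; ++id) {
    // Take a ticket for every stage up front, in arrival order.
    std::vector<uint64_t> tickets;
    for (auto& st : stages)
      tickets.push_back(st.reserve());
    requests.emplace_back(run_request, id, std::ref(stages), std::move(tickets));
  }
  for (auto& t : requests)
    t.join();
  return 0;
}
```

Each stage prints its occupants in strict arrival order: a request cannot enter a stage before its predecessor has moved on, and it keeps its own stage occupied while waiting, so later requests are held back in turn.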
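The per-object concurrency point in the new text (the CommonOBCPipeline paragraph) can be illustrated by extending the same sketch: give each object its own stage instance, so requests naming different objects never queue behind one another. `PerObjectStages` and `lookup()` are again invented names; crimson associates the per-object stages with the ObjectContext obtained earlier in the pipeline, and the string-keyed map here merely stands in for that lookup.

```cpp
// Continuation of the sketch above (reuses ExclusiveStage from it).
// One stage instance per object id: same-object requests share a stage and
// stay ordered; different-object requests get disjoint stages and proceed
// concurrently.
#include <map>
#include <memory>
#include <mutex>
#include <string>

class PerObjectStages {
  std::mutex m;
  std::map<std::string, std::shared_ptr<ExclusiveStage>> by_oid;

public:
  std::shared_ptr<ExclusiveStage> lookup(const std::string& oid) {
    std::lock_guard<std::mutex> lk(m);
    auto& stage = by_oid[oid];
    if (!stage)
      stage = std::make_shared<ExclusiveStage>();  // first request on this oid
    return stage;
  }
};
```

Two reads on different oids obtain disjoint stages and run concurrently, matching the text's claim about serving multiple reads in the same PG at once, while two writes on the same oid share one stage and therefore complete in submission order.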