diff options
115 files changed, 2111 insertions, 509 deletions
diff --git a/doc/cephadm/services/osd.rst b/doc/cephadm/services/osd.rst index 831bd238c79..90ebd86f897 100644 --- a/doc/cephadm/services/osd.rst +++ b/doc/cephadm/services/osd.rst @@ -198,6 +198,18 @@ There are a few ways to create new OSDs: .. warning:: When deploying new OSDs with ``cephadm``, ensure that the ``ceph-osd`` package is not already installed on the target host. If it is installed, conflicts may arise in the management and control of the OSD that may lead to errors or unexpected behavior. +* OSDs created via ``ceph orch daemon add`` are by default not added to the orchestrator's OSD service, they get added to 'osd' service. To attach an OSD to a different, existing OSD service, issue a command of the following form: + + .. prompt:: bash * + + ceph orch osd set-spec-affinity <service_name> <osd_id(s)> + + For example: + + .. prompt:: bash # + + ceph orch osd set-spec-affinity osd.default_drive_group 0 1 + Dry Run ------- diff --git a/doc/cephadm/services/rgw.rst b/doc/cephadm/services/rgw.rst index ed0b149365a..3df8ed2fc56 100644 --- a/doc/cephadm/services/rgw.rst +++ b/doc/cephadm/services/rgw.rst @@ -173,6 +173,32 @@ Then apply this yaml document: Note the value of ``rgw_frontend_ssl_certificate`` is a literal string as indicated by a ``|`` character preserving newline characters. +Disabling multisite sync traffic +-------------------------------- + +There is an RGW config option called ``rgw_run_sync_thread`` that tells the +RGW daemon to not transmit multisite replication data. This is useful if you want +that RGW daemon to be dedicated to I/O rather than multisite sync operations. +The RGW spec file includes a setting ``disable_multisite_sync_traffic`` that when +set to "True" will tell cephadm to set ``rgw_run_sync_thread`` to false for all +RGW daemons deployed for that RGW service. For example + +.. code-block:: yaml + + service_type: rgw + service_id: foo + placement: + label: rgw + spec: + rgw_realm: myrealm + rgw_zone: myzone + rgw_zonegroup: myzg + disable_multisite_sync_traffic: True + +.. note:: This will only stop the RGW daemon(s) from sending replication data. + The daemon can still receive replication data unless it has been removed + from the zonegroup and zone replication endpoints. + Service specification --------------------- diff --git a/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst index 34dfd521eaa..6964012ef31 100644 --- a/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst +++ b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst @@ -6,7 +6,8 @@ Integration Tests using Teuthology Workflow Infrastructure -------------- -Components: +Components +********** 1. `ceph-ci`_: Clone of the main Ceph repository, used for triggering Jenkins Ceph builds for development. @@ -44,7 +45,27 @@ Components: Each Teuthology test *run* contains multiple test *jobs*. Each job runs in an environment isolated from other jobs, on a different collection of test nodes. -To test a change in Ceph, follow these steps: +Workflow Overview +***************** + +.. image:: workflow.png + + +To test a change in Ceph, start by pushing a branch with your changes to the +`ceph-ci`_ repository. This will automatically trigger the Jenkins process +to build Ceph binaries - the status of the build can be observed on `Shaman`_. +These built packages will be uploaded on `Chacra`_. + +To schedule a Teuthology integration test against this new build, you will +need access to the Sepia lab. Once you have access, log into the Teuthology +machine and complete the one-time initial Teuthology setup required to run +Teuthology commands. After the setup, use the ``teuthology-suite`` command to schedule +a Teuthology run. In this command, use the ``-c <ceph-ci branch name>`` option to +specify your build. The results of your test can be observed on `Pulpito`_. +Log into a `developer playground machine`_ to review the Teuthology run's archive logs. + + +The rest of the document will explain these steps in detail: 1. Getting binaries - Build Ceph. 2. Scheduling Test Run: @@ -98,6 +119,31 @@ Ceph binaries must be built for your branch before you can use teuthology to run .. _the Chacra site: https://shaman.ceph.com/api/search/?status=ready&project=ceph +Pushing to the ceph-ci repository +********************************* + +Follow these steps to push to the ceph-ci repository. After pushing, a new build will +automatically be scheduled. + +1. Add the ceph-ci repository as a remote to your local clone of the Ceph repository: + + .. prompt:: bash $ + + git remote add ceph-ci git@github.com:ceph/ceph-ci.git + + $ git remote -v + origin git@github.com:ceph/ceph.git (fetch) + origin git@github.com:ceph/ceph.git (push) + ceph-ci git@github.com:ceph/ceph-ci.git (fetch) + ceph-ci git@github.com:ceph/ceph-ci.git (push) + +2. Push your branch upstream by running a command of the following form: + + .. prompt:: bash $ + + $ git push ceph-ci wip-yourname-feature-x + + Naming the ceph-ci branch ************************* Prepend your branch with your name before you push it to ceph-ci. For example, @@ -110,15 +156,14 @@ the name of that stable branch in your ceph-ci branch name. For example, the ``feature-x`` PR branch should be named ``wip-feature-x-nautilus``. *This is not just a convention. This ensures that your branch is built in the correct environment.* -You can choose to only trigger a CentOS 9.Stream build (excluding other distro like ubuntu) -by adding "centos9-only" at the end of the ceph-ci branch name. For example, -``wip-$yourname-feature-centos9-only``. This helps to get quicker builds and save resources -when you don't require binaries for other distros. - Delete the branch from ceph-ci when you no longer need it. If you are logged in to GitHub, all your branches on ceph-ci can be found here: https://github.com/ceph/ceph-ci/branches. +.. note:: You can choose to only trigger a CentOS 9.Stream build (excluding other + distro like ubuntu) by adding "centos9-only" at the end of the ceph-ci branch name. + For example, ``wip-$yourname-feature-centos9-only``. This helps to get quicker builds + and save resources when you don't require binaries for other distros. Scheduling Test Run ------------------- diff --git a/doc/dev/developer_guide/testing_integration_tests/workflow.png b/doc/dev/developer_guide/testing_integration_tests/workflow.png Binary files differnew file mode 100644 index 00000000000..610baf683bc --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/workflow.png diff --git a/doc/man/8/cephadm.rst b/doc/man/8/cephadm.rst index b2cad6cb505..3c23a9867f7 100644 --- a/doc/man/8/cephadm.rst +++ b/doc/man/8/cephadm.rst @@ -13,7 +13,7 @@ Synopsis | [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR] | [--unit-dir UNIT_DIR] [--verbose] [--timeout TIMEOUT] | [--retry RETRY] [--no-container-init] -| {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,list-images} +| {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,list-images,update-osd-service} | ... @@ -106,6 +106,7 @@ Synopsis | **cephadm** **list-images** +| **cephadm** **update-osd-service** [-h] [--fsid FSID] --osd-ids OSD_IDS --service-name SERVICE_NAME Description @@ -535,6 +536,18 @@ list-images List the default container images for all services in ini format. The output can be modified with custom images and passed to --config flag during bootstrap. +update-osd-service +------------------ + +Update the OSD service for specific OSDs + +Arguments: + +* [--fsid FSID] cluster FSID +* --osd-ids OSD_IDS Comma-separated OSD IDs +* --service-name SERVICE_NAME OSD service name + + Availability ============ diff --git a/doc/radosgw/account.rst b/doc/radosgw/account.rst index 6dab997d93e..0e4ede5a50a 100644 --- a/doc/radosgw/account.rst +++ b/doc/radosgw/account.rst @@ -174,6 +174,11 @@ An existing user can be adopted into an account with ``user modify``:: .. note:: Account membership is permanent. Once added, users cannot be removed from their account. +.. note:: The IAM User API imposes additional requirements on the format + of ``UserName``, which is enforced when migrating users into an account. + If migration fails with "UserName contains invalid characters", the + ``--display-name`` should be modified to match ``[\w+=,.@-]+``. + .. warning:: Ownership of the user's notification topics will not be transferred to the account. Notifications will continue to work, but the topics will no longer be visible to SNS Topic APIs. Topics and diff --git a/doc/radosgw/config-ref.rst b/doc/radosgw/config-ref.rst index edc6a90b0f9..b4aa56fff54 100644 --- a/doc/radosgw/config-ref.rst +++ b/doc/radosgw/config-ref.rst @@ -78,7 +78,7 @@ These values can be tuned based upon your specific workload to further increase aggressiveness of lifecycle processing. For a workload with a larger number of buckets (thousands) you would look at increasing the :confval:`rgw_lc_max_worker` value from the default value of 3 whereas for a workload with a smaller number of buckets but higher number of objects (hundreds of thousands) -per bucket you would consider decreasing :confval:`rgw_lc_max_wp_worker` from the default value of 3. +per bucket you would consider increasing :confval:`rgw_lc_max_wp_worker` from the default value of 3. .. note:: When looking to tune either of these specific values please validate the current Cluster performance and Ceph Object Gateway utilization before increasing. diff --git a/doc/releases/index.rst b/doc/releases/index.rst index a8015c65465..1393770878f 100644 --- a/doc/releases/index.rst +++ b/doc/releases/index.rst @@ -23,7 +23,6 @@ security fixes. Squid (v19.2.*) <squid> Reef (v18.2.*) <reef> - Quincy (v17.2.*) <quincy> .. ceph_releases:: releases.yml current @@ -40,6 +39,7 @@ receive bug fixes or backports). :maxdepth: 1 :hidden: + Quincy (v17.2.*) <quincy> Pacific (v16.2.*) <pacific> Octopus (v15.2.*) <octopus> Nautilus (v14.2.*) <nautilus> diff --git a/monitoring/ceph-mixin/config.libsonnet b/monitoring/ceph-mixin/config.libsonnet index a15b88422fc..e917b4c2dac 100644 --- a/monitoring/ceph-mixin/config.libsonnet +++ b/monitoring/ceph-mixin/config.libsonnet @@ -9,12 +9,12 @@ CephNodeNetworkPacketDropsPerSec: 10, CephRBDMirrorImageTransferBandwidthThreshold: 0.8, CephRBDMirrorImagesPerDaemonThreshold: 100, - NVMeoFMaxGatewaysPerGroup: 4, - NVMeoFMaxGatewaysPerCluster: 4, + NVMeoFMaxGatewaysPerGroup: 8, + NVMeoFMaxGatewaysPerCluster: 32, NVMeoFHighGatewayCPU: 80, NVMeoFMaxSubsystemsPerGateway: 128, - NVMeoFMaxNamespaces: 1024, - NVMeoFHighClientCount: 32, + NVMeoFMaxNamespaces: 2048, + NVMeoFHighClientCount: 128, NVMeoFHighHostCPU: 80, // // Read/Write latency is defined in ms diff --git a/monitoring/ceph-mixin/prometheus_alerts.yml b/monitoring/ceph-mixin/prometheus_alerts.yml index 3440d761351..7c0da4d51a4 100644 --- a/monitoring/ceph-mixin/prometheus_alerts.yml +++ b/monitoring/ceph-mixin/prometheus_alerts.yml @@ -776,18 +776,18 @@ groups: type: "ceph_default" - alert: "NVMeoFTooManyGateways" annotations: - description: "You may create many gateways, but 4 is the tested limit" + description: "You may create many gateways, but 32 is the tested limit" summary: "Max supported gateways exceeded on cluster {{ $labels.cluster }}" - expr: "count(ceph_nvmeof_gateway_info) by (cluster) > 4.00" + expr: "count(ceph_nvmeof_gateway_info) by (cluster) > 32.00" for: "1m" labels: severity: "warning" type: "ceph_default" - alert: "NVMeoFMaxGatewayGroupSize" annotations: - description: "You may create many gateways in a gateway group, but 4 is the tested limit" + description: "You may create many gateways in a gateway group, but 8 is the tested limit" summary: "Max gateways within a gateway group ({{ $labels.group }}) exceeded on cluster {{ $labels.cluster }}" - expr: "count(ceph_nvmeof_gateway_info) by (cluster,group) > 4.00" + expr: "count(ceph_nvmeof_gateway_info) by (cluster,group) > 8.00" for: "1m" labels: severity: "warning" @@ -832,7 +832,7 @@ groups: annotations: description: "Although you may continue to create namespaces in {{ $labels.gateway_host }}, the configuration may not be supported" summary: "The number of namespaces defined to the gateway exceeds supported values on cluster {{ $labels.cluster }}" - expr: "sum by(gateway_host, cluster) (label_replace(ceph_nvmeof_subsystem_namespace_count,\"gateway_host\",\"$1\",\"instance\",\"(.*?)(?::.*)?\")) > 1024.00" + expr: "sum by(gateway_host, cluster) (label_replace(ceph_nvmeof_subsystem_namespace_count,\"gateway_host\",\"$1\",\"instance\",\"(.*?)(?::.*)?\")) > 2048.00" for: "1m" labels: severity: "warning" @@ -848,9 +848,9 @@ groups: type: "ceph_default" - alert: "NVMeoFHighClientCount" annotations: - description: "The supported limit for clients connecting to a subsystem is 32" + description: "The supported limit for clients connecting to a subsystem is 128" summary: "The number of clients connected to {{ $labels.nqn }} is too high on cluster {{ $labels.cluster }}" - expr: "ceph_nvmeof_subsystem_host_count > 32.00" + expr: "ceph_nvmeof_subsystem_host_count > 128.00" for: "1m" labels: severity: "warning" diff --git a/monitoring/ceph-mixin/tests_alerts/test_alerts.yml b/monitoring/ceph-mixin/tests_alerts/test_alerts.yml index b3b29308d08..83b4ff80375 100644 --- a/monitoring/ceph-mixin/tests_alerts/test_alerts.yml +++ b/monitoring/ceph-mixin/tests_alerts/test_alerts.yml @@ -2331,12 +2331,69 @@ tests: values: '1+0x20' - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.5",cluster="mycluster"}' values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.6",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.7",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.8",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.9",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.10",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.11",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.12",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.13",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.14",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.15",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.16",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.17",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.18",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.19",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.20",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.21",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.22",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.23",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.24",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.25",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.26",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.27",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.28",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.29",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.30",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.31",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.32",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{addr="1.1.1.33",cluster="mycluster"}' + values: '1+0x20' + promql_expr_test: - - expr: count(ceph_nvmeof_gateway_info) by (cluster) > 4.00 + - expr: count(ceph_nvmeof_gateway_info) by (cluster) > 32.00 eval_time: 1m exp_samples: - labels: '{cluster="mycluster"}' - value: 5 + value: 33 alert_rule_test: - eval_time: 5m alertname: NVMeoFTooManyGateways @@ -2347,7 +2404,7 @@ tests: type: ceph_default exp_annotations: summary: "Max supported gateways exceeded on cluster mycluster" - description: "You may create many gateways, but 4 is the tested limit" + description: "You may create many gateways, but 32 is the tested limit" # NVMeoFMaxGatewayGroupSize - interval: 1m @@ -2362,16 +2419,24 @@ tests: values: '1+0x20' - series: 'ceph_nvmeof_gateway_info{group="group-1",addr="1.1.1.12",cluster="mycluster"}' values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{group="group-1",addr="1.1.1.10",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{group="group-1",addr="1.1.1.14",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{group="group-1",addr="1.1.1.11",cluster="mycluster"}' + values: '1+0x20' + - series: 'ceph_nvmeof_gateway_info{group="group-1",addr="1.1.1.13",cluster="mycluster"}' + values: '1+0x20' - series: 'ceph_nvmeof_gateway_info{group="group-2",addr="1.1.1.4",cluster="mycluster"}' values: '1+0x20' - series: 'ceph_nvmeof_gateway_info{group="group-2",addr="1.1.1.5",cluster="mycluster"}' values: '1+0x20' promql_expr_test: - - expr: count(ceph_nvmeof_gateway_info) by (cluster, group) > 4.00 + - expr: count(ceph_nvmeof_gateway_info) by (cluster, group) > 8.00 eval_time: 1m exp_samples: - labels: '{cluster="mycluster",group="group-1"}' - value: 5 + value: 9 alert_rule_test: - eval_time: 5m alertname: NVMeoFMaxGatewayGroupSize @@ -2383,7 +2448,7 @@ tests: type: ceph_default exp_annotations: summary: "Max gateways within a gateway group (group-1) exceeded on cluster mycluster" - description: "You may create many gateways in a gateway group, but 4 is the tested limit" + description: "You may create many gateways in a gateway group, but 8 is the tested limit" # NVMeoFSingleGatewayGroup - interval: 1m @@ -2767,12 +2832,14 @@ tests: values: '200+0x10' - series: 'ceph_nvmeof_subsystem_namespace_count{instance="node-1:10008",nqn="nqn10",cluster="mycluster"}' values: '200+0x10' + - series: 'ceph_nvmeof_subsystem_namespace_count{instance="node-1:10008",nqn="nqn11",cluster="mycluster"}' + values: '200+0x10' promql_expr_test: - - expr: sum by(gateway_host, cluster) (label_replace(ceph_nvmeof_subsystem_namespace_count,"gateway_host","$1","instance","(.*):.*")) > 1024 + - expr: sum by(gateway_host, cluster) (label_replace(ceph_nvmeof_subsystem_namespace_count,"gateway_host","$1","instance","(.*):.*")) > 2048 eval_time: 1m exp_samples: - labels: '{gateway_host="node-1", cluster="mycluster"}' - value: 2000 + value: 2200 alert_rule_test: - eval_time: 5m alertname: NVMeoFTooManyNamespaces @@ -2815,15 +2882,15 @@ tests: - interval: 1m input_series: - series: 'ceph_nvmeof_subsystem_host_count{nqn="nqn1",cluster="mycluster"}' - values: '2 2 2 4 4 8 8 8 10 10 20 20 32 34 34 38 38 40 44 44' + values: '2 4 8 10 20 30 40 50 62 74 80 95 100 110 130 130 130 130 130 130' - series: 'ceph_nvmeof_subsystem_host_count{nqn="nqn2",cluster="mycluster"}' - values: '2 2 2 8 8 8 16 16 16 16 16 16 16 16 16 16 16 16 16 16' + values: '2 8 16 16 16 16 16 16 16 16 20 20 32 34 34 36 37 37 37 37' promql_expr_test: - - expr: ceph_nvmeof_subsystem_host_count > 32.00 + - expr: ceph_nvmeof_subsystem_host_count > 128.00 eval_time: 15m exp_samples: - labels: '{__name__="ceph_nvmeof_subsystem_host_count",nqn="nqn1",cluster="mycluster"}' - value: 38 + value: 130 alert_rule_test: - eval_time: 20m alertname: NVMeoFHighClientCount @@ -2835,7 +2902,7 @@ tests: type: ceph_default exp_annotations: summary: "The number of clients connected to nqn1 is too high on cluster mycluster" - description: "The supported limit for clients connecting to a subsystem is 32" + description: "The supported limit for clients connecting to a subsystem is 128" # NVMeoFMissingListener - interval: 1m diff --git a/qa/crontab/teuthology-cronjobs b/qa/crontab/teuthology-cronjobs index c979e5b105f..c558a1382ef 100644 --- a/qa/crontab/teuthology-cronjobs +++ b/qa/crontab/teuthology-cronjobs @@ -52,7 +52,6 @@ TEUTHOLOGY_SUITE_ARGS="--non-interactive --newest=100 --ceph-repo=https://git.ce 00 05 * * 0,2,4 $CW $SS 1 --ceph main --suite smoke -p 100 --force-priority 08 05 * * 0 $CW $SS 1 --ceph squid --suite smoke -p 100 --force-priority 16 05 * * 0 $CW $SS 1 --ceph reef --suite smoke -p 100 --force-priority -24 05 * * 0 $CW $SS 1 --ceph quincy --suite smoke -p 100 --force-priority ## ********** windows tests on main branch - weekly # 00 03 * * 1 CEPH_BRANCH=main; MACHINE_NAME=smithi; $CW teuthology-suite -v -c $CEPH_BRANCH -n 100 -m $MACHINE_NAME -s windows -k distro -e $CEPH_QA_EMAIL @@ -122,7 +121,6 @@ TEUTHOLOGY_SUITE_ARGS="--non-interactive --newest=100 --ceph-repo=https://git.ce 16 00 * * 1 $CW $SS 1 --ceph quincy --suite upgrade-clients/client-upgrade-pacific-quincy --suite-branch pacific -p 820 24 00 * * 1 $CW $SS 120000 --ceph quincy --suite upgrade:octopus-x -p 820 32 00 * * 1 $CW $SS 120000 --ceph quincy --suite upgrade:pacific-x -p 820 -40 00 * * 1 $CW $SS 1 --ceph quincy --suite upgrade/quincy-p2p -p 820 ### upgrade runs for reef release ###### on smithi diff --git a/qa/suites/rados/monthrash/workloads/rados_mon_osdmap_prune.yaml b/qa/suites/rados/monthrash/workloads/rados_mon_osdmap_prune.yaml index 372bf2561fa..8b3c4c11ac6 100644 --- a/qa/suites/rados/monthrash/workloads/rados_mon_osdmap_prune.yaml +++ b/qa/suites/rados/monthrash/workloads/rados_mon_osdmap_prune.yaml @@ -15,6 +15,7 @@ overrides: # causing tests to fail due to health warns, even if # the tests themselves are successful. - \(OSDMAP_FLAGS\) + - \(PG_DEGRADED\) tasks: - workunit: clients: diff --git a/qa/suites/rados/verify/validater/valgrind.yaml b/qa/suites/rados/verify/validater/valgrind.yaml index e2dc29b5f7e..17cf141b0cd 100644 --- a/qa/suites/rados/verify/validater/valgrind.yaml +++ b/qa/suites/rados/verify/validater/valgrind.yaml @@ -27,6 +27,7 @@ overrides: - \(SLOW_OPS\) - slow request - OSD bench result + - OSD_DOWN valgrind: mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes] osd: [--tool=memcheck] diff --git a/qa/suites/upgrade/quincy-x/parallel/0-start.yaml b/qa/suites/upgrade/quincy-x/parallel/0-start.yaml index 40fbcefe728..62fb6427f72 100644 --- a/qa/suites/upgrade/quincy-x/parallel/0-start.yaml +++ b/qa/suites/upgrade/quincy-x/parallel/0-start.yaml @@ -32,13 +32,22 @@ overrides: osd: osd shutdown pgref assert: true log-ignorelist: - - \(POOL_APP_NOT_ENABLED\) + - do not have an application enabled + - application not enabled + - or freeform for custom applications + - POOL_APP_NOT_ENABLED + - is down - OSD_DOWN - mons down - mon down - MON_DOWN - out of quorum + - PG_AVAILABILITY - PG_DEGRADED - Reduced data availability - Degraded data redundancy + - pg .* is stuck inactive + - pg .* is .*degraded + - FS_DEGRADED - OSDMAP_FLAGS + - OSD_UPGRADE_FINISHED diff --git a/qa/suites/upgrade/quincy-x/parallel/1-tasks.yaml b/qa/suites/upgrade/quincy-x/parallel/1-tasks.yaml index e27c7c0f092..f7167975aa9 100644 --- a/qa/suites/upgrade/quincy-x/parallel/1-tasks.yaml +++ b/qa/suites/upgrade/quincy-x/parallel/1-tasks.yaml @@ -1,11 +1,8 @@ overrides: ceph: log-ignorelist: - - mons down - - mon down - - MON_DOWN - - out of quorum - - PG_AVAILABILITY + - Telemetry requires re-opt-in + - telemetry module includes new collections tasks: - install: branch: quincy diff --git a/qa/suites/upgrade/quincy-x/stress-split/1-start.yaml b/qa/suites/upgrade/quincy-x/stress-split/1-start.yaml index 005514292ce..5641471629e 100644 --- a/qa/suites/upgrade/quincy-x/stress-split/1-start.yaml +++ b/qa/suites/upgrade/quincy-x/stress-split/1-start.yaml @@ -1,17 +1,25 @@ overrides: ceph: log-ignorelist: - - \(POOL_APP_NOT_ENABLED\) + - do not have an application enabled + - application not enabled + - or freeform for custom applications + - POOL_APP_NOT_ENABLED + - is down - OSD_DOWN - mons down - mon down - MON_DOWN - out of quorum + - PG_AVAILABILITY - PG_DEGRADED - Reduced data availability - Degraded data redundancy + - pg .* is stuck inactive + - pg .* is .*degraded + - FS_DEGRADED - OSDMAP_FLAGS - - PG_AVAILABILITY + - OSD_UPGRADE_FINISHED tasks: - install: branch: quincy diff --git a/qa/suites/upgrade/reef-x/parallel/0-start.yaml b/qa/suites/upgrade/reef-x/parallel/0-start.yaml index 146bd57960d..62fb6427f72 100644 --- a/qa/suites/upgrade/reef-x/parallel/0-start.yaml +++ b/qa/suites/upgrade/reef-x/parallel/0-start.yaml @@ -32,4 +32,22 @@ overrides: osd: osd shutdown pgref assert: true log-ignorelist: - - PG_DEGRADED + - do not have an application enabled + - application not enabled + - or freeform for custom applications + - POOL_APP_NOT_ENABLED + - is down + - OSD_DOWN + - mons down + - mon down + - MON_DOWN + - out of quorum + - PG_AVAILABILITY + - PG_DEGRADED + - Reduced data availability + - Degraded data redundancy + - pg .* is stuck inactive + - pg .* is .*degraded + - FS_DEGRADED + - OSDMAP_FLAGS + - OSD_UPGRADE_FINISHED diff --git a/qa/suites/upgrade/reef-x/parallel/1-tasks.yaml b/qa/suites/upgrade/reef-x/parallel/1-tasks.yaml index ce4e0cc228b..b5160c2dd00 100644 --- a/qa/suites/upgrade/reef-x/parallel/1-tasks.yaml +++ b/qa/suites/upgrade/reef-x/parallel/1-tasks.yaml @@ -1,12 +1,8 @@ overrides: ceph: log-ignorelist: - - mons down - - mon down - - MON_DOWN - - out of quorum - - PG_AVAILABILITY - - PG_DEGRADED + - Telemetry requires re-opt-in + - telemetry module includes new collections tasks: - install: branch: reef diff --git a/qa/suites/upgrade/reef-x/parallel/overrides/ignorelist_health.yaml b/qa/suites/upgrade/reef-x/parallel/overrides/ignorelist_health.yaml index 5e995da7d2c..fa93b2f2ece 100644 --- a/qa/suites/upgrade/reef-x/parallel/overrides/ignorelist_health.yaml +++ b/qa/suites/upgrade/reef-x/parallel/overrides/ignorelist_health.yaml @@ -1,20 +1,19 @@ overrides: ceph: log-ignorelist: - - \(MDS_ALL_DOWN\) - - \(MDS_UP_LESS_THAN_MAX\) - - \(OSD_SLOW_PING_TIME + - MDS_ALL_DOWN + - MDS_UP_LESS_THAN_MAX + - OSD_SLOW_PING_TIME - reached quota + - running out of quota - overall HEALTH_ - - \(CACHE_POOL_NO_HIT_SET\) - - \(POOL_FULL\) - - \(SMALLER_PGP_NUM\) - - \(SLOW_OPS\) - - \(CACHE_POOL_NEAR_FULL\) - - \(POOL_APP_NOT_ENABLED\) - - \(PG_AVAILABILITY\) - - \(OBJECT_MISPLACED\) + - CACHE_POOL_NO_HIT_SET + - pool\(s\) full + - POOL_FULL + - SMALLER_PGP_NUM + - SLOW_OPS + - CACHE_POOL_NEAR_FULL + - OBJECT_MISPLACED - slow request - - \(MON_DOWN\) - noscrub - nodeep-scrub diff --git a/qa/suites/upgrade/reef-x/stress-split/1-start.yaml b/qa/suites/upgrade/reef-x/stress-split/1-start.yaml index 992f9e1bc36..59ccfe2cd02 100644 --- a/qa/suites/upgrade/reef-x/stress-split/1-start.yaml +++ b/qa/suites/upgrade/reef-x/stress-split/1-start.yaml @@ -1,11 +1,25 @@ overrides: ceph: log-ignorelist: + - do not have an application enabled + - application not enabled + - or freeform for custom applications + - POOL_APP_NOT_ENABLED + - is down + - OSD_DOWN - mons down - mon down - MON_DOWN - out of quorum - PG_AVAILABILITY + - PG_DEGRADED + - Reduced data availability + - Degraded data redundancy + - pg .* is stuck inactive + - pg .* is .*degraded + - FS_DEGRADED + - OSDMAP_FLAGS + - OSD_UPGRADE_FINISHED tasks: - install: branch: reef diff --git a/qa/suites/upgrade/reef-x/stress-split/overrides/ignorelist_health.yaml b/qa/suites/upgrade/reef-x/stress-split/overrides/ignorelist_health.yaml index 5e995da7d2c..fa93b2f2ece 100644 --- a/qa/suites/upgrade/reef-x/stress-split/overrides/ignorelist_health.yaml +++ b/qa/suites/upgrade/reef-x/stress-split/overrides/ignorelist_health.yaml @@ -1,20 +1,19 @@ overrides: ceph: log-ignorelist: - - \(MDS_ALL_DOWN\) - - \(MDS_UP_LESS_THAN_MAX\) - - \(OSD_SLOW_PING_TIME + - MDS_ALL_DOWN + - MDS_UP_LESS_THAN_MAX + - OSD_SLOW_PING_TIME - reached quota + - running out of quota - overall HEALTH_ - - \(CACHE_POOL_NO_HIT_SET\) - - \(POOL_FULL\) - - \(SMALLER_PGP_NUM\) - - \(SLOW_OPS\) - - \(CACHE_POOL_NEAR_FULL\) - - \(POOL_APP_NOT_ENABLED\) - - \(PG_AVAILABILITY\) - - \(OBJECT_MISPLACED\) + - CACHE_POOL_NO_HIT_SET + - pool\(s\) full + - POOL_FULL + - SMALLER_PGP_NUM + - SLOW_OPS + - CACHE_POOL_NEAR_FULL + - OBJECT_MISPLACED - slow request - - \(MON_DOWN\) - noscrub - nodeep-scrub diff --git a/src/cephadm/cephadm.py b/src/cephadm/cephadm.py index d2ddf564116..a8616980e4d 100755 --- a/src/cephadm/cephadm.py +++ b/src/cephadm/cephadm.py @@ -111,6 +111,7 @@ from cephadmlib.file_utils import ( unlink_file, write_new, write_tmp, + update_meta_file, ) from cephadmlib.net_utils import ( build_addrv_params, @@ -3453,6 +3454,7 @@ def list_daemons( detail: bool = True, legacy_dir: Optional[str] = None, daemon_name: Optional[str] = None, + type_of_daemon: Optional[str] = None, ) -> List[Dict[str, str]]: host_version: Optional[str] = None ls = [] @@ -3489,6 +3491,8 @@ def list_daemons( if os.path.exists(data_dir): for i in os.listdir(data_dir): if i in ['mon', 'osd', 'mds', 'mgr', 'rgw']: + if type_of_daemon and type_of_daemon != i: + continue daemon_type = i for j in os.listdir(os.path.join(data_dir, i)): if '-' not in j: @@ -3525,6 +3529,8 @@ def list_daemons( if daemon_name and name != daemon_name: continue (daemon_type, daemon_id) = j.split('.', 1) + if type_of_daemon and type_of_daemon != daemon_type: + continue unit_name = get_unit_name(fsid, daemon_type, daemon_id) @@ -4705,6 +4711,34 @@ def command_list_images(ctx: CephadmContext) -> None: # print default images cp_obj.write(sys.stdout) + +def update_service_for_daemon(ctx: CephadmContext, + available_daemons: list, + update_daemons: list) -> None: + """ Update the unit.meta file of daemon with required service name for valid daemons""" + + data = {'service_name': ctx.service_name} + # check if all the daemon names are valid + if not set(update_daemons).issubset(set(available_daemons)): + raise Error(f'Error EINVAL: one or more daemons of {update_daemons} does not exist on this host') + for name in update_daemons: + path = os.path.join(ctx.data_dir, ctx.fsid, name, 'unit.meta') + update_meta_file(path, data) + print(f'Successfully updated daemon {name} with service {ctx.service_name}') + + +@infer_fsid +def command_update_osd_service(ctx: CephadmContext) -> int: + """update service for provided daemon""" + update_daemons = [f'osd.{osd_id}' for osd_id in ctx.osd_ids.split(',')] + daemons = list_daemons(ctx, detail=False, type_of_daemon='osd') + if not daemons: + raise Error(f'Daemon {ctx.osd_ids} does not exists on this host') + available_daemons = [d['name'] for d in daemons] + update_service_for_daemon(ctx, available_daemons, update_daemons) + return 0 + + ################################## @@ -5571,6 +5605,14 @@ def _get_parser(): parser_list_images = subparsers.add_parser( 'list-images', help='list all the default images') parser_list_images.set_defaults(func=command_list_images) + + parser_update_service = subparsers.add_parser( + 'update-osd-service', help='update service for provided daemon') + parser_update_service.set_defaults(func=command_update_osd_service) + parser_update_service.add_argument('--fsid', help='cluster FSID') + parser_update_service.add_argument('--osd-ids', required=True, help='Comma-separated OSD IDs') + parser_update_service.add_argument('--service-name', required=True, help='OSD service name') + return parser diff --git a/src/cephadm/cephadmlib/daemons/monitoring.py b/src/cephadm/cephadmlib/daemons/monitoring.py index 9a9402632b0..4ba00daaefb 100644 --- a/src/cephadm/cephadmlib/daemons/monitoring.py +++ b/src/cephadm/cephadmlib/daemons/monitoring.py @@ -16,7 +16,13 @@ from ..daemon_form import register as register_daemon_form from ..daemon_identity import DaemonIdentity from ..deployment_utils import to_deployment_container from ..exceptions import Error -from ..net_utils import get_fqdn, get_hostname, get_ip_addresses, wrap_ipv6 +from ..net_utils import ( + get_fqdn, + get_hostname, + get_ip_addresses, + wrap_ipv6, + EndPoint, +) @register_daemon_form @@ -89,11 +95,6 @@ class Monitoring(ContainerDaemonForm): 'image': DefaultImages.ALERTMANAGER.image_ref, 'cpus': '2', 'memory': '2GB', - 'args': [ - '--cluster.listen-address=:{}'.format( - port_map['alertmanager'][1] - ), - ], 'config-json-files': [ 'alertmanager.yml', ], @@ -248,11 +249,14 @@ class Monitoring(ContainerDaemonForm): ip = meta['ip'] if 'ports' in meta and meta['ports']: port = meta['ports'][0] - if daemon_type == 'prometheus': - config = fetch_configs(ctx) + config = fetch_configs(ctx) + if daemon_type in ['prometheus', 'alertmanager']: ip_to_bind_to = config.get('ip_to_bind_to', '') if ip_to_bind_to: ip = ip_to_bind_to + web_listen_addr = str(EndPoint(ip, port)) + r += [f'--web.listen-address={web_listen_addr}'] + if daemon_type == 'prometheus': retention_time = config.get('retention_time', '15d') retention_size = config.get( 'retention_size', '0' @@ -276,9 +280,11 @@ class Monitoring(ContainerDaemonForm): r += ['--web.route-prefix=/prometheus/'] else: r += [f'--web.external-url={scheme}://{host}:{port}'] - r += [f'--web.listen-address={ip}:{port}'] if daemon_type == 'alertmanager': - config = fetch_configs(ctx) + clus_listen_addr = str( + EndPoint(ip, self.port_map[daemon_type][1]) + ) + r += [f'--cluster.listen-address={clus_listen_addr}'] use_url_prefix = config.get('use_url_prefix', False) peers = config.get('peers', list()) # type: ignore for peer in peers: @@ -294,13 +300,11 @@ class Monitoring(ContainerDaemonForm): if daemon_type == 'promtail': r += ['--config.expand-env'] if daemon_type == 'prometheus': - config = fetch_configs(ctx) try: r += [f'--web.config.file={config["web_config"]}'] except KeyError: pass if daemon_type == 'node-exporter': - config = fetch_configs(ctx) try: r += [f'--web.config.file={config["web_config"]}'] except KeyError: diff --git a/src/cephadm/cephadmlib/file_utils.py b/src/cephadm/cephadmlib/file_utils.py index 27e70e31756..4dd88cc3671 100644 --- a/src/cephadm/cephadmlib/file_utils.py +++ b/src/cephadm/cephadmlib/file_utils.py @@ -5,6 +5,7 @@ import datetime import logging import os import tempfile +import json from contextlib import contextmanager from pathlib import Path @@ -157,3 +158,26 @@ def unlink_file( except Exception: if not ignore_errors: raise + + +def update_meta_file(file_path: str, update_key_val: dict) -> None: + """Update key in the file with provided value""" + try: + with open(file_path, 'r') as fh: + data = json.load(fh) + file_stat = os.stat(file_path) + except FileNotFoundError: + raise + except Exception: + logger.exception(f'Failed to update {file_path}') + raise + data.update( + {key: value for key, value in update_key_val.items() if key in data} + ) + + with write_new( + file_path, + owner=(file_stat.st_uid, file_stat.st_gid), + perms=(file_stat.st_mode & 0o777), + ) as fh: + fh.write(json.dumps(data, indent=4) + '\n') diff --git a/src/cephadm/cephadmlib/net_utils.py b/src/cephadm/cephadmlib/net_utils.py index 9a7f138b1c6..bfa61d933ef 100644 --- a/src/cephadm/cephadmlib/net_utils.py +++ b/src/cephadm/cephadmlib/net_utils.py @@ -24,12 +24,22 @@ class EndPoint: def __init__(self, ip: str, port: int) -> None: self.ip = ip self.port = port + self.is_ipv4 = True + try: + if ip and ipaddress.ip_network(ip).version == 6: + self.is_ipv4 = False + except Exception: + logger.exception('Failed to check ip address version') def __str__(self) -> str: - return f'{self.ip}:{self.port}' + if self.is_ipv4: + return f'{self.ip}:{self.port}' + return f'[{self.ip}]:{self.port}' def __repr__(self) -> str: - return f'{self.ip}:{self.port}' + if self.is_ipv4: + return f'{self.ip}:{self.port}' + return f'[{self.ip}]:{self.port}' def attempt_bind(ctx, s, address, port): diff --git a/src/cephadm/tests/test_deploy.py b/src/cephadm/tests/test_deploy.py index c5094db335f..1736639ed55 100644 --- a/src/cephadm/tests/test_deploy.py +++ b/src/cephadm/tests/test_deploy.py @@ -316,7 +316,7 @@ def test_deploy_a_monitoring_container(cephadm_fs, funkypatch): runfile_lines = f.read().splitlines() assert 'podman' in runfile_lines[-1] assert runfile_lines[-1].endswith( - 'quay.io/titans/prometheus:latest --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=0 --web.external-url=http://10.10.10.10:9095 --web.listen-address=1.2.3.4:9095' + 'quay.io/titans/prometheus:latest --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --web.listen-address=1.2.3.4:9095 --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=0 --web.external-url=http://10.10.10.10:9095' ) assert '--user 8765' in runfile_lines[-1] assert f'-v /var/lib/ceph/{fsid}/prometheus.fire/etc/prometheus:/etc/prometheus:Z' in runfile_lines[-1] diff --git a/src/common/bit_vector.hpp b/src/common/bit_vector.hpp index 961d9a0192e..c5fd491ed29 100644 --- a/src/common/bit_vector.hpp +++ b/src/common/bit_vector.hpp @@ -29,8 +29,8 @@ private: static const uint8_t MASK = static_cast<uint8_t>((1 << _bit_count) - 1); // must be power of 2 - BOOST_STATIC_ASSERT((_bit_count != 0) && !(_bit_count & (_bit_count - 1))); - BOOST_STATIC_ASSERT(_bit_count <= BITS_PER_BYTE); + static_assert((_bit_count != 0) && !(_bit_count & (_bit_count - 1))); + static_assert(_bit_count <= BITS_PER_BYTE); template <typename DataIterator> class ReferenceImpl { diff --git a/src/common/ceph_time.h b/src/common/ceph_time.h index 01feff4c063..0b05be5372e 100644 --- a/src/common/ceph_time.h +++ b/src/common/ceph_time.h @@ -342,6 +342,23 @@ public: } }; +// Please note time_guard is not thread safety -- multiple threads +// updating same diff_accumulator can corrupt it. +template <class ClockT = mono_clock> +class time_guard { + const typename ClockT::time_point start; + timespan& diff_accumulator; + +public: + time_guard(timespan& diff_accumulator) + : start(ClockT::now()), + diff_accumulator(diff_accumulator) { + } + ~time_guard() { + diff_accumulator += ClockT::now() - start; + } +}; + namespace time_detail { // So that our subtractions produce negative spans rather than // arithmetic underflow. diff --git a/src/crimson/os/alienstore/alien_store.cc b/src/crimson/os/alienstore/alien_store.cc index a9c69f4660e..db6decd84f9 100644 --- a/src/crimson/os/alienstore/alien_store.cc +++ b/src/crimson/os/alienstore/alien_store.cc @@ -435,8 +435,21 @@ auto AlienStore::omap_get_values(CollectionRef ch, return do_with_op_gate(omap_values_t{}, [=, this] (auto &values) { return tp->submit(ch->get_cid().hash_to_shard(tp->size()), [=, this, &values] { auto c = static_cast<AlienCollection*>(ch.get()); - return store->omap_get_values(c->collection, oid, start, - reinterpret_cast<map<string, bufferlist>*>(&values)); + return store->omap_iterate( + c->collection, oid, + ObjectStore::omap_iter_seek_t{ + .seek_position = start.value_or(std::string{}), + // FIXME: classical OSDs begins iteration from LOWER_BOUND + // (or UPPER_BOUND if filter_prefix > start). However, these + // bits are not implemented yet + .seek_type = ObjectStore::omap_iter_seek_t::UPPER_BOUND + }, + [&values] + (std::string_view key, std::string_view value) mutable { + values[std::string{key}].append(value); + // FIXME: there is limit on number of entries yet + return ObjectStore::omap_iter_ret_t::NEXT; + }); }).then([&values] (int r) -> read_errorator::future<std::tuple<bool, omap_values_t>> { if (r == -ENOENT) { diff --git a/src/crimson/os/seastore/CMakeLists.txt b/src/crimson/os/seastore/CMakeLists.txt index e5b8960c38c..3da5e65ceec 100644 --- a/src/crimson/os/seastore/CMakeLists.txt +++ b/src/crimson/os/seastore/CMakeLists.txt @@ -1,5 +1,6 @@ set(crimson_seastore_srcs cached_extent.cc + lba_mapping.cc seastore_types.cc segment_manager.cc segment_manager/ephemeral.cc @@ -19,7 +20,6 @@ set(crimson_seastore_srcs omap_manager.cc omap_manager/btree/btree_omap_manager.cc omap_manager/btree/omap_btree_node_impl.cc - btree/btree_range_pin.cc btree/fixed_kv_node.cc onode.cc onode_manager/staged-fltree/node.cc diff --git a/src/crimson/os/seastore/async_cleaner.h b/src/crimson/os/seastore/async_cleaner.h index 01ab44c4c7c..1cef771aeb8 100644 --- a/src/crimson/os/seastore/async_cleaner.h +++ b/src/crimson/os/seastore/async_cleaner.h @@ -17,6 +17,7 @@ #include "crimson/os/seastore/randomblock_manager_group.h" #include "crimson/os/seastore/transaction.h" #include "crimson/os/seastore/segment_seq_allocator.h" +#include "crimson/os/seastore/backref_mapping.h" namespace crimson::os::seastore { diff --git a/src/crimson/os/seastore/backref/btree_backref_manager.h b/src/crimson/os/seastore/backref/btree_backref_manager.h index 38084bb00e6..24897dd55da 100644 --- a/src/crimson/os/seastore/backref/btree_backref_manager.h +++ b/src/crimson/os/seastore/backref/btree_backref_manager.h @@ -9,44 +9,28 @@ namespace crimson::os::seastore::backref { -constexpr size_t BACKREF_BLOCK_SIZE = 4096; - -class BtreeBackrefMapping : public BtreeNodeMapping<paddr_t, laddr_t> { - extent_types_t type; +class BtreeBackrefMapping : public BackrefMapping { public: BtreeBackrefMapping(op_context_t<paddr_t> ctx) - : BtreeNodeMapping(ctx) {} + : BackrefMapping(ctx) {} BtreeBackrefMapping( op_context_t<paddr_t> ctx, CachedExtentRef parent, uint16_t pos, backref_map_val_t &val, backref_node_meta_t &&meta) - : BtreeNodeMapping( + : BackrefMapping( + val.type, ctx, parent, pos, val.laddr, val.len, - std::forward<backref_node_meta_t>(meta)), - type(val.type) - {} - extent_types_t get_type() const final { - return type; - } - - bool is_clone() const final { - return false; - } - -protected: - std::unique_ptr<BtreeNodeMapping<paddr_t, laddr_t>> _duplicate( - op_context_t<paddr_t> ctx) const final { - return std::unique_ptr<BtreeNodeMapping<paddr_t, laddr_t>>( - new BtreeBackrefMapping(ctx)); - } + std::forward<backref_node_meta_t>(meta)) {} }; +constexpr size_t BACKREF_BLOCK_SIZE = 4096; + using BackrefBtree = FixedKVBtree< paddr_t, backref_map_val_t, BackrefInternalNode, BackrefLeafNode, BtreeBackrefMapping, BACKREF_BLOCK_SIZE, false>; diff --git a/src/crimson/os/seastore/backref_manager.h b/src/crimson/os/seastore/backref_manager.h index 3feedb997b4..8c746b571b2 100644 --- a/src/crimson/os/seastore/backref_manager.h +++ b/src/crimson/os/seastore/backref_manager.h @@ -6,6 +6,7 @@ #include "crimson/os/seastore/cache.h" #include "crimson/os/seastore/cached_extent.h" #include "crimson/os/seastore/transaction.h" +#include "crimson/os/seastore/backref_mapping.h" namespace crimson::os::seastore { diff --git a/src/crimson/os/seastore/backref_mapping.h b/src/crimson/os/seastore/backref_mapping.h new file mode 100644 index 00000000000..d0a6a0ea6ff --- /dev/null +++ b/src/crimson/os/seastore/backref_mapping.h @@ -0,0 +1,27 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab + +#pragma once + +#include "crimson/os/seastore/btree/btree_range_pin.h" + +namespace crimson::os::seastore { + +class BackrefMapping : public BtreeNodeMapping<paddr_t, laddr_t> { + extent_types_t type; +public: + BackrefMapping(op_context_t<paddr_t> ctx) + : BtreeNodeMapping(ctx) {} + template <typename... T> + BackrefMapping(extent_types_t type, T&&... t) + : BtreeNodeMapping(std::forward<T>(t)...), + type(type) {} + extent_types_t get_type() const { + return type; + } +}; + +using BackrefMappingRef = std::unique_ptr<BackrefMapping>; +using backref_pin_list_t = std::list<BackrefMappingRef>; + +} // namespace crimson::os::seastore diff --git a/src/crimson/os/seastore/btree/btree_range_pin.cc b/src/crimson/os/seastore/btree/btree_range_pin.cc deleted file mode 100644 index f0d507a24c4..00000000000 --- a/src/crimson/os/seastore/btree/btree_range_pin.cc +++ /dev/null @@ -1,54 +0,0 @@ -// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- -// vim: ts=8 sw=2 smarttab - -#include "crimson/os/seastore/btree/btree_range_pin.h" -#include "crimson/os/seastore/btree/fixed_kv_node.h" - -namespace crimson::os::seastore { - -template <typename key_t, typename val_t> -get_child_ret_t<LogicalCachedExtent> -BtreeNodeMapping<key_t, val_t>::get_logical_extent( - Transaction &t) -{ - ceph_assert(is_parent_viewable()); - assert(pos != std::numeric_limits<uint16_t>::max()); - ceph_assert(t.get_trans_id() == ctx.trans.get_trans_id()); - auto &p = (FixedKVNode<key_t>&)*parent; - auto k = this->is_indirect() - ? this->get_intermediate_base() - : get_key(); - auto v = p.template get_child<LogicalCachedExtent>(ctx, pos, k); - if (!v.has_child()) { - this->child_pos = v.get_child_pos(); - } - return v; -} - -template <typename key_t, typename val_t> -bool BtreeNodeMapping<key_t, val_t>::is_stable() const -{ - assert(!this->parent_modified()); - assert(pos != std::numeric_limits<uint16_t>::max()); - auto &p = (FixedKVNode<key_t>&)*parent; - auto k = this->is_indirect() - ? this->get_intermediate_base() - : get_key(); - return p.is_child_stable(ctx, pos, k); -} - -template <typename key_t, typename val_t> -bool BtreeNodeMapping<key_t, val_t>::is_data_stable() const -{ - assert(!this->parent_modified()); - assert(pos != std::numeric_limits<uint16_t>::max()); - auto &p = (FixedKVNode<key_t>&)*parent; - auto k = this->is_indirect() - ? this->get_intermediate_base() - : get_key(); - return p.is_child_data_stable(ctx, pos, k); -} - -template class BtreeNodeMapping<laddr_t, paddr_t>; -template class BtreeNodeMapping<paddr_t, laddr_t>; -} // namespace crimson::os::seastore diff --git a/src/crimson/os/seastore/btree/btree_range_pin.h b/src/crimson/os/seastore/btree/btree_range_pin.h index 91751801e5d..bfd350a8bed 100644 --- a/src/crimson/os/seastore/btree/btree_range_pin.h +++ b/src/crimson/os/seastore/btree/btree_range_pin.h @@ -7,11 +7,12 @@ #include "crimson/common/log.h" -#include "crimson/os/seastore/cache.h" #include "crimson/os/seastore/cached_extent.h" #include "crimson/os/seastore/seastore_types.h" +#include "crimson/os/seastore/transaction.h" namespace crimson::os::seastore { +class Cache; template <typename node_key_t> struct op_context_t { @@ -116,8 +117,6 @@ protected: extent_len_t len = 0; fixed_kv_node_meta_t<key_t> range; uint16_t pos = std::numeric_limits<uint16_t>::max(); - - virtual std::unique_ptr<BtreeNodeMapping> _duplicate(op_context_t<key_t>) const = 0; fixed_kv_node_meta_t<key_t> _get_pin_range() const { return range; } @@ -139,11 +138,7 @@ public: len(len), range(meta), pos(pos) - { - if (!parent->is_pending()) { - this->child_pos = {parent, pos}; - } - } + {} CachedExtentRef get_parent() const final { return parent; @@ -162,11 +157,6 @@ public: return len; } - extent_types_t get_type() const override { - ceph_abort("should never happen"); - return extent_types_t::ROOT; - } - val_t get_val() const final { if constexpr (std::is_same_v<val_t, paddr_t>) { return value.get_paddr(); @@ -180,16 +170,6 @@ public: return range.begin; } - PhysicalNodeMappingRef<key_t, val_t> duplicate() const final { - auto ret = _duplicate(ctx); - ret->range = range; - ret->value = value; - ret->parent = parent; - ret->len = len; - ret->pos = pos; - return ret; - } - bool has_been_invalidated() const final { return parent->has_been_invalidated(); } @@ -215,9 +195,6 @@ public: return unviewable; } - get_child_ret_t<LogicalCachedExtent> get_logical_extent(Transaction&) final; - bool is_stable() const final; - bool is_data_stable() const final; bool is_parent_viewable() const final { ceph_assert(parent); if (!parent->is_valid()) { diff --git a/src/crimson/os/seastore/cached_extent.cc b/src/crimson/os/seastore/cached_extent.cc index ab2492f5bb6..49fede1d9a8 100644 --- a/src/crimson/os/seastore/cached_extent.cc +++ b/src/crimson/os/seastore/cached_extent.cc @@ -7,6 +7,7 @@ #include "crimson/common/log.h" #include "crimson/os/seastore/btree/fixed_kv_node.h" +#include "crimson/os/seastore/lba_mapping.h" namespace { [[maybe_unused]] seastar::logger& logger() { @@ -142,6 +143,12 @@ void LogicalCachedExtent::on_replace_prior() { parent->children[off] = this; } +void LogicalCachedExtent::maybe_set_intermediate_laddr(LBAMapping &mapping) { + laddr = mapping.is_indirect() + ? mapping.get_intermediate_base() + : mapping.get_key(); +} + parent_tracker_t::~parent_tracker_t() { // this is parent's tracker, reset it auto &p = (FixedKVNode<laddr_t>&)*parent; @@ -150,32 +157,6 @@ parent_tracker_t::~parent_tracker_t() { } } -std::ostream &operator<<(std::ostream &out, const LBAMapping &rhs) -{ - out << "LBAMapping(" << rhs.get_key() - << "~0x" << std::hex << rhs.get_length() << std::dec - << "->" << rhs.get_val(); - if (rhs.is_indirect()) { - out << ",indirect(" << rhs.get_intermediate_base() - << "~0x" << std::hex << rhs.get_intermediate_length() - << "@0x" << rhs.get_intermediate_offset() << std::dec - << ")"; - } - out << ")"; - return out; -} - -std::ostream &operator<<(std::ostream &out, const lba_pin_list_t &rhs) -{ - bool first = true; - out << '['; - for (const auto &i: rhs) { - out << (first ? "" : ",") << *i; - first = false; - } - return out << ']'; -} - bool BufferSpace::is_range_loaded(extent_len_t offset, extent_len_t length) const { assert(length > 0); diff --git a/src/crimson/os/seastore/cached_extent.h b/src/crimson/os/seastore/cached_extent.h index f9356f40b83..9dc60d719eb 100644 --- a/src/crimson/os/seastore/cached_extent.h +++ b/src/crimson/os/seastore/cached_extent.h @@ -1279,7 +1279,6 @@ private: }; class ChildableCachedExtent; -class LogicalCachedExtent; class child_pos_t { public: @@ -1337,48 +1336,18 @@ using PhysicalNodeMappingRef = std::unique_ptr<PhysicalNodeMapping<key_t, val_t> template <typename key_t, typename val_t> class PhysicalNodeMapping { public: + PhysicalNodeMapping() = default; + PhysicalNodeMapping(const PhysicalNodeMapping&) = delete; virtual extent_len_t get_length() const = 0; - virtual extent_types_t get_type() const = 0; virtual val_t get_val() const = 0; virtual key_t get_key() const = 0; - virtual PhysicalNodeMappingRef<key_t, val_t> duplicate() const = 0; - virtual PhysicalNodeMappingRef<key_t, val_t> refresh_with_pending_parent() { - ceph_abort("impossible"); - return {}; - } virtual bool has_been_invalidated() const = 0; virtual CachedExtentRef get_parent() const = 0; virtual uint16_t get_pos() const = 0; - // An lba pin may be indirect, see comments in lba_manager/btree/btree_lba_manager.h - virtual bool is_indirect() const { return false; } - virtual key_t get_intermediate_key() const { return min_max_t<key_t>::null; } - virtual key_t get_intermediate_base() const { return min_max_t<key_t>::null; } - virtual extent_len_t get_intermediate_length() const { return 0; } virtual uint32_t get_checksum() const { ceph_abort("impossible"); return 0; } - // The start offset of the pin, must be 0 if the pin is not indirect - virtual extent_len_t get_intermediate_offset() const { - return std::numeric_limits<extent_len_t>::max(); - } - - virtual get_child_ret_t<LogicalCachedExtent> - get_logical_extent(Transaction &t) = 0; - - void link_child(ChildableCachedExtent *c) { - ceph_assert(child_pos); - child_pos->link_child(c); - } - - // For reserved mappings, the return values are - // undefined although it won't crash - virtual bool is_stable() const = 0; - virtual bool is_data_stable() const = 0; - virtual bool is_clone() const = 0; - bool is_zero_reserved() const { - return !get_val().is_real(); - } virtual bool is_parent_viewable() const = 0; virtual bool is_parent_valid() const = 0; virtual bool parent_modified() const { @@ -1391,24 +1360,8 @@ public: } virtual ~PhysicalNodeMapping() {} -protected: - std::optional<child_pos_t> child_pos = std::nullopt; }; -using LBAMapping = PhysicalNodeMapping<laddr_t, paddr_t>; -using LBAMappingRef = PhysicalNodeMappingRef<laddr_t, paddr_t>; - -std::ostream &operator<<(std::ostream &out, const LBAMapping &rhs); - -using lba_pin_list_t = std::list<LBAMappingRef>; - -std::ostream &operator<<(std::ostream &out, const lba_pin_list_t &rhs); - -using BackrefMapping = PhysicalNodeMapping<paddr_t, laddr_t>; -using BackrefMappingRef = PhysicalNodeMappingRef<paddr_t, laddr_t>; - -using backref_pin_list_t = std::list<BackrefMappingRef>; - /** * RetiredExtentPlaceholder * @@ -1522,6 +1475,8 @@ private: return out; } }; + +class LBAMapping; /** * LogicalCachedExtent * @@ -1556,11 +1511,7 @@ public: laddr = nladdr; } - void maybe_set_intermediate_laddr(LBAMapping &mapping) { - laddr = mapping.is_indirect() - ? mapping.get_intermediate_base() - : mapping.get_key(); - } + void maybe_set_intermediate_laddr(LBAMapping &mapping); void apply_delta_and_adjust_crc( paddr_t base, const ceph::bufferlist &bl) final { @@ -1660,8 +1611,6 @@ using lextent_list_t = addr_extent_list_base_t< } #if FMT_VERSION >= 90000 -template <> struct fmt::formatter<crimson::os::seastore::lba_pin_list_t> : fmt::ostream_formatter {}; template <> struct fmt::formatter<crimson::os::seastore::CachedExtent> : fmt::ostream_formatter {}; template <> struct fmt::formatter<crimson::os::seastore::LogicalCachedExtent> : fmt::ostream_formatter {}; -template <> struct fmt::formatter<crimson::os::seastore::LBAMapping> : fmt::ostream_formatter {}; #endif diff --git a/src/crimson/os/seastore/lba_manager.h b/src/crimson/os/seastore/lba_manager.h index a050b2cdf47..9a34bf56157 100644 --- a/src/crimson/os/seastore/lba_manager.h +++ b/src/crimson/os/seastore/lba_manager.h @@ -19,6 +19,7 @@ #include "crimson/os/seastore/cache.h" #include "crimson/os/seastore/seastore_types.h" +#include "crimson/os/seastore/lba_mapping.h" namespace crimson::os::seastore { diff --git a/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.cc b/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.cc index 007737ff450..888d3c359ac 100644 --- a/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.cc +++ b/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.cc @@ -94,6 +94,45 @@ void unlink_phy_tree_root_node<laddr_t>(RootBlockRef &root_block) { namespace crimson::os::seastore::lba_manager::btree { +get_child_ret_t<LogicalCachedExtent> +BtreeLBAMapping::get_logical_extent(Transaction &t) +{ + ceph_assert(is_parent_viewable()); + assert(pos != std::numeric_limits<uint16_t>::max()); + ceph_assert(t.get_trans_id() == ctx.trans.get_trans_id()); + auto &p = static_cast<LBALeafNode&>(*parent); + auto k = this->is_indirect() + ? this->get_intermediate_base() + : get_key(); + auto v = p.template get_child<LogicalCachedExtent>(ctx, pos, k); + if (!v.has_child()) { + this->child_pos = v.get_child_pos(); + } + return v; +} + +bool BtreeLBAMapping::is_stable() const +{ + assert(!this->parent_modified()); + assert(pos != std::numeric_limits<uint16_t>::max()); + auto &p = static_cast<LBALeafNode&>(*parent); + auto k = this->is_indirect() + ? this->get_intermediate_base() + : get_key(); + return p.is_child_stable(ctx, pos, k); +} + +bool BtreeLBAMapping::is_data_stable() const +{ + assert(!this->parent_modified()); + assert(pos != std::numeric_limits<uint16_t>::max()); + auto &p = static_cast<LBALeafNode&>(*parent); + auto k = this->is_indirect() + ? this->get_intermediate_base() + : get_key(); + return p.is_child_data_stable(ctx, pos, k); +} + BtreeLBAManager::mkfs_ret BtreeLBAManager::mkfs( Transaction &t) diff --git a/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.h b/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.h index ef10ff9623b..e0902053d0e 100644 --- a/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.h +++ b/src/crimson/os/seastore/lba_manager/btree/btree_lba_manager.h @@ -23,11 +23,15 @@ #include "crimson/os/seastore/lba_manager/btree/lba_btree_node.h" #include "crimson/os/seastore/btree/btree_range_pin.h" +namespace crimson::os::seastore { +class LogicalCachedExtent; +} + namespace crimson::os::seastore::lba_manager::btree { struct LBALeafNode; -class BtreeLBAMapping : public BtreeNodeMapping<laddr_t, paddr_t> { +class BtreeLBAMapping : public LBAMapping { // To support cloning, there are two kinds of lba mappings: // 1. physical lba mapping: the pladdr in the value of which is the paddr of // the corresponding extent; @@ -61,14 +65,14 @@ class BtreeLBAMapping : public BtreeNodeMapping<laddr_t, paddr_t> { // their keys. public: BtreeLBAMapping(op_context_t<laddr_t> ctx) - : BtreeNodeMapping(ctx) {} + : LBAMapping(ctx) {} BtreeLBAMapping( op_context_t<laddr_t> c, LBALeafNodeRef parent, uint16_t pos, lba_map_val_t &val, lba_node_meta_t meta) - : BtreeNodeMapping( + : LBAMapping( c, parent, pos, @@ -190,8 +194,12 @@ public: SUBDEBUGT(seastore_lba, "new pin {}", ctx.trans, static_cast<LBAMapping&>(*new_pin)); return new_pin; } + bool is_stable() const final; + bool is_data_stable() const final; + get_child_ret_t<LogicalCachedExtent> get_logical_extent(Transaction &t); + protected: - std::unique_ptr<BtreeNodeMapping<laddr_t, paddr_t>> _duplicate( + LBAMappingRef _duplicate( op_context_t<laddr_t> ctx) const final { auto pin = std::unique_ptr<BtreeLBAMapping>(new BtreeLBAMapping(ctx)); pin->key = key; diff --git a/src/crimson/os/seastore/lba_mapping.cc b/src/crimson/os/seastore/lba_mapping.cc new file mode 100644 index 00000000000..90fae09ce21 --- /dev/null +++ b/src/crimson/os/seastore/lba_mapping.cc @@ -0,0 +1,44 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab + +#include "lba_mapping.h" + +namespace crimson::os::seastore { + +std::ostream &operator<<(std::ostream &out, const LBAMapping &rhs) +{ + out << "LBAMapping(" << rhs.get_key() + << "~0x" << std::hex << rhs.get_length() << std::dec + << "->" << rhs.get_val(); + if (rhs.is_indirect()) { + out << ",indirect(" << rhs.get_intermediate_base() + << "~0x" << std::hex << rhs.get_intermediate_length() + << "@0x" << rhs.get_intermediate_offset() << std::dec + << ")"; + } + out << ")"; + return out; +} + +std::ostream &operator<<(std::ostream &out, const lba_pin_list_t &rhs) +{ + bool first = true; + out << '['; + for (const auto &i: rhs) { + out << (first ? "" : ",") << *i; + first = false; + } + return out << ']'; +} + +LBAMappingRef LBAMapping::duplicate() const { + auto ret = _duplicate(ctx); + ret->range = range; + ret->value = value; + ret->parent = parent; + ret->len = len; + ret->pos = pos; + return ret; +} + +} // namespace crimson::os::seastore diff --git a/src/crimson/os/seastore/lba_mapping.h b/src/crimson/os/seastore/lba_mapping.h new file mode 100644 index 00000000000..338d4d53f55 --- /dev/null +++ b/src/crimson/os/seastore/lba_mapping.h @@ -0,0 +1,73 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab + +#pragma once + +#include "crimson/os/seastore/cached_extent.h" +#include "crimson/os/seastore/btree/btree_range_pin.h" + +namespace crimson::os::seastore { + +class LBAMapping; +using LBAMappingRef = std::unique_ptr<LBAMapping>; + +class LogicalCachedExtent; + +class LBAMapping : public BtreeNodeMapping<laddr_t, paddr_t> { +public: + LBAMapping(op_context_t<laddr_t> ctx) + : BtreeNodeMapping<laddr_t, paddr_t>(ctx) {} + template <typename... T> + LBAMapping(T&&... t) + : BtreeNodeMapping<laddr_t, paddr_t>(std::forward<T>(t)...) + { + if (!parent->is_pending()) { + this->child_pos = {parent, pos}; + } + } + + // An lba pin may be indirect, see comments in lba_manager/btree/btree_lba_manager.h + virtual bool is_indirect() const = 0; + virtual laddr_t get_intermediate_key() const = 0; + virtual laddr_t get_intermediate_base() const = 0; + virtual extent_len_t get_intermediate_length() const = 0; + // The start offset of the pin, must be 0 if the pin is not indirect + virtual extent_len_t get_intermediate_offset() const = 0; + + virtual get_child_ret_t<LogicalCachedExtent> + get_logical_extent(Transaction &t) = 0; + + void link_child(ChildableCachedExtent *c) { + ceph_assert(child_pos); + child_pos->link_child(c); + } + virtual LBAMappingRef refresh_with_pending_parent() = 0; + + // For reserved mappings, the return values are + // undefined although it won't crash + virtual bool is_stable() const = 0; + virtual bool is_data_stable() const = 0; + virtual bool is_clone() const = 0; + bool is_zero_reserved() const { + return !get_val().is_real(); + } + + LBAMappingRef duplicate() const; + + virtual ~LBAMapping() {} +protected: + virtual LBAMappingRef _duplicate(op_context_t<laddr_t>) const = 0; + std::optional<child_pos_t> child_pos = std::nullopt; +}; + +std::ostream &operator<<(std::ostream &out, const LBAMapping &rhs); +using lba_pin_list_t = std::list<LBAMappingRef>; + +std::ostream &operator<<(std::ostream &out, const lba_pin_list_t &rhs); + +} // namespace crimson::os::seastore + +#if FMT_VERSION >= 90000 +template <> struct fmt::formatter<crimson::os::seastore::LBAMapping> : fmt::ostream_formatter {}; +template <> struct fmt::formatter<crimson::os::seastore::lba_pin_list_t> : fmt::ostream_formatter {}; +#endif diff --git a/src/crimson/osd/backfill_state.cc b/src/crimson/osd/backfill_state.cc index 0ea9e6372f0..f957f072c93 100644 --- a/src/crimson/osd/backfill_state.cc +++ b/src/crimson/osd/backfill_state.cc @@ -381,16 +381,25 @@ BackfillState::Enqueuing::Enqueuing(my_context ctx) } } while (!all_emptied(primary_bi, backfill_state().peer_backfill_info)); - if (backfill_state().progress_tracker->tracked_objects_completed() - && Enqueuing::all_enqueued(peering_state(), - backfill_state().backfill_info, - backfill_state().peer_backfill_info)) { - backfill_state().last_backfill_started = hobject_t::get_max(); - backfill_listener().update_peers_last_backfill(hobject_t::get_max()); + if (should_rescan_primary(backfill_state().peer_backfill_info, + primary_bi)) { + // need to grab one another chunk of the object namespace and restart + // the queueing. + DEBUGDPP("reached end for current local chunk", pg()); + post_event(RequestPrimaryScanning{}); + return; + } else { + if (backfill_state().progress_tracker->tracked_objects_completed() + && Enqueuing::all_enqueued(peering_state(), + backfill_state().backfill_info, + backfill_state().peer_backfill_info)) { + backfill_state().last_backfill_started = hobject_t::get_max(); + backfill_listener().update_peers_last_backfill(hobject_t::get_max()); + } + DEBUGDPP("reached end for both local and all peers " + "but still has in-flight operations", pg()); + post_event(RequestWaiting{}); } - DEBUGDPP("reached end for both local and all peers " - "but still has in-flight operations", pg()); - post_event(RequestWaiting{}); } // -- PrimaryScanning @@ -677,6 +686,17 @@ void BackfillState::enqueue_standalone_push( backfill_machine.backfill_listener.enqueue_push(obj, v, peers); } +void BackfillState::enqueue_standalone_delete( + const hobject_t &obj, + const eversion_t &v, + const std::vector<pg_shard_t> &peers) +{ + progress_tracker->enqueue_drop(obj); + for (auto bt : peers) { + backfill_machine.backfill_listener.enqueue_drop(bt, obj, v); + } +} + std::ostream &operator<<(std::ostream &out, const BackfillState::PGFacade &pg) { return pg.print(out); } diff --git a/src/crimson/osd/backfill_state.h b/src/crimson/osd/backfill_state.h index 0217886832d..517a02ea4df 100644 --- a/src/crimson/osd/backfill_state.h +++ b/src/crimson/osd/backfill_state.h @@ -292,6 +292,11 @@ public: const hobject_t &obj, const eversion_t &v, const std::vector<pg_shard_t> &peers); + void enqueue_standalone_delete( + const hobject_t &obj, + const eversion_t &v, + const std::vector<pg_shard_t> &peers); + bool is_triggered() const { return backfill_machine.triggering_event() != nullptr; diff --git a/src/crimson/osd/pg.cc b/src/crimson/osd/pg.cc index bf521498abf..2746e730f2b 100644 --- a/src/crimson/osd/pg.cc +++ b/src/crimson/osd/pg.cc @@ -879,6 +879,17 @@ void PG::enqueue_push_for_backfill( backfill_state->enqueue_standalone_push(obj, v, peers); } +void PG::enqueue_delete_for_backfill( + const hobject_t &obj, + const eversion_t &v, + const std::vector<pg_shard_t> &peers) +{ + assert(recovery_handler); + assert(recovery_handler->backfill_state); + auto backfill_state = recovery_handler->backfill_state.get(); + backfill_state->enqueue_standalone_delete(obj, v, peers); +} + PG::interruptible_future< std::tuple<PG::interruptible_future<>, PG::interruptible_future<>>> diff --git a/src/crimson/osd/pg.h b/src/crimson/osd/pg.h index 6db73ee835b..06038c0aa00 100644 --- a/src/crimson/osd/pg.h +++ b/src/crimson/osd/pg.h @@ -904,6 +904,11 @@ private: const hobject_t &obj, const eversion_t &v, const std::vector<pg_shard_t> &peers); + void enqueue_delete_for_backfill( + const hobject_t &obj, + const eversion_t &v, + const std::vector<pg_shard_t> &peers); + bool can_discard_replica_op(const Message& m, epoch_t m_map_epoch) const; bool can_discard_op(const MOSDOp& m) const; void context_registry_on_change(); diff --git a/src/crimson/osd/replicated_backend.cc b/src/crimson/osd/replicated_backend.cc index f09cd147ea9..6c8abecffaf 100644 --- a/src/crimson/osd/replicated_backend.cc +++ b/src/crimson/osd/replicated_backend.cc @@ -96,11 +96,18 @@ ReplicatedBackend::submit_transaction( bufferlist encoded_txn; encode(txn, encoded_txn); + bool is_delete = false; for (auto &le : log_entries) { le.mark_unrollbackable(); + if (le.is_delete()) { + is_delete = true; + } } + co_await pg.update_snap_map(log_entries, txn); + std::vector<pg_shard_t> to_push_clone; + std::vector<pg_shard_t> to_push_delete; auto sends = std::make_unique<std::vector<seastar::future<>>>(); for (auto &pg_shard : pg_shards) { if (pg_shard == whoami) { @@ -115,12 +122,17 @@ ReplicatedBackend::submit_transaction( m = new_repop_msg( pg_shard, hoid, encoded_txn, osd_op_p, min_epoch, map_epoch, log_entries, false, tid); - if (_new_clone && pg.is_missing_on_peer(pg_shard, hoid)) { - // The head is in the push queue but hasn't been pushed yet. - // We need to ensure that the newly created clone will be - // pushed as well, otherwise we might skip it. - // See: https://tracker.ceph.com/issues/68808 - to_push_clone.push_back(pg_shard); + if (pg.is_missing_on_peer(pg_shard, hoid)) { + if (_new_clone) { + // The head is in the push queue but hasn't been pushed yet. + // We need to ensure that the newly created clone will be + // pushed as well, otherwise we might skip it. + // See: https://tracker.ceph.com/issues/68808 + to_push_clone.push_back(pg_shard); + } + if (is_delete) { + to_push_delete.push_back(pg_shard); + } } } pending_txn->second.acked_peers.push_back({pg_shard, eversion_t{}}); @@ -130,8 +142,6 @@ ReplicatedBackend::submit_transaction( pg_shard.osd, std::move(m), map_epoch)); } - co_await pg.update_snap_map(log_entries, txn); - pg.log_operation( std::move(log_entries), osd_op_p.pg_trim_to, @@ -157,7 +167,8 @@ ReplicatedBackend::submit_transaction( return seastar::now(); } return peers->all_committed.get_shared_future(); - }).then_interruptible([pending_txn, this, _new_clone, + }).then_interruptible([pending_txn, this, _new_clone, &hoid, + to_push_delete=std::move(to_push_delete), to_push_clone=std::move(to_push_clone)] { auto acked_peers = std::move(pending_txn->second.acked_peers); pending_trans.erase(pending_txn); @@ -167,6 +178,9 @@ ReplicatedBackend::submit_transaction( _new_clone->obs.oi.version, to_push_clone); } + if (!to_push_delete.empty()) { + pg.enqueue_delete_for_backfill(hoid, {}, to_push_delete); + } return seastar::make_ready_future< crimson::osd::acked_peers_t>(std::move(acked_peers)); }); diff --git a/src/crimson/osd/replicated_recovery_backend.cc b/src/crimson/osd/replicated_recovery_backend.cc index 30e7a8a333d..0d6c9d38236 100644 --- a/src/crimson/osd/replicated_recovery_backend.cc +++ b/src/crimson/osd/replicated_recovery_backend.cc @@ -35,6 +35,15 @@ ReplicatedRecoveryBackend::recover_object( logger().debug("recover_object: loading obc: {}", soid); return pg.obc_loader.with_obc<RWState::RWREAD>(soid, [this, soid, need](auto head, auto obc) { + if (!obc->obs.exists) { + // XXX: this recovery must be triggered by backfills and the corresponding + // object must have been deleted by some client request after the object + // is enqueued for push but before the lock is acquired by the recovery. + // + // Abort the recovery in this case, a "recover_delete" must have been + // added for this object by the client request that deleted it. + return interruptor::now(); + } logger().debug("recover_object: loaded obc: {}", obc->obs.oi.soid); auto& recovery_waiter = get_recovering(soid); recovery_waiter.obc = obc; diff --git a/src/exporter/ceph_exporter.cc b/src/exporter/ceph_exporter.cc index 44b67c7e615..2232851c094 100644 --- a/src/exporter/ceph_exporter.cc +++ b/src/exporter/ceph_exporter.cc @@ -30,13 +30,13 @@ static void handle_signal(int signum) static void usage() { std::cout << "usage: ceph-exporter [options]\n" << "options:\n" - " --sock-dir: The path to ceph daemons socket files dir\n" - " --addrs: Host ip address where exporter is deployed\n" - " --port: Port to deploy exporter on. Default is 9926\n" - " --cert-file: Path to the certificate file to use https\n" - " --key-file: Path to the certificate key file to use https\n" + " --sock-dir: The path to Ceph daemon sockets (*.asok)\n" + " --addrs: Host IP address on which the exporter is to listen\n" + " --port: TCP Port on which the exporter is to listen. Default is 9926\n" + " --cert-file: Path to the certificate file when using HTTPS\n" + " --key-file: Path to the certificate key file when using HTTPS\n" " --prio-limit: Only perf counters greater than or equal to prio-limit are fetched. Default: 5\n" - " --stats-period: Time to wait before sending requests again to exporter server (seconds). Default: 5s" + " --stats-period: Interval between daemon scrapes (seconds). Default: 5s" << std::endl; generic_server_usage(); } diff --git a/src/include/random.h b/src/include/random.h index f2e3e37bcd7..6b7c9405efd 100644 --- a/src/include/random.h +++ b/src/include/random.h @@ -16,9 +16,9 @@ #define CEPH_RANDOM_H 1 #include <mutex> +#include <optional> #include <random> #include <type_traits> -#include <boost/optional.hpp> // Workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85494 #ifdef __MINGW32__ @@ -123,7 +123,7 @@ void randomize_rng() template <typename EngineT> EngineT& engine() { - thread_local boost::optional<EngineT> rng_engine; + thread_local std::optional<EngineT> rng_engine; if (!rng_engine) { rng_engine.emplace(EngineT()); diff --git a/src/kv/KeyValueDB.h b/src/kv/KeyValueDB.h index 858742d511e..d926840180e 100644 --- a/src/kv/KeyValueDB.h +++ b/src/kv/KeyValueDB.h @@ -9,6 +9,7 @@ #include <map> #include <optional> #include <string> +#include <string_view> #include <boost/scoped_ptr.hpp> #include "include/encoding.h" #include "common/Formatter.h" @@ -211,6 +212,10 @@ public: return ""; } virtual ceph::buffer::list value() = 0; + // When valid() returns true, value returned as string-view + // is guaranteed to be valid until iterator is moved to another + // position; that is until call to next() / seek_to_first() / etc. + virtual std::string_view value_as_sv() = 0; virtual int status() = 0; virtual ~SimplestIteratorImpl() {} }; @@ -220,7 +225,12 @@ public: virtual ~IteratorImpl() {} virtual int seek_to_last() = 0; virtual int prev() = 0; + // When valid() returns true, key returned as string-view + // is guaranteed to be valid until iterator is moved to another + // position; that is until call to next() / seek_to_first() / etc. + virtual std::string_view key_as_sv() = 0; virtual std::pair<std::string, std::string> raw_key() = 0; + virtual std::pair<std::string_view, std::string_view> raw_key_as_sv() = 0; virtual ceph::buffer::ptr value_as_ptr() { ceph::buffer::list bl = value(); if (bl.length() == 1) { @@ -247,7 +257,9 @@ public: virtual int next() = 0; virtual int prev() = 0; virtual std::string key() = 0; + virtual std::string_view key_as_sv() = 0; virtual std::pair<std::string,std::string> raw_key() = 0; + virtual std::pair<std::string_view, std::string_view> raw_key_as_sv() = 0; virtual bool raw_key_is_prefixed(const std::string &prefix) = 0; virtual ceph::buffer::list value() = 0; virtual ceph::buffer::ptr value_as_ptr() { @@ -258,6 +270,7 @@ public: return ceph::buffer::ptr(); } } + virtual std::string_view value_as_sv() = 0; virtual int status() = 0; virtual size_t key_size() { return 0; @@ -315,15 +328,24 @@ private: std::string key() override { return generic_iter->key(); } + std::string_view key_as_sv() override { + return generic_iter->key_as_sv(); + } std::pair<std::string, std::string> raw_key() override { return generic_iter->raw_key(); } + std::pair<std::string_view, std::string_view> raw_key_as_sv() override { + return generic_iter->raw_key_as_sv(); + } ceph::buffer::list value() override { return generic_iter->value(); } ceph::buffer::ptr value_as_ptr() override { return generic_iter->value_as_ptr(); } + std::string_view value_as_sv() override { + return generic_iter->value_as_sv(); + } int status() override { return generic_iter->status(); } diff --git a/src/kv/RocksDBStore.cc b/src/kv/RocksDBStore.cc index ca63ea06484..51d224b67c0 100644 --- a/src/kv/RocksDBStore.cc +++ b/src/kv/RocksDBStore.cc @@ -6,6 +6,7 @@ #include <memory> #include <set> #include <string> +#include <string_view> #include <errno.h> #include <unistd.h> #include <sys/types.h> @@ -47,6 +48,7 @@ using std::ostream; using std::pair; using std::set; using std::string; +using std::string_view; using std::unique_ptr; using std::vector; @@ -1992,7 +1994,7 @@ int RocksDBStore::split_key(rocksdb::Slice in, string *prefix, string *key) // Find separator inside Slice char* separator = (char*) memchr(in.data(), 0, in.size()); - if (separator == NULL) + if (separator == nullptr) return -EINVAL; prefix_len = size_t(separator - in.data()); if (prefix_len >= in.size()) @@ -2006,6 +2008,27 @@ int RocksDBStore::split_key(rocksdb::Slice in, string *prefix, string *key) return 0; } +// TODO: deduplicate the code, preferrably by removing the string variant +int RocksDBStore::split_key(rocksdb::Slice in, string_view *prefix, string_view *key) +{ + size_t prefix_len = 0; + + // Find separator inside Slice + char* separator = (char*) memchr(in.data(), 0, in.size()); + if (separator == nullptr) + return -EINVAL; + prefix_len = size_t(separator - in.data()); + if (prefix_len >= in.size()) + return -EINVAL; + + // Fetch prefix and/or key directly from Slice + if (prefix) + *prefix = string_view(in.data(), prefix_len); + if (key) + *key = string_view(separator + 1, in.size() - prefix_len - 1); + return 0; +} + void RocksDBStore::compact() { dout(2) << __func__ << " starting" << dendl; @@ -2226,7 +2249,13 @@ int RocksDBStore::RocksDBWholeSpaceIteratorImpl::prev() string RocksDBStore::RocksDBWholeSpaceIteratorImpl::key() { string out_key; - split_key(dbiter->key(), 0, &out_key); + split_key(dbiter->key(), nullptr, &out_key); + return out_key; +} +string_view RocksDBStore::RocksDBWholeSpaceIteratorImpl::key_as_sv() +{ + string_view out_key; + split_key(dbiter->key(), nullptr, &out_key); return out_key; } pair<string,string> RocksDBStore::RocksDBWholeSpaceIteratorImpl::raw_key() @@ -2235,6 +2264,12 @@ pair<string,string> RocksDBStore::RocksDBWholeSpaceIteratorImpl::raw_key() split_key(dbiter->key(), &prefix, &key); return make_pair(prefix, key); } +pair<string_view,string_view> RocksDBStore::RocksDBWholeSpaceIteratorImpl::raw_key_as_sv() +{ + string_view prefix, key; + split_key(dbiter->key(), &prefix, &key); + return make_pair(prefix, key); +} bool RocksDBStore::RocksDBWholeSpaceIteratorImpl::raw_key_is_prefixed(const string &prefix) { // Look for "prefix\0" right in rocksb::Slice @@ -2267,6 +2302,12 @@ bufferptr RocksDBStore::RocksDBWholeSpaceIteratorImpl::value_as_ptr() return bufferptr(val.data(), val.size()); } +std::string_view RocksDBStore::RocksDBWholeSpaceIteratorImpl::value_as_sv() +{ + rocksdb::Slice val = dbiter->value(); + return std::string_view{val.data(), val.size()}; +} + int RocksDBStore::RocksDBWholeSpaceIteratorImpl::status() { return dbiter->status().ok() ? 0 : -1; @@ -2348,9 +2389,15 @@ public: string key() override { return dbiter->key().ToString(); } + string_view key_as_sv() override { + return dbiter->key().ToStringView(); + } std::pair<std::string, std::string> raw_key() override { return make_pair(prefix, key()); } + std::pair<std::string_view, std::string_view> raw_key_as_sv() override { + return make_pair(prefix, dbiter->key().ToStringView()); + } bufferlist value() override { return to_bufferlist(dbiter->value()); } @@ -2358,6 +2405,10 @@ public: rocksdb::Slice val = dbiter->value(); return bufferptr(val.data(), val.size()); } + std::string_view value_as_sv() override { + rocksdb::Slice val = dbiter->value(); + return std::string_view{val.data(), val.size()}; + } int status() override { return dbiter->status().ok() ? 0 : -1; } @@ -2668,6 +2719,15 @@ public: } } + std::string_view key_as_sv() override + { + if (smaller == on_main) { + return main->key_as_sv(); + } else { + return current_shard->second->key_as_sv(); + } + } + std::pair<std::string,std::string> raw_key() override { if (smaller == on_main) { @@ -2677,6 +2737,15 @@ public: } } + std::pair<std::string_view,std::string_view> raw_key_as_sv() override + { + if (smaller == on_main) { + return main->raw_key_as_sv(); + } else { + return { current_shard->first, current_shard->second->key_as_sv() }; + } + } + bool raw_key_is_prefixed(const std::string &prefix) override { if (smaller == on_main) { @@ -2695,6 +2764,15 @@ public: } } + std::string_view value_as_sv() override + { + if (smaller == on_main) { + return main->value_as_sv(); + } else { + return current_shard->second->value_as_sv(); + } + } + int status() override { //because we already had to inspect key, it must be ok @@ -3017,9 +3095,15 @@ public: string key() override { return iters[0]->key().ToString(); } + string_view key_as_sv() override { + return iters[0]->key().ToStringView(); + } std::pair<std::string, std::string> raw_key() override { return make_pair(prefix, key()); } + std::pair<std::string_view, std::string_view> raw_key_as_sv() override { + return make_pair(prefix, iters[0]->key().ToStringView()); + } bufferlist value() override { return to_bufferlist(iters[0]->value()); } @@ -3027,6 +3111,10 @@ public: rocksdb::Slice val = iters[0]->value(); return bufferptr(val.data(), val.size()); } + std::string_view value_as_sv() override { + rocksdb::Slice val = iters[0]->value(); + return std::string_view{val.data(), val.size()}; + } int status() override { return iters[0]->status().ok() ? 0 : -1; } diff --git a/src/kv/RocksDBStore.h b/src/kv/RocksDBStore.h index 477b209854c..50b91be2bf6 100644 --- a/src/kv/RocksDBStore.h +++ b/src/kv/RocksDBStore.h @@ -386,10 +386,13 @@ public: int next() override; int prev() override; std::string key() override; + std::string_view key_as_sv() override; std::pair<std::string,std::string> raw_key() override; + std::pair<std::string_view,std::string_view> raw_key_as_sv() override; bool raw_key_is_prefixed(const std::string &prefix) override; ceph::bufferlist value() override; ceph::bufferptr value_as_ptr() override; + std::string_view value_as_sv() override; int status() override; size_t key_size() override; size_t value_size() override; @@ -419,6 +422,7 @@ public: } static int split_key(rocksdb::Slice in, std::string *prefix, std::string *key); + static int split_key(rocksdb::Slice in, std::string_view *prefix, std::string_view *key); static std::string past_prefix(const std::string &prefix); diff --git a/src/librbd/migration/HttpClient.cc b/src/librbd/migration/HttpClient.cc index 09fe91da02a..d212981a917 100644 --- a/src/librbd/migration/HttpClient.cc +++ b/src/librbd/migration/HttpClient.cc @@ -193,7 +193,7 @@ protected: ldout(cct, 15) << dendl; boost::system::error_code ec; - boost::beast::get_lowest_layer(derived().stream()).socket().close(ec); + derived().stream().lowest_layer().close(ec); } private: @@ -357,8 +357,7 @@ private: } int shutdown_socket() { - if (!boost::beast::get_lowest_layer( - derived().stream()).socket().is_open()) { + if (!derived().stream().lowest_layer().is_open()) { return 0; } @@ -366,7 +365,7 @@ private: ldout(cct, 15) << dendl; boost::system::error_code ec; - boost::beast::get_lowest_layer(derived().stream()).socket().shutdown( + derived().stream().lowest_layer().shutdown( boost::asio::ip::tcp::socket::shutdown_both, ec); if (ec && ec != boost::beast::errc::not_connected) { @@ -595,7 +594,7 @@ public: this->close_socket(); } - inline boost::beast::tcp_stream& + inline boost::asio::ip::tcp::socket& stream() { return m_stream; } @@ -607,12 +606,13 @@ protected: auto cct = http_client->m_cct; ldout(cct, 15) << dendl; - ceph_assert(!m_stream.socket().is_open()); - m_stream.async_connect( - results, - [on_finish](boost::system::error_code ec, const auto& endpoint) { - on_finish->complete(-ec.value()); - }); + ceph_assert(!m_stream.is_open()); + boost::asio::async_connect(m_stream, + results, + [on_finish](boost::system::error_code ec, + const auto& endpoint) { + on_finish->complete(-ec.value()); + }); } void disconnect(Context* on_finish) override { @@ -624,7 +624,7 @@ protected: } private: - boost::beast::tcp_stream m_stream; + boost::asio::ip::tcp::socket m_stream; }; #undef dout_prefix @@ -643,7 +643,7 @@ public: this->close_socket(); } - inline boost::beast::ssl_stream<boost::beast::tcp_stream>& + inline boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& stream() { return m_stream; } @@ -655,8 +655,9 @@ protected: auto cct = http_client->m_cct; ldout(cct, 15) << dendl; - ceph_assert(!boost::beast::get_lowest_layer(m_stream).socket().is_open()); - boost::beast::get_lowest_layer(m_stream).async_connect( + ceph_assert(!m_stream.lowest_layer().is_open()); + async_connect( + m_stream.lowest_layer(), results, [this, on_finish](boost::system::error_code ec, const auto& endpoint) { handle_connect(-ec.value(), on_finish); @@ -681,12 +682,12 @@ protected: // ssl_stream object can't be reused after shut down -- move-in // a freshly constructed instance - m_stream = boost::beast::ssl_stream<boost::beast::tcp_stream>( + m_stream = boost::asio::ssl::stream<boost::asio::ip::tcp::socket>( http_client->m_strand, http_client->m_ssl_context); } private: - boost::beast::ssl_stream<boost::beast::tcp_stream> m_stream; + boost::asio::ssl::stream<boost::asio::ip::tcp::socket> m_stream; void handle_connect(int r, Context* on_finish) { auto http_client = this->m_http_client; diff --git a/src/librbd/migration/HttpClient.h b/src/librbd/migration/HttpClient.h index 3997e6159e7..5844f918693 100644 --- a/src/librbd/migration/HttpClient.h +++ b/src/librbd/migration/HttpClient.h @@ -13,13 +13,12 @@ #include <boost/asio/strand.hpp> #include <boost/asio/ip/tcp.hpp> #include <boost/asio/ssl/context.hpp> +#include <boost/asio/ssl/stream.hpp> #include <boost/beast/version.hpp> -#include <boost/beast/core/tcp_stream.hpp> #include <boost/beast/http/empty_body.hpp> #include <boost/beast/http/message.hpp> #include <boost/beast/http/string_body.hpp> #include <boost/beast/http/write.hpp> -#include <boost/beast/ssl/ssl_stream.hpp> #include <functional> #include <memory> #include <string> @@ -97,7 +96,7 @@ public: completion(r, std::move(response)); } - void operator()(boost::beast::tcp_stream& stream) override { + void operator()(boost::asio::ip::tcp::socket& stream) override { preprocess_request(); boost::beast::http::async_write( @@ -110,7 +109,7 @@ public: } void operator()( - boost::beast::ssl_stream<boost::beast::tcp_stream>& stream) override { + boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& stream) override { preprocess_request(); boost::beast::http::async_write( @@ -152,9 +151,9 @@ private: virtual bool need_eof() const = 0; virtual bool header_only() const = 0; virtual void complete(int r, Response&&) = 0; - virtual void operator()(boost::beast::tcp_stream& stream) = 0; + virtual void operator()(boost::asio::ip::tcp::socket& stream) = 0; virtual void operator()( - boost::beast::ssl_stream<boost::beast::tcp_stream>& stream) = 0; + boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& stream) = 0; }; template <typename D> struct HttpSession; diff --git a/src/mgr/PyModule.cc b/src/mgr/PyModule.cc index cff63ef4a6b..4f996489ba0 100644 --- a/src/mgr/PyModule.cc +++ b/src/mgr/PyModule.cc @@ -38,6 +38,18 @@ std::string PyModule::mgr_store_prefix = "mgr/"; #define BOOST_BIND_GLOBAL_PLACEHOLDERS // Boost apparently can't be bothered to fix its own usage of its own // deprecated features. + +// Fix instances of "'BOOST_PP_ITERATION_02' was not declared in this scope; did +// you mean 'BOOST_PP_ITERATION_05'" and related macro error bullshit that spans +// 300 lines of errors +// +// Apparently you can't include boost/python stuff _and_ have this header +// defined +// +// Thanks to the ceph-aur folks for the fix at: +// https://github.com/bazaah/aur-ceph/commit/8c5cc7d8deec002f7596b6d0860859a0a718f12b +#undef BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS + #include <boost/python/extract.hpp> #include <boost/python/import.hpp> #include <boost/python/object.hpp> diff --git a/src/mgr/PyModule.h b/src/mgr/PyModule.h index 177447c2cb3..a47db3a47ef 100644 --- a/src/mgr/PyModule.h +++ b/src/mgr/PyModule.h @@ -161,9 +161,9 @@ public: } const std::string &get_name() const { - std::lock_guard l(lock) ; return module_name; + return module_name; } - const std::string &get_error_string() const { + std::string get_error_string() const { std::lock_guard l(lock) ; return error_string; } bool get_can_run() const { diff --git a/src/os/DBObjectMap.cc b/src/os/DBObjectMap.cc index 7da9a67be62..65627b5f818 100644 --- a/src/os/DBObjectMap.cc +++ b/src/os/DBObjectMap.cc @@ -519,6 +519,11 @@ bufferlist DBObjectMap::DBObjectMapIteratorImpl::value() return cur_iter->value(); } +std::string_view DBObjectMap::DBObjectMapIteratorImpl::value_as_sv() +{ + return cur_iter->value_as_sv(); +} + int DBObjectMap::DBObjectMapIteratorImpl::status() { return r; diff --git a/src/os/DBObjectMap.h b/src/os/DBObjectMap.h index 444f21eb815..1e1452010e7 100644 --- a/src/os/DBObjectMap.h +++ b/src/os/DBObjectMap.h @@ -393,6 +393,7 @@ private: int next() override { ceph_abort(); return 0; } std::string key() override { ceph_abort(); return ""; } ceph::buffer::list value() override { ceph_abort(); return ceph::buffer::list(); } + std::string_view value_as_sv() override { ceph_abort(); return std::string_view(); } int status() override { return 0; } }; @@ -431,6 +432,7 @@ private: int next() override; std::string key() override; ceph::buffer::list value() override; + std::string_view value_as_sv() override; int status() override; bool on_parent() { diff --git a/src/os/ObjectStore.h b/src/os/ObjectStore.h index 521435b6c31..df3ae920a2f 100644 --- a/src/os/ObjectStore.h +++ b/src/os/ObjectStore.h @@ -29,6 +29,7 @@ #include <errno.h> #include <sys/stat.h> +#include <functional> #include <map> #include <memory> #include <vector> @@ -735,15 +736,6 @@ public: std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values ) = 0; -#ifdef WITH_SEASTAR - virtual int omap_get_values( - CollectionHandle &c, ///< [in] Collection containing oid - const ghobject_t &oid, ///< [in] Object containing omap - const std::optional<std::string> &start_after, ///< [in] Keys to get - std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values - ) = 0; -#endif - /// Filters keys into out which are defined on oid virtual int omap_check_keys( CollectionHandle &c, ///< [in] Collection containing oid @@ -766,6 +758,48 @@ public: const ghobject_t &oid ///< [in] object ) = 0; + struct omap_iter_seek_t { + std::string seek_position; + enum { + // start with provided key (seek_position), if it exists + LOWER_BOUND, + // skip provided key (seek_position) even if it exists + UPPER_BOUND + } seek_type = LOWER_BOUND; + static omap_iter_seek_t min_lower_bound() { return {}; } + }; + enum class omap_iter_ret_t { + STOP, + NEXT + }; + /** + * Iterate over object map with user-provided callable + * + * Warning! The callable is executed under lock on bluestore + * operations in c. Do not use bluestore methods on c while + * iterating. (Filling in a transaction is no problem). + * + * @param c collection + * @param oid object + * @param start_from where the iterator should point to at + * the beginning + * @param visitor callable that takes OMAP key and corresponding + * value as string_views and controls iteration + * by the return. It is executed for every object's + * OMAP entry from `start_from` till end of the + * object's OMAP or till the iteration is stopped + * by `STOP`. Please note that if there is no such + * entry, `visitor` will be called 0 times. + * @return error code, zero on success + */ + virtual int omap_iterate( + CollectionHandle &c, + const ghobject_t &oid, + omap_iter_seek_t start_from, + std::function<omap_iter_ret_t(std::string_view, + std::string_view)> visitor + ) = 0; + virtual int flush_journal() { return -EOPNOTSUPP; } virtual int dump_journal(std::ostream& out) { return -EOPNOTSUPP; } diff --git a/src/os/bluestore/BlueStore.cc b/src/os/bluestore/BlueStore.cc index a024a0c2105..25e6c4fe596 100644 --- a/src/os/bluestore/BlueStore.cc +++ b/src/os/bluestore/BlueStore.cc @@ -4830,7 +4830,7 @@ void BlueStore::Onode::rewrite_omap_key(const string& old, string *out) out->append(old.c_str() + out->length(), old.size() - out->length()); } -void BlueStore::Onode::decode_omap_key(const string& key, string *user_key) +size_t BlueStore::Onode::calc_userkey_offset_in_omap_key() const { size_t pos = sizeof(uint64_t) + 1; if (!onode.is_pgmeta_omap()) { @@ -4840,9 +4840,15 @@ void BlueStore::Onode::decode_omap_key(const string& key, string *user_key) pos += sizeof(uint64_t); } } - *user_key = key.substr(pos); + return pos; } +void BlueStore::Onode::decode_omap_key(const string& key, string *user_key) +{ + *user_key = key.substr(calc_userkey_offset_in_omap_key()); +} + + void BlueStore::Onode::finish_write(TransContext* txc, uint32_t offset, uint32_t length) { while (true) { @@ -5519,7 +5525,13 @@ BlueStore::OmapIteratorImpl::OmapIteratorImpl( if (o->onode.has_omap()) { o->get_omap_key(string(), &head); o->get_omap_tail(&tail); + auto start1 = mono_clock::now(); it->lower_bound(head); + c->store->log_latency( + __func__, + l_bluestore_omap_seek_to_first_lat, + mono_clock::now() - start1, + c->store->cct->_conf->bluestore_log_omap_iterator_age); } } BlueStore::OmapIteratorImpl::~OmapIteratorImpl() @@ -5654,6 +5666,13 @@ bufferlist BlueStore::OmapIteratorImpl::value() return it->value(); } +std::string_view BlueStore::OmapIteratorImpl::value_as_sv() +{ + std::shared_lock l(c->lock); + ceph_assert(it->valid()); + return it->value_as_sv(); +} + // ===================================== @@ -13601,52 +13620,6 @@ int BlueStore::omap_get_values( return r; } -#ifdef WITH_SEASTAR -int BlueStore::omap_get_values( - CollectionHandle &c_, ///< [in] Collection containing oid - const ghobject_t &oid, ///< [in] Object containing omap - const std::optional<string> &start_after, ///< [in] Keys to get - map<string, bufferlist> *output ///< [out] Returned keys and values - ) -{ - Collection *c = static_cast<Collection *>(c_.get()); - dout(15) << __func__ << " " << c->get_cid() << " oid " << oid << dendl; - if (!c->exists) - return -ENOENT; - std::shared_lock l(c->lock); - int r = 0; - OnodeRef o = c->get_onode(oid, false); - if (!o || !o->exists) { - r = -ENOENT; - goto out; - } - if (!o->onode.has_omap()) { - goto out; - } - o->flush(); - { - ObjectMap::ObjectMapIterator iter = get_omap_iterator(c_, oid); - if (!iter) { - r = -ENOENT; - goto out; - } - if (start_after) { - iter->upper_bound(*start_after); - } else { - iter->seek_to_first(); - } - for (; iter->valid(); iter->next()) { - output->insert(make_pair(iter->key(), iter->value())); - } - } - -out: - dout(10) << __func__ << " " << c->get_cid() << " oid " << oid << " = " << r - << dendl; - return r; -} -#endif - int BlueStore::omap_check_keys( CollectionHandle &c_, ///< [in] Collection containing oid const ghobject_t &oid, ///< [in] Object containing omap @@ -13724,6 +13697,94 @@ ObjectMap::ObjectMapIterator BlueStore::get_omap_iterator( return ObjectMap::ObjectMapIterator(new OmapIteratorImpl(logger,c, o, it)); } +int BlueStore::omap_iterate( + CollectionHandle &c_, ///< [in] collection + const ghobject_t &oid, ///< [in] object + ObjectStore::omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f + ) +{ + Collection *c = static_cast<Collection *>(c_.get()); + dout(10) << __func__ << " " << c->get_cid() << " " << oid << dendl; + if (!c->exists) { + return -ENOENT; + } + std::shared_lock l(c->lock); + OnodeRef o = c->get_onode(oid, false); + if (!o || !o->exists) { + dout(10) << __func__ << " " << oid << "doesn't exist" <<dendl; + return -ENOENT; + } + o->flush(); + dout(10) << __func__ << " has_omap = " << (int)o->onode.has_omap() <<dendl; + if (!o->onode.has_omap()) { + // nothing to do + return 0; + } + + KeyValueDB::Iterator it; + { + auto bounds = KeyValueDB::IteratorBounds(); + std::string lower_bound, upper_bound; + o->get_omap_key(string(), &lower_bound); + o->get_omap_tail(&upper_bound); + bounds.lower_bound = std::move(lower_bound); + bounds.upper_bound = std::move(upper_bound); + it = db->get_iterator(o->get_omap_prefix(), 0, std::move(bounds)); + } + + // seek the iterator + { + std::string key; + o->get_omap_key(start_from.seek_position, &key); + auto start = ceph::mono_clock::now(); + if (start_from.seek_type == omap_iter_seek_t::LOWER_BOUND) { + it->lower_bound(key); + c->store->log_latency( + __func__, + l_bluestore_omap_lower_bound_lat, + ceph::mono_clock::now() - start, + c->store->cct->_conf->bluestore_log_omap_iterator_age); + } else { + it->upper_bound(key); + c->store->log_latency( + __func__, + l_bluestore_omap_upper_bound_lat, + ceph::mono_clock::now() - start, + c->store->cct->_conf->bluestore_log_omap_iterator_age); + } + } + + // iterate! + std::string tail; + o->get_omap_tail(&tail); + const std::string_view::size_type userkey_offset_in_dbkey = + o->calc_userkey_offset_in_omap_key(); + ceph::timespan next_lat_acc{0}; + while (it->valid()) { + const auto& db_key = it->raw_key_as_sv().second; + if (db_key >= tail) { + break; + } + std::string_view user_key = db_key.substr(userkey_offset_in_dbkey); + omap_iter_ret_t ret = f(user_key, it->value_as_sv()); + if (ret == omap_iter_ret_t::STOP) { + break; + } else if (ret == omap_iter_ret_t::NEXT) { + ceph::time_guard<ceph::mono_clock>{next_lat_acc}; + it->next(); + } else { + ceph_abort(); + } + } + c->store->log_latency( + __func__, + l_bluestore_omap_next_lat, + next_lat_acc, + c->store->cct->_conf->bluestore_log_omap_iterator_age); + return 0; +} + // ----------------- // write helpers diff --git a/src/os/bluestore/BlueStore.h b/src/os/bluestore/BlueStore.h index 99f8d057cf0..5549f97ffea 100644 --- a/src/os/bluestore/BlueStore.h +++ b/src/os/bluestore/BlueStore.h @@ -1457,6 +1457,7 @@ public: } void rewrite_omap_key(const std::string& old, std::string *out); + size_t calc_userkey_offset_in_omap_key() const; void decode_omap_key(const std::string& key, std::string *user_key); void finish_write(TransContext* txc, uint32_t offset, uint32_t length); @@ -1753,6 +1754,7 @@ public: int next() override; std::string key() override; ceph::buffer::list value() override; + std::string_view value_as_sv() override; std::string tail_key() override { return tail; } @@ -3416,15 +3418,6 @@ public: std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values ) override; -#ifdef WITH_SEASTAR - int omap_get_values( - CollectionHandle &c, ///< [in] Collection containing oid - const ghobject_t &oid, ///< [in] Object containing omap - const std::optional<std::string> &start_after, ///< [in] Keys to get - std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values - ) override; -#endif - /// Filters keys into out which are defined on oid int omap_check_keys( CollectionHandle &c, ///< [in] Collection containing oid @@ -3438,6 +3431,13 @@ public: const ghobject_t &oid ///< [in] object ) override; + int omap_iterate( + CollectionHandle &c, ///< [in] collection + const ghobject_t &oid, ///< [in] object + omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f + ) override; + void set_fsid(uuid_d u) override { fsid = u; } diff --git a/src/os/kstore/KStore.cc b/src/os/kstore/KStore.cc index 7158486ca38..a069d429155 100644 --- a/src/os/kstore/KStore.cc +++ b/src/os/kstore/KStore.cc @@ -1651,6 +1651,13 @@ bufferlist KStore::OmapIteratorImpl::value() return it->value(); } +std::string_view KStore::OmapIteratorImpl::value_as_sv() +{ + std::shared_lock l{c->lock}; + ceph_assert(it->valid()); + return it->value_as_sv(); +} + int KStore::omap_get( CollectionHandle& ch, ///< [in] Collection containing oid const ghobject_t &oid, ///< [in] Object containing omap @@ -1866,6 +1873,71 @@ ObjectMap::ObjectMapIterator KStore::get_omap_iterator( return ObjectMap::ObjectMapIterator(new OmapIteratorImpl(c, o, it)); } +int KStore::omap_iterate( + CollectionHandle &ch, ///< [in] collection + const ghobject_t &oid, ///< [in] object + ObjectStore::omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f) +{ + dout(10) << __func__ << " " << ch->cid << " " << oid << dendl; + Collection *c = static_cast<Collection*>(ch.get()); + { + std::shared_lock l{c->lock}; + + OnodeRef o = c->get_onode(oid, false); + if (!o || !o->exists) { + dout(10) << __func__ << " " << oid << "doesn't exist" <<dendl; + return -ENOENT; + } + o->flush(); + dout(10) << __func__ << " header = " << o->onode.omap_head <<dendl; + + KeyValueDB::Iterator it = db->get_iterator(PREFIX_OMAP); + std::string tail; + std::string seek_key; + if (o->onode.omap_head) { + return 0; // nothing to do + } + + // acquire data depedencies for seek & iterate + get_omap_key(o->onode.omap_head, start_from.seek_position, &seek_key); + get_omap_tail(o->onode.omap_head, &tail); + + // acquire the iterator + { + it = db->get_iterator(PREFIX_OMAP); + } + + // seek the iterator + { + if (start_from.seek_type == omap_iter_seek_t::LOWER_BOUND) { + it->lower_bound(seek_key); + } else { + it->upper_bound(seek_key); + } + } + + // iterate! + while (it->valid()) { + std::string user_key; + if (const auto& db_key = it->raw_key().second; db_key >= tail) { + break; + } else { + decode_omap_key(db_key, &user_key); + } + omap_iter_ret_t ret = f(user_key, it->value_as_sv()); + if (ret == omap_iter_ret_t::STOP) { + break; + } else if (ret == omap_iter_ret_t::NEXT) { + it->next(); + } else { + ceph_abort(); + } + } + } + return 0; +} + // ----------------- // write helpers diff --git a/src/os/kstore/KStore.h b/src/os/kstore/KStore.h index 9a9d413c66a..06115d3cab7 100644 --- a/src/os/kstore/KStore.h +++ b/src/os/kstore/KStore.h @@ -180,6 +180,7 @@ public: int next() override; std::string key() override; ceph::buffer::list value() override; + std::string_view value_as_sv() override; int status() override { return 0; } @@ -553,6 +554,13 @@ public: const ghobject_t &oid ///< [in] object ) override; + int omap_iterate( + CollectionHandle &c, ///< [in] collection + const ghobject_t &oid, ///< [in] object + omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f + ) override; + void set_fsid(uuid_d u) override { fsid = u; } diff --git a/src/os/memstore/MemStore.cc b/src/os/memstore/MemStore.cc index 89cb09361cf..f9d3bf0d8a2 100644 --- a/src/os/memstore/MemStore.cc +++ b/src/os/memstore/MemStore.cc @@ -537,30 +537,6 @@ int MemStore::omap_get_values( return 0; } -#ifdef WITH_SEASTAR -int MemStore::omap_get_values( - CollectionHandle& ch, ///< [in] Collection containing oid - const ghobject_t &oid, ///< [in] Object containing omap - const std::optional<std::string> &start_after, ///< [in] Keys to get - std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values - ) -{ - dout(10) << __func__ << " " << ch->cid << " " << oid << dendl; - Collection *c = static_cast<Collection*>(ch.get()); - ObjectRef o = c->get_object(oid); - if (!o) - return -ENOENT; - assert(start_after); - std::lock_guard lock{o->omap_mutex}; - for (auto it = o->omap.upper_bound(*start_after); - it != std::end(o->omap); - ++it) { - out->insert(*it); - } - return 0; -} -#endif - int MemStore::omap_check_keys( CollectionHandle& ch, ///< [in] Collection containing oid const ghobject_t &oid, ///< [in] Object containing omap @@ -622,6 +598,10 @@ public: std::lock_guard lock{o->omap_mutex}; return it->second; } + std::string_view value_as_sv() override { + std::lock_guard lock{o->omap_mutex}; + return std::string_view{it->second.c_str(), it->second.length()}; + } int status() override { return 0; } @@ -639,6 +619,48 @@ ObjectMap::ObjectMapIterator MemStore::get_omap_iterator( return ObjectMap::ObjectMapIterator(new OmapIteratorImpl(c, o)); } +int MemStore::omap_iterate( + CollectionHandle &ch, ///< [in] collection + const ghobject_t &oid, ///< [in] object + ObjectStore::omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f) +{ + Collection *c = static_cast<Collection*>(ch.get()); + ObjectRef o = c->get_object(oid); + if (!o) { + return -ENOENT; + } + + { + std::lock_guard lock{o->omap_mutex}; + + // obtain seek the iterator + decltype(o->omap)::iterator it; + { + if (start_from.seek_type == omap_iter_seek_t::LOWER_BOUND) { + it = o->omap.lower_bound(start_from.seek_position); + } else { + it = o->omap.upper_bound(start_from.seek_position); + } + } + + // iterate! + while (it != o->omap.end()) { + // potentially rectifying memcpy but who cares for memstore? + omap_iter_ret_t ret = + f(it->first, std::string_view{it->second.c_str(), it->second.length()}); + if (ret == omap_iter_ret_t::STOP) { + break; + } else if (ret == omap_iter_ret_t::NEXT) { + ++it; + } else { + ceph_abort(); + } + } + } + return 0; +} + // --------------- // write operations diff --git a/src/os/memstore/MemStore.h b/src/os/memstore/MemStore.h index 2abe552891f..9621773598f 100644 --- a/src/os/memstore/MemStore.h +++ b/src/os/memstore/MemStore.h @@ -363,14 +363,6 @@ public: const std::set<std::string> &keys, ///< [in] Keys to get std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values ) override; -#ifdef WITH_SEASTAR - int omap_get_values( - CollectionHandle &c, ///< [in] Collection containing oid - const ghobject_t &oid, ///< [in] Object containing omap - const std::optional<std::string> &start_after, ///< [in] Keys to get - std::map<std::string, ceph::buffer::list> *out ///< [out] Returned keys and values - ) override; -#endif using ObjectStore::omap_check_keys; /// Filters keys into out which are defined on oid @@ -387,6 +379,13 @@ public: const ghobject_t &oid ///< [in] object ) override; + int omap_iterate( + CollectionHandle &c, ///< [in] collection + const ghobject_t &oid, ///< [in] object + omap_iter_seek_t start_from, ///< [in] where the iterator should point to at the beginning + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f + ) override; + void set_fsid(uuid_d u) override; uuid_d get_fsid() override; diff --git a/src/osd/ExtentCache.h b/src/osd/ExtentCache.h index 972228cd077..7dc1d4f7263 100644 --- a/src/osd/ExtentCache.h +++ b/src/osd/ExtentCache.h @@ -363,7 +363,7 @@ private: extent, boost::intrusive::list_member_hook<>, &extent::pin_list_member>; - using list = boost::intrusive::list<extent, list_member_options>; + using list = boost::intrusive::list<extent, boost::intrusive::constant_time_size<false>, list_member_options>; list pin_list; ~pin_state() { ceph_assert(pin_list.empty()); diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc index aa0f84783cd..3324ba9dc91 100644 --- a/src/osd/PrimaryLogPG.cc +++ b/src/osd/PrimaryLogPG.cc @@ -5808,10 +5808,19 @@ int PrimaryLogPG::do_extent_cmp(OpContext *ctx, OSDOp& osd_op) int PrimaryLogPG::finish_extent_cmp(OSDOp& osd_op, const bufferlist &read_bl) { - for (uint64_t idx = 0; idx < osd_op.indata.length(); ++idx) { - char read_byte = (idx < read_bl.length() ? read_bl[idx] : 0); - if (osd_op.indata[idx] != read_byte) { - return (-MAX_ERRNO - idx); + auto input_iter = osd_op.indata.begin(); + auto read_iter = read_bl.begin(); + uint64_t idx = 0; + + while (input_iter != osd_op.indata.end()) { + char read_byte = (read_iter != read_bl.end() ? *read_iter : 0); + if (*input_iter != read_byte) { + return (-MAX_ERRNO - idx); + } + ++idx; + ++input_iter; + if (read_iter != read_bl.end()) { + ++read_iter; } } @@ -7777,27 +7786,34 @@ int PrimaryLogPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops) bool truncated = false; bufferlist bl; if (oi.is_omap()) { - ObjectMap::ObjectMapIterator iter = osd->store->get_omap_iterator( - ch, ghobject_t(soid) - ); - if (!iter) { - result = -ENOENT; - goto fail; - } - iter->upper_bound(start_after); - if (filter_prefix > start_after) iter->lower_bound(filter_prefix); - for (num = 0; - iter->valid() && - iter->key().substr(0, filter_prefix.size()) == filter_prefix; - ++num, iter->next()) { - dout(20) << "Found key " << iter->key() << dendl; - if (num >= max_return || - bl.length() >= cct->_conf->osd_max_omap_bytes_per_request) { - truncated = true; - break; - } - encode(iter->key(), bl); - encode(iter->value(), bl); + using omap_iter_seek_t = ObjectStore::omap_iter_seek_t; + result = osd->store->omap_iterate( + ch, ghobject_t(soid), + // try to seek as many keys-at-once as possible for the sake of performance. + // note complexity should be logarithmic, so seek(n/2) + seek(n/2) is worse + // than just seek(n). + ObjectStore::omap_iter_seek_t{ + .seek_position = std::max(start_after, filter_prefix), + .seek_type = filter_prefix > start_after ? omap_iter_seek_t::LOWER_BOUND + : omap_iter_seek_t::UPPER_BOUND + }, + [&bl, &truncated, &filter_prefix, &num, max_return, + max_bytes=cct->_conf->osd_max_omap_bytes_per_request] + (std::string_view key, std::string_view value) mutable { + if (key.substr(0, filter_prefix.size()) != filter_prefix) { + return ObjectStore::omap_iter_ret_t::STOP; + } + if (num >= max_return || bl.length() >= max_bytes) { + truncated = true; + return ObjectStore::omap_iter_ret_t::STOP; + } + encode(key, bl); + encode(value, bl); + ++num; + return ObjectStore::omap_iter_ret_t::NEXT; + }); + if (result < 0) { + goto fail; } } // else return empty out_set encode(num, osd_op.outdata); diff --git a/src/osd/Session.h b/src/osd/Session.h index 9fa9c655456..05a0119d31e 100644 --- a/src/osd/Session.h +++ b/src/osd/Session.h @@ -136,7 +136,7 @@ struct Session : public RefCountedObject { ceph::mutex session_dispatch_lock = ceph::make_mutex("Session::session_dispatch_lock"); - boost::intrusive::list<OpRequest> waiting_on_map; + boost::intrusive::list<OpRequest, boost::intrusive::constant_time_size<false>> waiting_on_map; ceph::spinlock projected_epoch_lock; epoch_t projected_epoch = 0; diff --git a/src/osd/osd_types.h b/src/osd/osd_types.h index 1e92d5cd3d6..485fddead7a 100644 --- a/src/osd/osd_types.h +++ b/src/osd/osd_types.h @@ -1151,9 +1151,8 @@ public: bool is_set(key_t key) const; template<typename T> - void set(key_t key, const T &val) { - value_t value = val; - opts[key] = value; + void set(key_t key, T &&val) { + opts.insert_or_assign(key, std::forward<T>(val)); } template<typename T> diff --git a/src/osdc/Objecter.cc b/src/osdc/Objecter.cc index 087b623333b..82d43bb3dde 100644 --- a/src/osdc/Objecter.cc +++ b/src/osdc/Objecter.cc @@ -1393,7 +1393,7 @@ void Objecter::handle_osd_map(MOSDMap *m) for (auto& [c, ec] : p->second) { asio::post(service.get_executor(), asio::append(std::move(c), ec)); } - waiting_for_map.erase(p++); + p = waiting_for_map.erase(p); } monc->sub_got("osdmap", osdmap->get_epoch()); diff --git a/src/pybind/mgr/cephadm/inventory.py b/src/pybind/mgr/cephadm/inventory.py index f1c56d75378..550604fc55b 100644 --- a/src/pybind/mgr/cephadm/inventory.py +++ b/src/pybind/mgr/cephadm/inventory.py @@ -2036,8 +2036,8 @@ class CertKeyStore(): var = service_name if entity in self.service_name_cert else host j = {} self.known_certs[entity][var] = cert_obj - for service_name in self.known_certs[entity].keys(): - j[var] = Cert.to_json(self.known_certs[entity][var]) + for cert_key in self.known_certs[entity]: + j[cert_key] = Cert.to_json(self.known_certs[entity][cert_key]) else: self.known_certs[entity] = cert_obj j = Cert.to_json(cert_obj) diff --git a/src/pybind/mgr/cephadm/module.py b/src/pybind/mgr/cephadm/module.py index bf14f8d1715..6690153d435 100644 --- a/src/pybind/mgr/cephadm/module.py +++ b/src/pybind/mgr/cephadm/module.py @@ -2460,7 +2460,7 @@ Then run the following: @handle_orch_error def service_action(self, action: str, service_name: str) -> List[str]: - if service_name not in self.spec_store.all_specs.keys(): + if service_name not in self.spec_store.all_specs.keys() and service_name != 'osd': raise OrchestratorError(f'Invalid service name "{service_name}".' + ' View currently running services using "ceph orch ls"') dds: List[DaemonDescription] = self.cache.get_daemons_by_service(service_name) @@ -3925,6 +3925,50 @@ Then run the following: return self.to_remove_osds.all_osds() @handle_orch_error + def set_osd_spec(self, service_name: str, osd_ids: List[str]) -> str: + """ + Update unit.meta file for osd with service name + """ + if service_name not in self.spec_store: + raise OrchestratorError(f"Cannot find service '{service_name}' in the inventory. " + "Please try again after applying an OSD service that matches " + "the service name to which you want to attach OSDs.") + + daemons: List[orchestrator.DaemonDescription] = self.cache.get_daemons_by_type('osd') + update_osd = defaultdict(list) + for daemon in daemons: + if daemon.daemon_id in osd_ids and daemon.hostname: + update_osd[daemon.hostname].append(daemon.daemon_id) + + if not update_osd: + raise OrchestratorError(f"Unable to find OSDs: {osd_ids}") + + failed_osds = [] + success_osds = [] + for host in update_osd: + osds = ",".join(update_osd[host]) + # run cephadm command with all host osds on specific host, + # if it fails, continue with other hosts + try: + with self.async_timeout_handler(host): + outs, errs, _code = self.wait_async( + CephadmServe(self)._run_cephadm(host, + cephadmNoImage, + 'update-osd-service', + ['--service-name', service_name, '--osd-ids', osds])) + if _code: + self.log.error(f"Failed to update service for {osds} osd. Cephadm error: {errs}") + failed_osds.extend(update_osd[host]) + else: + success_osds.extend(update_osd[host]) + except Exception: + self.log.exception(f"Failed to set service name for {osds}") + failed_osds.extend(update_osd[host]) + self.cache.invalidate_host_daemons(host) + self._kick_serve_loop() + return f"Updated service for osd {','.join(success_osds)}" + (f" and failed for {','.join(failed_osds)}" if failed_osds else "") + + @handle_orch_error @host_exists() def drain_host(self, hostname: str, force: bool = False, keep_conf_keyring: bool = False, zap_osd_devices: bool = False) -> str: """ diff --git a/src/pybind/mgr/cephadm/schedule.py b/src/pybind/mgr/cephadm/schedule.py index 98d2fe99897..04d3712c50a 100644 --- a/src/pybind/mgr/cephadm/schedule.py +++ b/src/pybind/mgr/cephadm/schedule.py @@ -385,6 +385,8 @@ class HostAssignment(object): def find_ip_on_host(self, hostname: str, subnets: List[str]) -> Optional[str]: for subnet in subnets: + # to normalize subnet + subnet = str(ipaddress.ip_network(subnet)) ips: List[str] = [] # following is to allow loopback interfaces for both ipv4 and ipv6. Since we # only have the subnet (and no IP) we assume default loopback IP address. diff --git a/src/pybind/mgr/cephadm/services/cephadmservice.py b/src/pybind/mgr/cephadm/services/cephadmservice.py index 04f5af28a9b..4f83d7bb0fb 100644 --- a/src/pybind/mgr/cephadm/services/cephadmservice.py +++ b/src/pybind/mgr/cephadm/services/cephadmservice.py @@ -1157,6 +1157,14 @@ class RgwService(CephService): 'value': str(spec.rgw_bucket_counters_cache_size), }) + if getattr(spec, 'disable_multisite_sync_traffic', None) is not None: + ret, out, err = self.mgr.check_mon_command({ + 'prefix': 'config set', + 'who': daemon_name, + 'name': 'rgw_run_sync_thread', + 'value': 'false' if spec.disable_multisite_sync_traffic else 'true', + }) + daemon_spec.keyring = keyring daemon_spec.final_config, daemon_spec.deps = self.generate_config(daemon_spec) diff --git a/src/pybind/mgr/cephadm/services/monitoring.py b/src/pybind/mgr/cephadm/services/monitoring.py index 1b9cf618570..9c5b5a112f3 100644 --- a/src/pybind/mgr/cephadm/services/monitoring.py +++ b/src/pybind/mgr/cephadm/services/monitoring.py @@ -3,6 +3,7 @@ import logging import os import socket from typing import List, Any, Tuple, Dict, Optional, cast +import ipaddress from mgr_module import HandleCommandResult @@ -57,6 +58,8 @@ class GrafanaService(CephadmService): if ip_to_bind_to: daemon_spec.port_ips = {str(grafana_port): ip_to_bind_to} grafana_ip = ip_to_bind_to + if ipaddress.ip_network(grafana_ip).version == 6: + grafana_ip = f"[{grafana_ip}]" domain = self.mgr.get_fqdn(daemon_spec.host) mgmt_gw_ips = [] @@ -354,6 +357,13 @@ class AlertmanagerService(CephadmService): addr = self.mgr.get_fqdn(dd.hostname) peers.append(build_url(host=addr, port=port).lstrip('/')) + ip_to_bind_to = '' + if spec.only_bind_port_on_networks and spec.networks: + assert daemon_spec.host is not None + ip_to_bind_to = self.mgr.get_first_matching_network_ip(daemon_spec.host, spec) or '' + if ip_to_bind_to: + daemon_spec.port_ips = {str(port): ip_to_bind_to} + deps.append(f'secure_monitoring_stack:{self.mgr.secure_monitoring_stack}') if security_enabled: alertmanager_user, alertmanager_password = self.mgr._get_alertmanager_credentials() @@ -376,7 +386,8 @@ class AlertmanagerService(CephadmService): }, 'peers': peers, 'web_config': '/etc/alertmanager/web.yml', - 'use_url_prefix': mgmt_gw_enabled + 'use_url_prefix': mgmt_gw_enabled, + 'ip_to_bind_to': ip_to_bind_to }, sorted(deps) else: return { @@ -384,7 +395,8 @@ class AlertmanagerService(CephadmService): "alertmanager.yml": yml }, "peers": peers, - 'use_url_prefix': mgmt_gw_enabled + 'use_url_prefix': mgmt_gw_enabled, + 'ip_to_bind_to': ip_to_bind_to }, sorted(deps) def get_active_daemon(self, daemon_descrs: List[DaemonDescription]) -> DaemonDescription: diff --git a/src/pybind/mgr/cephadm/ssh.py b/src/pybind/mgr/cephadm/ssh.py index 1622cb001ab..acb5a77c51b 100644 --- a/src/pybind/mgr/cephadm/ssh.py +++ b/src/pybind/mgr/cephadm/ssh.py @@ -358,7 +358,7 @@ class SSHManager: await self._check_execute_command(host, chown, addr=addr) chmod = RemoteCommand(Executables.CHMOD, [oct(mode)[2:], tmp_path]) await self._check_execute_command(host, chmod, addr=addr) - mv = RemoteCommand(Executables.MV, [tmp_path, path]) + mv = RemoteCommand(Executables.MV, ['-Z', tmp_path, path]) await self._check_execute_command(host, mv, addr=addr) except Exception as e: msg = f"Unable to write {host}:{path}: {e}" diff --git a/src/pybind/mgr/cephadm/templates/services/alertmanager/alertmanager.yml.j2 b/src/pybind/mgr/cephadm/templates/services/alertmanager/alertmanager.yml.j2 index de993cb6ce3..b6955caf616 100644 --- a/src/pybind/mgr/cephadm/templates/services/alertmanager/alertmanager.yml.j2 +++ b/src/pybind/mgr/cephadm/templates/services/alertmanager/alertmanager.yml.j2 @@ -8,6 +8,8 @@ global: tls_config: {% if security_enabled %} ca_file: root_cert.pem + cert_file: alertmanager.crt + key_file: alertmanager.key {% else %} insecure_skip_verify: true {% endif %} diff --git a/src/pybind/mgr/cephadm/templates/services/mgmt-gateway/nginx.conf.j2 b/src/pybind/mgr/cephadm/templates/services/mgmt-gateway/nginx.conf.j2 index b9773ceeeb3..14af0fd48ca 100644 --- a/src/pybind/mgr/cephadm/templates/services/mgmt-gateway/nginx.conf.j2 +++ b/src/pybind/mgr/cephadm/templates/services/mgmt-gateway/nginx.conf.j2 @@ -9,6 +9,7 @@ events { http { #access_log /dev/stdout; + error_log /dev/stderr info; client_header_buffer_size 32K; large_client_header_buffers 4 32k; proxy_busy_buffers_size 512k; diff --git a/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2 b/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2 index ecfd899af71..961da145dac 100644 --- a/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2 +++ b/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2 @@ -28,6 +28,8 @@ alerting: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} - scheme: http http_sd_configs: @@ -56,6 +58,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} honor_labels: true http_sd_configs: @@ -81,6 +85,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} http_sd_configs: - url: {{ node_exporter_sd_url }} @@ -104,6 +110,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} http_sd_configs: - url: {{ haproxy_sd_url }} @@ -128,6 +136,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} honor_labels: true http_sd_configs: @@ -149,6 +159,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} http_sd_configs: - url: {{ nvmeof_sd_url }} @@ -169,6 +181,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} http_sd_configs: - url: {{ nfs_sd_url }} @@ -189,6 +203,8 @@ scrape_configs: password: {{ service_discovery_password }} tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key {% else %} http_sd_configs: - url: {{ smb_sd_url }} diff --git a/src/pybind/mgr/cephadm/tests/test_cephadm.py b/src/pybind/mgr/cephadm/tests/test_cephadm.py index b81510504d9..22bd26def91 100644 --- a/src/pybind/mgr/cephadm/tests/test_cephadm.py +++ b/src/pybind/mgr/cephadm/tests/test_cephadm.py @@ -1741,16 +1741,23 @@ class TestCephadm(object): nvmeof_client_cert = 'fake-nvmeof-client-cert' nvmeof_server_cert = 'fake-nvmeof-server-cert' nvmeof_root_ca_cert = 'fake-nvmeof-root-ca-cert' + grafana_cert_host_1 = 'grafana-cert-host-1' + grafana_cert_host_2 = 'grafana-cert-host-2' cephadm_module.cert_key_store.save_cert('rgw_frontend_ssl_cert', rgw_frontend_rgw_foo_host2_cert, service_name='rgw.foo', user_made=True) cephadm_module.cert_key_store.save_cert('nvmeof_server_cert', nvmeof_server_cert, service_name='nvmeof.foo', user_made=True) cephadm_module.cert_key_store.save_cert('nvmeof_client_cert', nvmeof_client_cert, service_name='nvmeof.foo', user_made=True) cephadm_module.cert_key_store.save_cert('nvmeof_root_ca_cert', nvmeof_root_ca_cert, service_name='nvmeof.foo', user_made=True) + cephadm_module.cert_key_store.save_cert('grafana_cert', grafana_cert_host_1, host='host-1', user_made=True) + cephadm_module.cert_key_store.save_cert('grafana_cert', grafana_cert_host_2, host='host-2', user_made=True) expected_calls = [ mock.call(f'{CERT_STORE_CERT_PREFIX}rgw_frontend_ssl_cert', json.dumps({'rgw.foo': Cert(rgw_frontend_rgw_foo_host2_cert, True).to_json()})), mock.call(f'{CERT_STORE_CERT_PREFIX}nvmeof_server_cert', json.dumps({'nvmeof.foo': Cert(nvmeof_server_cert, True).to_json()})), mock.call(f'{CERT_STORE_CERT_PREFIX}nvmeof_client_cert', json.dumps({'nvmeof.foo': Cert(nvmeof_client_cert, True).to_json()})), mock.call(f'{CERT_STORE_CERT_PREFIX}nvmeof_root_ca_cert', json.dumps({'nvmeof.foo': Cert(nvmeof_root_ca_cert, True).to_json()})), + mock.call(f'{CERT_STORE_CERT_PREFIX}grafana_cert', json.dumps({'host-1': Cert(grafana_cert_host_1, True).to_json()})), + mock.call(f'{CERT_STORE_CERT_PREFIX}grafana_cert', json.dumps({'host-1': Cert(grafana_cert_host_1, True).to_json(), + 'host-2': Cert(grafana_cert_host_2, True).to_json()})) ] _set_store.assert_has_calls(expected_calls) @@ -1795,17 +1802,20 @@ class TestCephadm(object): cephadm_module.cert_key_store._init_known_cert_key_dicts() grafana_host1_key = 'fake-grafana-host1-key' + grafana_host2_key = 'fake-grafana-host2-key' nvmeof_client_key = 'nvmeof-client-key' nvmeof_server_key = 'nvmeof-server-key' nvmeof_encryption_key = 'nvmeof-encryption-key' - grafana_host1_key = 'fake-grafana-host1-cert' cephadm_module.cert_key_store.save_key('grafana_key', grafana_host1_key, host='host1') + cephadm_module.cert_key_store.save_key('grafana_key', grafana_host2_key, host='host2') cephadm_module.cert_key_store.save_key('nvmeof_client_key', nvmeof_client_key, service_name='nvmeof.foo') cephadm_module.cert_key_store.save_key('nvmeof_server_key', nvmeof_server_key, service_name='nvmeof.foo') cephadm_module.cert_key_store.save_key('nvmeof_encryption_key', nvmeof_encryption_key, service_name='nvmeof.foo') expected_calls = [ mock.call(f'{CERT_STORE_KEY_PREFIX}grafana_key', json.dumps({'host1': PrivKey(grafana_host1_key).to_json()})), + mock.call(f'{CERT_STORE_KEY_PREFIX}grafana_key', json.dumps({'host1': PrivKey(grafana_host1_key).to_json(), + 'host2': PrivKey(grafana_host2_key).to_json()})), mock.call(f'{CERT_STORE_KEY_PREFIX}nvmeof_client_key', json.dumps({'nvmeof.foo': PrivKey(nvmeof_client_key).to_json()})), mock.call(f'{CERT_STORE_KEY_PREFIX}nvmeof_server_key', json.dumps({'nvmeof.foo': PrivKey(nvmeof_server_key).to_json()})), mock.call(f'{CERT_STORE_KEY_PREFIX}nvmeof_encryption_key', json.dumps({'nvmeof.foo': PrivKey(nvmeof_encryption_key).to_json()})), diff --git a/src/pybind/mgr/cephadm/tests/test_services.py b/src/pybind/mgr/cephadm/tests/test_services.py index 0d89657ac8c..d872219df80 100644 --- a/src/pybind/mgr/cephadm/tests/test_services.py +++ b/src/pybind/mgr/cephadm/tests/test_services.py @@ -581,7 +581,14 @@ class TestMonitoring: mock_getfqdn.return_value = purl.hostname with with_host(cephadm_module, "test"): - with with_service(cephadm_module, AlertManagerSpec()): + cephadm_module.cache.update_host_networks('test', { + '1.2.3.0/24': { + 'if0': ['1.2.3.1'] + }, + }) + with with_service(cephadm_module, AlertManagerSpec('alertmanager', + networks=['1.2.3.0/24'], + only_bind_port_on_networks=True)): y = dedent(self._get_config(expected_yaml_url)).lstrip() _run_cephadm.assert_called_with( 'test', @@ -595,11 +602,12 @@ class TestMonitoring: "deploy_arguments": [], "params": { 'tcp_ports': [9093, 9094], + 'port_ips': {"9094": "1.2.3.1"}, }, "meta": { 'service_name': 'alertmanager', 'ports': [9093, 9094], - 'ip': None, + 'ip': '1.2.3.1', 'deployed_by': [], 'rank': None, 'rank_generation': None, @@ -612,6 +620,7 @@ class TestMonitoring: }, "peers": [], "use_url_prefix": False, + "ip_to_bind_to": "1.2.3.1", } }), error_ok=True, @@ -634,8 +643,16 @@ class TestMonitoring: cephadm_module.secure_monitoring_stack = True cephadm_module.set_store(AlertmanagerService.USER_CFG_KEY, 'alertmanager_user') cephadm_module.set_store(AlertmanagerService.PASS_CFG_KEY, 'alertmanager_plain_password') + + cephadm_module.cache.update_host_networks('test', { + 'fd12:3456:789a::/64': { + 'if0': ['fd12:3456:789a::10'] + }, + }) with with_service(cephadm_module, MgmtGatewaySpec("mgmt-gateway")) as _, \ - with_service(cephadm_module, AlertManagerSpec()): + with_service(cephadm_module, AlertManagerSpec('alertmanager', + networks=['fd12:3456:789a::/64'], + only_bind_port_on_networks=True)): y = dedent(""" # This file is generated by cephadm. @@ -646,6 +663,8 @@ class TestMonitoring: http_config: tls_config: ca_file: root_cert.pem + cert_file: alertmanager.crt + key_file: alertmanager.key route: receiver: 'default' @@ -686,11 +705,12 @@ class TestMonitoring: "deploy_arguments": [], "params": { 'tcp_ports': [9093, 9094], + 'port_ips': {"9094": "fd12:3456:789a::10"} }, "meta": { 'service_name': 'alertmanager', 'ports': [9093, 9094], - 'ip': None, + 'ip': 'fd12:3456:789a::10', 'deployed_by': [], 'rank': None, 'rank_generation': None, @@ -708,6 +728,7 @@ class TestMonitoring: 'peers': [], 'web_config': '/etc/alertmanager/web.yml', "use_url_prefix": True, + "ip_to_bind_to": "fd12:3456:789a::10", } }), error_ok=True, @@ -741,6 +762,8 @@ class TestMonitoring: http_config: tls_config: ca_file: root_cert.pem + cert_file: alertmanager.crt + key_file: alertmanager.key route: receiver: 'default' @@ -801,6 +824,7 @@ class TestMonitoring: 'peers': [], 'web_config': '/etc/alertmanager/web.yml', "use_url_prefix": False, + "ip_to_bind_to": "", } }), error_ok=True, @@ -1170,6 +1194,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key scrape_configs: - job_name: 'ceph' @@ -1191,6 +1217,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'node' relabel_configs: @@ -1209,6 +1237,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'haproxy' relabel_configs: @@ -1225,6 +1255,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'ceph-exporter' relabel_configs: @@ -1242,6 +1274,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'nvmeof' honor_labels: true @@ -1255,6 +1289,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'nfs' honor_labels: true @@ -1268,6 +1304,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key - job_name: 'smb' honor_labels: true @@ -1281,6 +1319,8 @@ class TestMonitoring: password: sd_password tls_config: ca_file: root_cert.pem + cert_file: prometheus.crt + key_file: prometheus.key """).lstrip() @@ -2071,6 +2111,26 @@ class TestRGWService: }) assert f == expected + @pytest.mark.parametrize( + "disable_sync_traffic", + [ + (True), + (False), + ] + ) + @patch("cephadm.serve.CephadmServe._run_cephadm", _run_cephadm('{}')) + def test_rgw_disable_sync_traffic(self, disable_sync_traffic, cephadm_module: CephadmOrchestrator): + with with_host(cephadm_module, 'host1'): + s = RGWSpec(service_id="foo", + disable_multisite_sync_traffic=disable_sync_traffic) + with with_service(cephadm_module, s) as dds: + _, f, _ = cephadm_module.check_mon_command({ + 'prefix': 'config get', + 'who': f'client.{dds[0]}', + 'key': 'rgw_run_sync_thread', + }) + assert f == ('false' if disable_sync_traffic else 'true') + class TestMonService: @@ -3874,6 +3934,7 @@ class TestMgmtGateway: http { #access_log /dev/stdout; + error_log /dev/stderr info; client_header_buffer_size 32K; large_client_header_buffers 4 32k; proxy_busy_buffers_size 512k; @@ -4121,6 +4182,7 @@ class TestMgmtGateway: http { #access_log /dev/stdout; + error_log /dev/stderr info; client_header_buffer_size 32K; large_client_header_buffers 4 32k; proxy_busy_buffers_size 512k; diff --git a/src/pybind/mgr/orchestrator/_interface.py b/src/pybind/mgr/orchestrator/_interface.py index a505801eea5..4fbc975ae9f 100644 --- a/src/pybind/mgr/orchestrator/_interface.py +++ b/src/pybind/mgr/orchestrator/_interface.py @@ -747,6 +747,10 @@ class Orchestrator(object): """ raise NotImplementedError() + def set_osd_spec(self, service_name: str, osd_ids: List[str]) -> OrchResult: + """ set service of osd """ + raise NotImplementedError() + def blink_device_light(self, ident_fault: str, on: bool, locations: List['DeviceLightLoc']) -> OrchResult[List[str]]: """ Instructs the orchestrator to enable or disable either the ident or the fault LED. diff --git a/src/pybind/mgr/orchestrator/module.py b/src/pybind/mgr/orchestrator/module.py index 332bc75d862..d5a1bb3da2b 100644 --- a/src/pybind/mgr/orchestrator/module.py +++ b/src/pybind/mgr/orchestrator/module.py @@ -1472,6 +1472,14 @@ Usage: return HandleCommandResult(stdout=out) + @_cli_write_command('orch osd set-spec-affinity') + def _osd_set_spec(self, service_name: str, osd_id: List[str]) -> HandleCommandResult: + """Set service spec affinity for osd""" + completion = self.set_osd_spec(service_name, osd_id) + res = raise_if_exception(completion) + + return HandleCommandResult(stdout=res) + @_cli_write_command('orch daemon add') def daemon_add_misc(self, daemon_type: Optional[ServiceType] = None, @@ -1666,7 +1674,13 @@ Usage: specs: List[Union[ServiceSpec, HostSpec]] = [] # YAML '---' document separator with no content generates # None entries in the output. Let's skip them silently. - content = [o for o in yaml_objs if o is not None] + try: + content = [o for o in yaml_objs if o is not None] + except yaml.scanner.ScannerError as e: + msg = f"Invalid YAML received : {str(e)}" + self.log.exception(msg) + return HandleCommandResult(-errno.EINVAL, stderr=msg) + for s in content: try: spec = json_to_generic_spec(s) @@ -2191,7 +2205,13 @@ Usage: specs: List[TunedProfileSpec] = [] # YAML '---' document separator with no content generates # None entries in the output. Let's skip them silently. - content = [o for o in yaml_objs if o is not None] + try: + content = [o for o in yaml_objs if o is not None] + except yaml.scanner.ScannerError as e: + msg = f"Invalid YAML received : {str(e)}" + self.log.exception(msg) + return HandleCommandResult(-errno.EINVAL, stderr=msg) + for spec in content: specs.append(TunedProfileSpec.from_json(spec)) else: diff --git a/src/python-common/ceph/deployment/service_spec.py b/src/python-common/ceph/deployment/service_spec.py index 8a2a38b86ee..1ac9fa49e32 100644 --- a/src/python-common/ceph/deployment/service_spec.py +++ b/src/python-common/ceph/deployment/service_spec.py @@ -1231,6 +1231,7 @@ class RGWSpec(ServiceSpec): rgw_bucket_counters_cache: Optional[bool] = False, rgw_bucket_counters_cache_size: Optional[int] = None, generate_cert: bool = False, + disable_multisite_sync_traffic: Optional[bool] = None, ): assert service_type == 'rgw', service_type @@ -1283,6 +1284,8 @@ class RGWSpec(ServiceSpec): self.rgw_bucket_counters_cache_size = rgw_bucket_counters_cache_size #: Whether we should generate a cert/key for the user if not provided self.generate_cert = generate_cert + #: Used to make RGW not do multisite replication so it can dedicate to IO + self.disable_multisite_sync_traffic = disable_multisite_sync_traffic def get_port_start(self) -> List[int]: return [self.get_port()] @@ -2328,6 +2331,7 @@ class AlertManagerSpec(MonitoringSpec): user_data: Optional[Dict[str, Any]] = None, config: Optional[Dict[str, str]] = None, networks: Optional[List[str]] = None, + only_bind_port_on_networks: bool = False, port: Optional[int] = None, secure: bool = False, extra_container_args: Optional[GeneralArgList] = None, @@ -2358,6 +2362,7 @@ class AlertManagerSpec(MonitoringSpec): # <webhook_configs> configuration. self.user_data = user_data or {} self.secure = secure + self.only_bind_port_on_networks = only_bind_port_on_networks def get_port_start(self) -> List[int]: return [self.get_port(), 9094] @@ -2404,7 +2409,7 @@ class GrafanaSpec(MonitoringSpec): self.protocol = protocol # whether ports daemons for this service bind to should - # bind to only hte networks listed in networks param, or + # bind to only the networks listed in networks param, or # to all networks. Defaults to false which is saying to bind # on all networks. self.only_bind_port_on_networks = only_bind_port_on_networks diff --git a/src/rgw/driver/posix/rgw_sal_posix.cc b/src/rgw/driver/posix/rgw_sal_posix.cc index 1345468210f..9d76462baa0 100644 --- a/src/rgw/driver/posix/rgw_sal_posix.cc +++ b/src/rgw/driver/posix/rgw_sal_posix.cc @@ -2893,6 +2893,14 @@ int POSIXObject::copy_object(const ACLOwner& owner, return dobj->set_obj_attrs(dpp, &attrs, nullptr, y, rgw::sal::FLAG_LOG_OP); } +int POSIXObject::list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) +{ + return -EOPNOTSUPP; +} + int POSIXObject::load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh) { int ret = stat(dpp); diff --git a/src/rgw/driver/posix/rgw_sal_posix.h b/src/rgw/driver/posix/rgw_sal_posix.h index 8ec72bbc1bc..bf3478ad6ab 100644 --- a/src/rgw/driver/posix/rgw_sal_posix.h +++ b/src/rgw/driver/posix/rgw_sal_posix.h @@ -653,6 +653,13 @@ public: const DoutPrefixProvider* dpp, optional_yield y) override; virtual RGWAccessControlPolicy& get_acl(void) override { return acls; } virtual int set_acl(const RGWAccessControlPolicy& acl) override { acls = acl; return 0; } + + /** If multipart, enumerate (a range [marker..marker+[min(max_parts, parts_count-1)] of) parts of the object */ + virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) override; + virtual int load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh = true) override; virtual int set_obj_attrs(const DoutPrefixProvider* dpp, Attrs* setattrs, Attrs* delattrs, optional_yield y, uint32_t flags) override; diff --git a/src/rgw/driver/rados/rgw_rados.cc b/src/rgw/driver/rados/rgw_rados.cc index 0b77bca1da7..69075c506f1 100644 --- a/src/rgw/driver/rados/rgw_rados.cc +++ b/src/rgw/driver/rados/rgw_rados.cc @@ -6962,13 +6962,13 @@ int RGWRados::set_attrs(const DoutPrefixProvider *dpp, RGWObjectCtx* octx, RGWBu } return 0; -} +} /* RGWRados::set_attrs() */ -static int get_part_obj_state(const DoutPrefixProvider* dpp, optional_yield y, - RGWRados* store, RGWBucketInfo& bucket_info, - RGWObjectCtx* rctx, RGWObjManifest* manifest, - int part_num, int* parts_count, bool prefetch, - RGWObjState** pstate, RGWObjManifest** pmanifest) +int RGWRados::get_part_obj_state(const DoutPrefixProvider* dpp, optional_yield y, + RGWRados* store, RGWBucketInfo& bucket_info, + RGWObjectCtx* rctx, RGWObjManifest* manifest, + int part_num, int* parts_count, bool prefetch, + RGWObjState** pstate, RGWObjManifest** pmanifest) { if (!manifest) { return -ERR_INVALID_PART; @@ -7047,6 +7047,9 @@ static int get_part_obj_state(const DoutPrefixProvider* dpp, optional_yield y, // update the object size sm->state.size = part_manifest.get_obj_size(); + if (!sm->state.attrset.count(RGW_ATTR_COMPRESSION)) { + sm->state.accounted_size = sm->state.size; + } *pmanifest = &part_manifest; return 0; diff --git a/src/rgw/driver/rados/rgw_rados.h b/src/rgw/driver/rados/rgw_rados.h index b24823b60dc..fe79916392f 100644 --- a/src/rgw/driver/rados/rgw_rados.h +++ b/src/rgw/driver/rados/rgw_rados.h @@ -1071,6 +1071,12 @@ public: }; // class RGWRados::Bucket::List }; // class RGWRados::Bucket + static int get_part_obj_state(const DoutPrefixProvider* dpp, optional_yield y, + RGWRados* store, RGWBucketInfo& bucket_info, + RGWObjectCtx* rctx, RGWObjManifest* manifest, + int part_num, int* parts_count, bool prefetch, + RGWObjState** pstate, RGWObjManifest** pmanifest); + int on_last_entry_in_listing(const DoutPrefixProvider *dpp, RGWBucketInfo& bucket_info, const std::string& obj_prefix, diff --git a/src/rgw/driver/rados/rgw_sal_rados.cc b/src/rgw/driver/rados/rgw_sal_rados.cc index 26d7ca9d983..4c67d0ee71a 100644 --- a/src/rgw/driver/rados/rgw_sal_rados.cc +++ b/src/rgw/driver/rados/rgw_sal_rados.cc @@ -2471,7 +2471,108 @@ bool RadosObject::is_sync_completed(const DoutPrefixProvider* dpp, const rgw_bi_log_entry& earliest_marker = entries.front(); return earliest_marker.timestamp > obj_mtime; -} +} /* is_sync_completed */ + +int RadosObject::list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) +{ + int ret{0}; + + /* require an object with a manifest, so call to get_obj_state() must precede this */ + if (! manifest) { + return -EINVAL; + } + + RGWObjManifest::obj_iterator end = manifest->obj_end(dpp); + if (end.get_cur_part_id() == 0) { // not multipart + ldpp_dout(dpp, 20) << __func__ << " object does not have a multipart manifest" + << dendl; + return 0; + } + + auto end_part_id = end.get_cur_part_id(); + auto parts_count = (end_part_id == 1) ? 1 : end_part_id - 1; + if (marker > (parts_count - 1)) { + return 0; + } + + RGWObjManifest::obj_iterator part_iter = manifest->obj_begin(dpp); + + if (marker != 0) { + ldpp_dout_fmt(dpp, 20, + "{} seeking to part #{} in the object manifest", + __func__, marker); + + part_iter = manifest->obj_find_part(dpp, marker + 1); + + if (part_iter == end) { + ldpp_dout_fmt(dpp, 5, + "{} failed to find part #{} in the object manifest", + __func__, marker + 1); + return 0; + } + } + + RGWObjectCtx& obj_ctx = get_ctx(); + RGWBucketInfo& bucket_info = get_bucket()->get_info(); + + Object::Part obj_part{}; + for (; part_iter != manifest->obj_end(dpp); ++part_iter) { + + /* we're only interested in the first object in each logical part */ + auto cur_part_id = part_iter.get_cur_part_id(); + if (cur_part_id == obj_part.part_number) { + continue; + } + + if (max_parts < 1) { + *truncated = true; + break; + } + + /* get_part_obj_state alters the passed manifest** to point to a part + * manifest, which we don't want to leak out here */ + RGWObjManifest* obj_m = manifest; + RGWObjState* astate; + bool part_prefetch = false; + ret = RGWRados::get_part_obj_state(dpp, y, store->getRados(), bucket_info, &obj_ctx, + obj_m, cur_part_id, &parts_count, + part_prefetch, &astate, &obj_m); + + if (ret < 0) { + ldpp_dout_fmt(dpp, 4, + "{} get_part_obj_state() failed ret={}", + __func__, ret); + break; + } + + obj_part.part_number = part_iter.get_cur_part_id(); + obj_part.part_size = astate->accounted_size; + + if (auto iter = astate->attrset.find(RGW_ATTR_CKSUM); + iter != astate->attrset.end()) { + try { + rgw::cksum::Cksum part_cksum; + auto ck_iter = iter->second.cbegin(); + part_cksum.decode(ck_iter); + obj_part.cksum = std::move(part_cksum); + } catch (buffer::error& err) { + ldpp_dout_fmt(dpp, 4, + "WARN: {} could not decode stored cksum, " + "caught buffer::error", + __func__); + } + } + + each_func(obj_part); + *next_marker = ++marker; + --max_parts; + } /* each part */ + + return ret; +} /* RadosObject::list_parts */ int RadosObject::load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh) { diff --git a/src/rgw/driver/rados/rgw_sal_rados.h b/src/rgw/driver/rados/rgw_sal_rados.h index bd00e726fe7..85ea247e345 100644 --- a/src/rgw/driver/rados/rgw_sal_rados.h +++ b/src/rgw/driver/rados/rgw_sal_rados.h @@ -592,12 +592,18 @@ class RadosObject : public StoreObject { StoreObject::set_compressed(); } - virtual bool is_sync_completed(const DoutPrefixProvider* dpp, const ceph::real_time& obj_mtime) override; /* For rgw_admin.cc */ RGWObjState& get_state() { return state; } virtual int load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh = true) override; + + /** If multipart, enumerate (a range [marker..marker+[min(max_parts, parts_count-1)] of) parts of the object */ + virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) override; + virtual int set_obj_attrs(const DoutPrefixProvider* dpp, Attrs* setattrs, Attrs* delattrs, optional_yield y, uint32_t flags) override; virtual int get_obj_attrs(optional_yield y, const DoutPrefixProvider* dpp, rgw_obj* target_obj = NULL) override; virtual int modify_obj_attrs(const char* attr_name, bufferlist& attr_val, optional_yield y, const DoutPrefixProvider* dpp) override; diff --git a/src/rgw/rgw_cksum_pipe.cc b/src/rgw/rgw_cksum_pipe.cc index e06957e2715..0bec8d341af 100644 --- a/src/rgw/rgw_cksum_pipe.cc +++ b/src/rgw/rgw_cksum_pipe.cc @@ -18,6 +18,7 @@ #include <string> #include <fmt/format.h> #include <boost/algorithm/string.hpp> +#include "rgw_cksum.h" #include "rgw_common.h" #include "common/dout.h" #include "rgw_client_io.h" @@ -34,7 +35,8 @@ namespace rgw::putobj { {} std::unique_ptr<RGWPutObj_Cksum> RGWPutObj_Cksum::Factory( - rgw::sal::DataProcessor* next, const RGWEnv& env) + rgw::sal::DataProcessor* next, const RGWEnv& env, + rgw::cksum::Type override_type) { /* look for matching headers */ auto algo_header = cksum_algorithm_hdr(env); @@ -49,6 +51,13 @@ namespace rgw::putobj { throw rgw::io::Exception(EINVAL, std::system_category()); } /* no checksum header */ + if (override_type != rgw::cksum::Type::none) { + /* XXXX safe? do we need to fixup env as well? */ + auto algo_header = cksum_algorithm_hdr(override_type); + return + std::make_unique<RGWPutObj_Cksum>( + next, override_type, std::move(algo_header)); + } return std::unique_ptr<RGWPutObj_Cksum>(); } diff --git a/src/rgw/rgw_cksum_pipe.h b/src/rgw/rgw_cksum_pipe.h index fddcd283c84..c459d156335 100644 --- a/src/rgw/rgw_cksum_pipe.h +++ b/src/rgw/rgw_cksum_pipe.h @@ -20,6 +20,7 @@ #include <tuple> #include <cstring> #include <boost/algorithm/string/case_conv.hpp> +#include "rgw_cksum.h" #include "rgw_cksum_digest.h" #include "rgw_common.h" #include "rgw_putobj.h" @@ -29,6 +30,38 @@ namespace rgw::putobj { namespace cksum = rgw::cksum; using cksum_hdr_t = std::pair<const char*, const char*>; + static inline const cksum_hdr_t cksum_algorithm_hdr(rgw::cksum::Type t) { + static constexpr std::string_view hdr = + "HTTP_X_AMZ_SDK_CHECKSUM_ALGORITHM"; + using rgw::cksum::Type; + switch (t) { + case Type::sha256: + return cksum_hdr_t(hdr.data(), "SHA256"); + break; + case Type::crc32: + return cksum_hdr_t(hdr.data(), "CRC32"); + break; + case Type::crc32c: + return cksum_hdr_t(hdr.data(), "CRC32C"); + break; + case Type::xxh3: + return cksum_hdr_t(hdr.data(), "XX3"); + break; + case Type::sha1: + return cksum_hdr_t(hdr.data(), "SHA1"); + break; + case Type::sha512: + return cksum_hdr_t(hdr.data(), "SHA512"); + break; + case Type::blake3: + return cksum_hdr_t(hdr.data(), "BLAKE3"); + break; + default: + break; + }; + return cksum_hdr_t(nullptr, nullptr);; + } + static inline const cksum_hdr_t cksum_algorithm_hdr(const RGWEnv& env) { /* If the individual checksum value you provide through x-amz-checksum-algorithm doesn't match the checksum algorithm @@ -102,7 +135,8 @@ namespace rgw::putobj { using VerifyResult = std::tuple<bool, const cksum::Cksum&>; static std::unique_ptr<RGWPutObj_Cksum> Factory( - rgw::sal::DataProcessor* next, const RGWEnv&); + rgw::sal::DataProcessor* next, const RGWEnv&, + rgw::cksum::Type override_type); RGWPutObj_Cksum(rgw::sal::DataProcessor* next, rgw::cksum::Type _type, cksum_hdr_t&& _hdr); diff --git a/src/rgw/rgw_iam_policy.cc b/src/rgw/rgw_iam_policy.cc index 2a5c9cd313e..ef6761d4222 100644 --- a/src/rgw/rgw_iam_policy.cc +++ b/src/rgw/rgw_iam_policy.cc @@ -94,6 +94,8 @@ static const actpair actpairs[] = { "s3:GetPublicAccessBlock", s3GetPublicAccessBlock }, { "s3:GetObjectAcl", s3GetObjectAcl }, { "s3:GetObject", s3GetObject }, + { "s3:GetObjectAttributes", s3GetObjectAttributes }, + { "s3:GetObjectVersionAttributes", s3GetObjectVersionAttributes }, { "s3:GetObjectTorrent", s3GetObjectTorrent }, { "s3:GetObjectVersionAcl", s3GetObjectVersionAcl }, { "s3:GetObjectVersion", s3GetObjectVersion }, @@ -1335,6 +1337,7 @@ const char* action_bit_string(uint64_t action) { case s3ListBucketVersions: return "s3:ListBucketVersions"; + case s3ListAllMyBuckets: return "s3:ListAllMyBuckets"; @@ -1479,6 +1482,12 @@ const char* action_bit_string(uint64_t action) { case s3BypassGovernanceRetention: return "s3:BypassGovernanceRetention"; + case s3GetObjectAttributes: + return "s3:GetObjectAttributes"; + + case s3GetObjectVersionAttributes: + return "s3:GetObjectVersionAttributes"; + case s3DescribeJob: return "s3:DescribeJob"; diff --git a/src/rgw/rgw_iam_policy.h b/src/rgw/rgw_iam_policy.h index 0476926143f..dd323ee4b9c 100644 --- a/src/rgw/rgw_iam_policy.h +++ b/src/rgw/rgw_iam_policy.h @@ -115,6 +115,8 @@ enum { s3GetBucketEncryption, s3PutBucketEncryption, s3DescribeJob, + s3GetObjectAttributes, + s3GetObjectVersionAttributes, s3All, s3objectlambdaGetObject, @@ -247,6 +249,8 @@ inline int op_to_perm(std::uint64_t op) { case s3GetObjectVersionTagging: case s3GetObjectRetention: case s3GetObjectLegalHold: + case s3GetObjectAttributes: + case s3GetObjectVersionAttributes: case s3ListAllMyBuckets: case s3ListBucket: case s3ListBucketMultipartUploads: diff --git a/src/rgw/rgw_op.cc b/src/rgw/rgw_op.cc index 0e08878e747..7b0ca3134a3 100644 --- a/src/rgw/rgw_op.cc +++ b/src/rgw/rgw_op.cc @@ -25,8 +25,10 @@ #include "common/ceph_json.h" #include "common/static_ptr.h" #include "common/perf_counters_key.h" +#include "rgw_cksum.h" #include "rgw_cksum_digest.h" #include "rgw_common.h" +#include "common/split.h" #include "rgw_tracer.h" #include "rgw_rados.h" @@ -4341,6 +4343,9 @@ void RGWPutObj::execute(optional_yield y) } return; } + + multipart_cksum_type = upload->cksum_type; + /* upload will go out of scope, so copy the dest placement for later use */ s->dest_placement = *pdest_placement; pdest_placement = &s->dest_placement; @@ -4471,11 +4476,12 @@ void RGWPutObj::execute(optional_yield y) /* optional streaming checksum */ try { cksum_filter = - rgw::putobj::RGWPutObj_Cksum::Factory(filter, *s->info.env); + rgw::putobj::RGWPutObj_Cksum::Factory(filter, *s->info.env, multipart_cksum_type); } catch (const rgw::io::Exception& e) { op_ret = -e.code().value(); return; } + if (cksum_filter) { filter = &*cksum_filter; } @@ -4622,10 +4628,12 @@ void RGWPutObj::execute(optional_yield y) if (cksum_filter) { const auto& hdr = cksum_filter->header(); + auto expected_ck = cksum_filter->expected(*s->info.env); auto cksum_verify = cksum_filter->verify(*s->info.env); // valid or no supplied cksum cksum = get<1>(cksum_verify); - if (std::get<0>(cksum_verify)) { + if ((!expected_ck) || + std::get<0>(cksum_verify)) { buffer::list cksum_bl; ldpp_dout_fmt(this, 16, @@ -4633,14 +4641,13 @@ void RGWPutObj::execute(optional_yield y) "\n\tcomputed={} == \n\texpected={}", hdr.second, cksum->to_armor(), - cksum_filter->expected(*s->info.env)); + (!!expected_ck) ? expected_ck : "(checksum unavailable)"); cksum->encode(cksum_bl); emplace_attr(RGW_ATTR_CKSUM, std::move(cksum_bl)); } else { /* content checksum mismatch */ auto computed_ck = cksum->to_armor(); - auto expected_ck = cksum_filter->expected(*s->info.env); ldpp_dout_fmt(this, 4, "{} content checksum mismatch" @@ -4843,7 +4850,8 @@ void RGWPostObj::execute(optional_yield y) /* optional streaming checksum */ try { cksum_filter = - rgw::putobj::RGWPutObj_Cksum::Factory(filter, *s->info.env); + rgw::putobj::RGWPutObj_Cksum::Factory( + filter, *s->info.env, rgw::cksum::Type::none /* no override */); } catch (const rgw::io::Exception& e) { op_ret = -e.code().value(); return; @@ -5982,8 +5990,6 @@ void RGWGetACLs::execute(optional_yield y) acls = ss.str(); } - - int RGWPutACLs::verify_permission(optional_yield y) { bool perm; @@ -6005,6 +6011,74 @@ int RGWPutACLs::verify_permission(optional_yield y) return 0; } +uint16_t RGWGetObjAttrs::recognize_attrs(const std::string& hdr, uint16_t deflt) +{ + auto attrs{deflt}; + auto sa = ceph::split(hdr, ","); + for (auto& k : sa) { + if (boost::iequals(k, "etag")) { + attrs |= as_flag(ReqAttributes::Etag); + } + if (boost::iequals(k, "checksum")) { + attrs |= as_flag(ReqAttributes::Checksum); + } + if (boost::iequals(k, "objectparts")) { + attrs |= as_flag(ReqAttributes::ObjectParts); + } + if (boost::iequals(k, "objectsize")) { + attrs |= as_flag(ReqAttributes::ObjectSize); + } + if (boost::iequals(k, "storageclass")) { + attrs |= as_flag(ReqAttributes::StorageClass); + } + } + return attrs; +} /* RGWGetObjAttrs::recognize_attrs */ + +int RGWGetObjAttrs::verify_permission(optional_yield y) +{ + bool perm = false; + auto [has_s3_existing_tag, has_s3_resource_tag] = + rgw_check_policy_condition(this, s); + + if (! rgw::sal::Object::empty(s->object.get())) { + + auto iam_action1 = s->object->get_instance().empty() ? + rgw::IAM::s3GetObject : + rgw::IAM::s3GetObjectVersion; + + auto iam_action2 = s->object->get_instance().empty() ? + rgw::IAM::s3GetObjectAttributes : + rgw::IAM::s3GetObjectVersionAttributes; + + if (has_s3_existing_tag || has_s3_resource_tag) { + rgw_iam_add_objtags(this, s, has_s3_existing_tag, has_s3_resource_tag); + } + + /* XXXX the following conjunction should be &&--but iam_action2 is currently not + * hooked up and always fails (but should succeed if the requestor has READ + * acess to the object) */ + perm = (verify_object_permission(this, s, iam_action1) || /* && */ + verify_object_permission(this, s, iam_action2)); + } + + if (! perm) { + return -EACCES; + } + + return 0; +} + +void RGWGetObjAttrs::pre_exec() +{ + rgw_bucket_object_pre_exec(s); +} + +void RGWGetObjAttrs::execute(optional_yield y) +{ + RGWGetObj::execute(y); +} /* RGWGetObjAttrs::execute */ + int RGWGetLC::verify_permission(optional_yield y) { auto [has_s3_existing_tag, has_s3_resource_tag] = rgw_check_policy_condition(this, s, false); @@ -6672,6 +6746,14 @@ try_sum_part_cksums(const DoutPrefixProvider *dpp, ++parts_ix; auto& part_cksum = part.second->get_cksum(); + if (! part_cksum) { + ldpp_dout_fmt(dpp, 0, + "ERROR: multipart part checksum not present (ix=={})", + parts_ix); + op_ret = -ERR_INVALID_REQUEST; + return op_ret; + } + ldpp_dout_fmt(dpp, 16, "INFO: {} iterate part: {} {} {}", __func__, parts_ix, part_cksum->type_string(), diff --git a/src/rgw/rgw_op.h b/src/rgw/rgw_op.h index feddb21eb56..dcf64c31572 100644 --- a/src/rgw/rgw_op.h +++ b/src/rgw/rgw_op.h @@ -12,6 +12,7 @@ #pragma once +#include <cstdint> #include <limits.h> #include <array> @@ -1238,6 +1239,7 @@ protected: std::string multipart_upload_id; std::string multipart_part_str; int multipart_part_num = 0; + rgw::cksum::Type multipart_cksum_type{rgw::cksum::Type::none}; jspan_ptr multipart_trace; boost::optional<ceph::real_time> delete_at; @@ -1645,6 +1647,50 @@ public: uint32_t op_mask() override { return RGW_OP_TYPE_WRITE; } }; +class RGWGetObjAttrs : public RGWGetObj { +protected: + std::string version_id; + std::string expected_bucket_owner; + std::optional<int> marker; + std::optional<int> max_parts; + uint16_t requested_attributes{0}; +#if 0 + /* used to decrypt attributes for objects stored with SSE-C */ + x-amz-server-side-encryption-customer-algorithm + x-amz-server-side-encryption-customer-key + x-amz-server-side-encryption-customer-key-MD5 +#endif +public: + + enum class ReqAttributes : uint16_t { + None = 0, + Etag, + Checksum, + ObjectParts, + StorageClass, + ObjectSize + }; + + static uint16_t as_flag(ReqAttributes attr) { + return 1 << (uint16_t(attr) ? uint16_t(attr) - 1 : 0); + } + + static uint16_t recognize_attrs(const std::string& hdr, uint16_t deflt = 0); + + RGWGetObjAttrs() : RGWGetObj() + { + RGWGetObj::get_data = false; // it's extra false + } + + int verify_permission(optional_yield y) override; + void pre_exec() override; + void execute(optional_yield y) override; + void send_response() override = 0; + const char* name() const override { return "get_obj_attrs"; } + RGWOpType get_type() override { return RGW_OP_GET_OBJ_ATTRS; } + uint32_t op_mask() override { return RGW_OP_TYPE_READ; } +}; /* RGWGetObjAttrs */ + class RGWGetLC : public RGWOp { protected: diff --git a/src/rgw/rgw_op_type.h b/src/rgw/rgw_op_type.h index 49faea6403d..2c8225d289e 100644 --- a/src/rgw/rgw_op_type.h +++ b/src/rgw/rgw_op_type.h @@ -30,6 +30,7 @@ enum RGWOpType { RGW_OP_COPY_OBJ, RGW_OP_GET_ACLS, RGW_OP_PUT_ACLS, + RGW_OP_GET_OBJ_ATTRS, RGW_OP_GET_CORS, RGW_OP_PUT_CORS, RGW_OP_DELETE_CORS, diff --git a/src/rgw/rgw_rest.h b/src/rgw/rgw_rest.h index aa33080af56..9111696453e 100644 --- a/src/rgw/rgw_rest.h +++ b/src/rgw/rgw_rest.h @@ -403,6 +403,17 @@ public: virtual std::string canonical_name() const override { return fmt::format("REST.{}.ACL", s->info.method); } }; +class RGWGetObjAttrs_ObjStore : public RGWGetObjAttrs { +public: + RGWGetObjAttrs_ObjStore() {} + ~RGWGetObjAttrs_ObjStore() override {} + + int get_params(optional_yield y) = 0; + /* not actually used */ + int send_response_data_error(optional_yield y) override { return 0; }; + int send_response_data(bufferlist& bl, off_t ofs, off_t len) override { return 0; }; +}; + class RGWGetLC_ObjStore : public RGWGetLC { public: RGWGetLC_ObjStore() {} diff --git a/src/rgw/rgw_rest_s3.cc b/src/rgw/rgw_rest_s3.cc index 13f6bb0015a..9edb79d8fd0 100644 --- a/src/rgw/rgw_rest_s3.cc +++ b/src/rgw/rgw_rest_s3.cc @@ -9,6 +9,7 @@ #include <string_view> #include "common/ceph_crypto.h" +#include "common/dout.h" #include "common/split.h" #include "common/Formatter.h" #include "common/utf8.h" @@ -807,7 +808,6 @@ void RGWGetObjTags_ObjStore_S3::send_response_data(bufferlist& bl) } } - int RGWPutObjTags_ObjStore_S3::get_params(optional_yield y) { RGWXMLParser parser; @@ -3819,6 +3819,196 @@ void RGWPutACLs_ObjStore_S3::send_response() dump_start(s); } +int RGWGetObjAttrs_ObjStore_S3::get_params(optional_yield y) +{ + string err; + auto& env = s->info.env; + version_id = s->info.args.get("versionId"); + + auto hdr = env->get_optional("HTTP_X_AMZ_EXPECTED_BUCKET_OWNER"); + if (hdr) { + expected_bucket_owner = *hdr; + } + + hdr = env->get_optional("HTTP_X_AMZ_MAX_PARTS"); + if (hdr) { + max_parts = strict_strtol(hdr->c_str(), 10, &err); + if (!err.empty()) { + s->err.message = "Invalid value for MaxParts: " + err; + ldpp_dout(s, 10) << "Invalid value for MaxParts " << *hdr << ": " + << err << dendl; + return -ERR_INVALID_PART; + } + max_parts = std::min(*max_parts, 1000); + } + + hdr = env->get_optional("HTTP_X_AMZ_PART_NUMBER_MARKER"); + if (hdr) { + marker = strict_strtol(hdr->c_str(), 10, &err); + if (!err.empty()) { + s->err.message = "Invalid value for PartNumberMarker: " + err; + ldpp_dout(s, 10) << "Invalid value for PartNumberMarker " << *hdr << ": " + << err << dendl; + return -ERR_INVALID_PART; + } + } + + hdr = env->get_optional("HTTP_X_AMZ_OBJECT_ATTRIBUTES"); + if (hdr) { + requested_attributes = recognize_attrs(*hdr); + } + + /* XXX skipping SSE-C params for now */ + + return 0; +} /* RGWGetObjAttrs_ObjStore_S3::get_params(...) */ + +int RGWGetObjAttrs_ObjStore_S3::get_decrypt_filter( + std::unique_ptr<RGWGetObj_Filter> *filter, + RGWGetObj_Filter* cb, bufferlist* manifest_bl) +{ + // we aren't actually decrypting the data, but for objects encrypted with + // SSE-C we do need to verify that required headers are present and valid + // + // in the SSE-KMS and SSE-S3 cases, this unfortunately causes us to fetch + // decryption keys which we don't need :( + std::unique_ptr<BlockCrypt> block_crypt; // ignored + std::map<std::string, std::string> crypt_http_responses; // ignored + return rgw_s3_prepare_decrypt(s, s->yield, attrs, &block_crypt, + crypt_http_responses); +} + +void RGWGetObjAttrs_ObjStore_S3::send_response() +{ + if (op_ret) + set_req_state_err(s, op_ret); + dump_errno(s); + + if (op_ret == 0) { + version_id = s->object->get_instance(); + + // x-amz-delete-marker: DeleteMarker // not sure we can plausibly do this? + dump_last_modified(s, lastmod); + dump_header_if_nonempty(s, "x-amz-version-id", version_id); + // x-amz-request-charged: RequestCharged + } + + end_header(s, this, to_mime_type(s->format)); + dump_start(s); + + if (op_ret == 0) { + s->formatter->open_object_section("GetObjectAttributes"); + if (requested_attributes & as_flag(ReqAttributes::Etag)) { + if (lo_etag.empty()) { + auto iter = attrs.find(RGW_ATTR_ETAG); + if (iter != attrs.end()) { + lo_etag = iter->second.to_str(); + } + } + s->formatter->dump_string("ETag", lo_etag); + } + + if (requested_attributes & as_flag(ReqAttributes::Checksum)) { + s->formatter->open_object_section("Checksum"); + auto iter = attrs.find(RGW_ATTR_CKSUM); + if (iter != attrs.end()) { + try { + rgw::cksum::Cksum cksum; + auto bliter = iter->second.cbegin(); + cksum.decode(bliter); + if (multipart_parts_count && multipart_parts_count > 0) { + s->formatter->dump_string(cksum.element_name(), + fmt::format("{}-{}", cksum.to_armor(), *multipart_parts_count)); + } else { + s->formatter->dump_string(cksum.element_name(), cksum.to_armor()); + } + } catch (buffer::error& err) { + ldpp_dout(this, 0) + << "ERROR: could not decode stored cksum, caught buffer::error" << dendl; + } + } + s->formatter->close_section(); /* Checksum */ + } /* Checksum */ + + if (requested_attributes & as_flag(ReqAttributes::ObjectParts)) { + if (multipart_parts_count && multipart_parts_count > 0) { + + /* XXX the following was needed to see a manifest at list_parts()! */ + op_ret = s->object->load_obj_state(s, s->yield); + if (op_ret < 0) { + ldpp_dout_fmt(this, 0, + "ERROR: {} load_obj_state() failed ret={}", __func__, + op_ret); + } + + ldpp_dout_fmt(this, 16, + "{} attr flags={} parts_count={}", + __func__, requested_attributes, *multipart_parts_count); + + s->formatter->open_object_section("ObjectParts"); + + bool truncated = false; + int next_marker; + + using namespace rgw::sal; + + int ret = + s->object->list_parts( + this, s->cct, + max_parts ? *max_parts : 1000, + marker ? *marker : 0, + &next_marker, &truncated, + [&](const Object::Part& part) -> int { + s->formatter->open_object_section("Part"); + s->formatter->dump_int("PartNumber", part.part_number); + s->formatter->dump_unsigned("Size", part.part_size); + if (part.cksum.type != rgw::cksum::Type::none) { + s->formatter->dump_string(part.cksum.element_name(), part.cksum.to_armor()); + } + s->formatter->close_section(); /* Part */ + return 0; + }, s->yield); + + if (ret < 0) { + ldpp_dout_fmt(this, 0, + "ERROR: {} list-parts failed for {}", + __func__, s->object->get_name()); + } + /* AWS docs disagree on the name of this element */ + s->formatter->dump_int("PartsCount", *multipart_parts_count); + s->formatter->dump_int("TotalPartsCount", *multipart_parts_count); + s->formatter->dump_bool("IsTruncated", truncated); + if (max_parts) { + s->formatter->dump_int("MaxParts", *max_parts); + } + if(truncated) { + s->formatter->dump_int("NextPartNumberMarker", next_marker); + } + if (marker) { + s->formatter->dump_int("PartNumberMarker", *marker); + } + s->formatter->close_section(); + } /* multipart_parts_count positive */ + } /* ObjectParts */ + + if (requested_attributes & as_flag(ReqAttributes::ObjectSize)) { + s->formatter->dump_int("ObjectSize", s->obj_size); + } + + if (requested_attributes & as_flag(ReqAttributes::StorageClass)) { + auto iter = attrs.find(RGW_ATTR_STORAGE_CLASS); + if (iter != attrs.end()) { + s->formatter->dump_string("StorageClass", iter->second.to_str()); + } else { + s->formatter->dump_string("StorageClass", "STANDARD"); + } + } + s->formatter->close_section(); + } /* op_ret == 0 */ + + rgw_flush_formatter_and_reset(s, s->formatter); +} /* RGWGetObjAttrs_ObjStore_S3::send_response */ + void RGWGetLC_ObjStore_S3::execute(optional_yield y) { config.set_ctx(s->cct); @@ -4798,6 +4988,7 @@ RGWOp *RGWHandler_REST_Bucket_S3::get_obj_op(bool get_data) const RGWOp *RGWHandler_REST_Bucket_S3::op_get() { + /* XXX maybe we could replace this with an indexing operation */ if (s->info.args.sub_resource_exists("encryption")) return nullptr; @@ -4994,6 +5185,8 @@ RGWOp *RGWHandler_REST_Obj_S3::op_get() return new RGWGetObjLayout_ObjStore_S3; } else if (is_tagging_op()) { return new RGWGetObjTags_ObjStore_S3; + } else if (is_attributes_op()) { + return new RGWGetObjAttrs_ObjStore_S3; } else if (is_obj_retention_op()) { return new RGWGetObjRetention_ObjStore_S3; } else if (is_obj_legal_hold_op()) { diff --git a/src/rgw/rgw_rest_s3.h b/src/rgw/rgw_rest_s3.h index 50160d79a42..e8fdc69751c 100644 --- a/src/rgw/rgw_rest_s3.h +++ b/src/rgw/rgw_rest_s3.h @@ -374,6 +374,18 @@ public: int get_params(optional_yield y) override; }; +class RGWGetObjAttrs_ObjStore_S3 : public RGWGetObjAttrs_ObjStore { +public: + RGWGetObjAttrs_ObjStore_S3() {} + ~RGWGetObjAttrs_ObjStore_S3() override {} + + int get_params(optional_yield y) override; + int get_decrypt_filter(std::unique_ptr<RGWGetObj_Filter>* filter, + RGWGetObj_Filter* cb, + bufferlist* manifest_bl) override; + void send_response() override; +}; + class RGWGetLC_ObjStore_S3 : public RGWGetLC_ObjStore { protected: RGWLifecycleConfiguration_S3 config; @@ -701,6 +713,9 @@ protected: bool is_acl_op() const { return s->info.args.exists("acl"); } + bool is_attributes_op() const { + return s->info.args.exists("attributes"); + } bool is_cors_op() const { return s->info.args.exists("cors"); } @@ -759,6 +774,9 @@ protected: bool is_acl_op() const { return s->info.args.exists("acl"); } + bool is_attributes_op() const { + return s->info.args.exists("attributes"); + } bool is_tagging_op() const { return s->info.args.exists("tagging"); } diff --git a/src/rgw/rgw_sal.h b/src/rgw/rgw_sal.h index d0bdcea25e6..4b94f74b851 100644 --- a/src/rgw/rgw_sal.h +++ b/src/rgw/rgw_sal.h @@ -15,6 +15,7 @@ #pragma once +#include <cstdint> #include <optional> #include <boost/intrusive_ptr.hpp> #include <boost/smart_ptr/intrusive_ref_counter.hpp> @@ -26,6 +27,7 @@ #include "rgw_notify_event_type.h" #include "rgw_req_context.h" #include "include/random.h" +#include "include/function2.hpp" // FIXME: following subclass dependencies #include "driver/rados/rgw_user.h" @@ -1169,6 +1171,9 @@ class Object { std::string* version_id, std::string* tag, std::string* etag, void (*progress_cb)(off_t, void *), void* progress_data, const DoutPrefixProvider* dpp, optional_yield y) = 0; + + /** return logging subsystem */ + virtual unsigned get_subsys() { return ceph_subsys_rgw; }; /** Get the ACL for this object */ virtual RGWAccessControlPolicy& get_acl(void) = 0; /** Set the ACL for this object */ @@ -1249,6 +1254,28 @@ class Object { /** Dump driver-specific object layout info in JSON */ virtual int dump_obj_layout(const DoutPrefixProvider *dpp, optional_yield y, Formatter* f) = 0; + /* A transfer data type describing metadata specific to one part of a + * completed multipart upload object, following the GetObjectAttributes + * response syntax for Object::Parts here: + * https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html */ + class Part + { + public: + int part_number; + uint32_t part_size; + rgw::cksum::Cksum cksum; + }; /* Part */ + + /* callback function/object used by list_parts */ + using list_parts_each_t = + const fu2::unique_function<int(const Part&) const>; + + /** If multipart, enumerate (a range [marker..marker+[min(max_parts, parts_count-1)] of) parts of the object */ + virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) = 0; + /** Get the cached attributes for this object */ virtual Attrs& get_attrs(void) = 0; /** Get the (const) cached attributes for this object */ @@ -1447,7 +1474,7 @@ public: virtual int init(const DoutPrefixProvider* dpp, optional_yield y, ACLOwner& owner, rgw_placement_rule& dest_placement, rgw::sal::Attrs& attrs) = 0; /** List all the parts of this upload, filling the parts cache */ virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, - int num_parts, int marker, + int max_parts, int marker, int* next_marker, bool* truncated, optional_yield y, bool assume_unsorted = false) = 0; /** Abort this upload */ diff --git a/src/rgw/rgw_sal_dbstore.cc b/src/rgw/rgw_sal_dbstore.cc index 2c569b07b8c..02fd7a49cda 100644 --- a/src/rgw/rgw_sal_dbstore.cc +++ b/src/rgw/rgw_sal_dbstore.cc @@ -488,6 +488,14 @@ namespace rgw::sal { return std::make_unique<DBLuaManager>(this); } + int DBObject::list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) + { + return -EOPNOTSUPP; + } + int DBObject::load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh) { RGWObjState* astate; diff --git a/src/rgw/rgw_sal_dbstore.h b/src/rgw/rgw_sal_dbstore.h index 0207d233dc7..4df10d1dce1 100644 --- a/src/rgw/rgw_sal_dbstore.h +++ b/src/rgw/rgw_sal_dbstore.h @@ -528,6 +528,7 @@ protected: DBObject(DBObject& _o) = default; + virtual unsigned get_subsys() { return ceph_subsys_rgw_dbstore; }; virtual int delete_object(const DoutPrefixProvider* dpp, optional_yield y, uint32_t flags, @@ -553,6 +554,13 @@ protected: virtual int set_acl(const RGWAccessControlPolicy& acl) override { acls = acl; return 0; } virtual int set_obj_attrs(const DoutPrefixProvider* dpp, Attrs* setattrs, Attrs* delattrs, optional_yield y, uint32_t flags) override; + + /** If multipart, enumerate (a range [marker..marker+[min(max_parts, parts_count-1)] of) parts of the object */ + virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) override; + virtual int load_obj_state(const DoutPrefixProvider* dpp, optional_yield y, bool follow_olh = true) override; virtual int get_obj_attrs(optional_yield y, const DoutPrefixProvider* dpp, rgw_obj* target_obj = NULL) override; virtual int modify_obj_attrs(const char* attr_name, bufferlist& attr_val, optional_yield y, const DoutPrefixProvider* dpp) override; diff --git a/src/rgw/rgw_sal_filter.cc b/src/rgw/rgw_sal_filter.cc index 733bfa39ee2..15da580988e 100644 --- a/src/rgw/rgw_sal_filter.cc +++ b/src/rgw/rgw_sal_filter.cc @@ -1046,6 +1046,17 @@ RGWAccessControlPolicy& FilterObject::get_acl() return next->get_acl(); } +int FilterObject::list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) +{ + return next->list_parts(dpp, cct, max_parts, marker, next_marker, + truncated, + sal::Object::list_parts_each_t(each_func), + y); +} + int FilterObject::load_obj_state(const DoutPrefixProvider *dpp, optional_yield y, bool follow_olh) { return next->load_obj_state(dpp, y, follow_olh); diff --git a/src/rgw/rgw_sal_filter.h b/src/rgw/rgw_sal_filter.h index e36e841df3f..947ce9d4bf5 100644 --- a/src/rgw/rgw_sal_filter.h +++ b/src/rgw/rgw_sal_filter.h @@ -778,6 +778,12 @@ public: virtual bool empty() const override { return next->empty(); } virtual const std::string &get_name() const override { return next->get_name(); } + /** If multipart, enumerate (a range [marker..marker+[min(max_parts, parts_count-1)] of) parts of the object */ + virtual int list_parts(const DoutPrefixProvider* dpp, CephContext* cct, + int max_parts, int marker, int* next_marker, + bool* truncated, list_parts_each_t each_func, + optional_yield y) override; + virtual int load_obj_state(const DoutPrefixProvider *dpp, optional_yield y, bool follow_olh = true) override; virtual int set_obj_attrs(const DoutPrefixProvider* dpp, Attrs* setattrs, diff --git a/src/test/ObjectMap/KeyValueDBMemory.cc b/src/test/ObjectMap/KeyValueDBMemory.cc index 234e963397e..cfe25930d6a 100644 --- a/src/test/ObjectMap/KeyValueDBMemory.cc +++ b/src/test/ObjectMap/KeyValueDBMemory.cc @@ -132,12 +132,26 @@ public: return ""; } + string_view key_as_sv() override { + if (valid()) + return (*it).first.second; + else + return ""; + } + pair<string,string> raw_key() override { if (valid()) return (*it).first; else return make_pair("", ""); } + + pair<string_view,string_view> raw_key_as_sv() override { + if (valid()) + return (*it).first; + else + return make_pair("", ""); + } bool raw_key_is_prefixed(const string &prefix) override { return prefix == (*it).first.first; @@ -150,6 +164,13 @@ public: return bufferlist(); } + std::string_view value_as_sv() override { + if (valid()) + return std::string_view{it->second.c_str(), it->second.length()}; + else + return std::string_view(); + } + int status() override { return 0; } diff --git a/src/test/librbd/migration/test_mock_HttpClient.cc b/src/test/librbd/migration/test_mock_HttpClient.cc index f3888755c79..901c4231dd0 100644 --- a/src/test/librbd/migration/test_mock_HttpClient.cc +++ b/src/test/librbd/migration/test_mock_HttpClient.cc @@ -307,7 +307,7 @@ TEST_F(TestMockMigrationHttpClient, OpenCloseHttps) { boost::asio::ssl::context ssl_context{boost::asio::ssl::context::tlsv12}; load_server_certificate(ssl_context); - boost::beast::ssl_stream<boost::beast::tcp_stream> ssl_stream{ + boost::asio::ssl::stream<boost::asio::ip::tcp::socket> ssl_stream{ std::move(socket), ssl_context}; C_SaferCond on_ssl_handshake_ctx; @@ -341,7 +341,7 @@ TEST_F(TestMockMigrationHttpClient, OpenHttpsHandshakeFail) { boost::asio::ssl::context ssl_context{boost::asio::ssl::context::tlsv12}; load_server_certificate(ssl_context); - boost::beast::ssl_stream<boost::beast::tcp_stream> ssl_stream{ + boost::asio::ssl::stream<boost::asio::ip::tcp::socket> ssl_stream{ std::move(socket), ssl_context}; C_SaferCond on_ssl_handshake_ctx; diff --git a/src/test/objectstore/ObjectStoreImitator.h b/src/test/objectstore/ObjectStoreImitator.h index d71d7f2fe58..875f9041b83 100644 --- a/src/test/objectstore/ObjectStoreImitator.h +++ b/src/test/objectstore/ObjectStoreImitator.h @@ -347,6 +347,16 @@ public: ) override { return {}; } + + int omap_iterate(CollectionHandle &c, ///< [in] collection + const ghobject_t &oid, ///< [in] object + /// [in] where the iterator should point to at the beginning + omap_iter_seek_t start_from, + std::function<omap_iter_ret_t(std::string_view, std::string_view)> f + ) override { + return 0; + } + void set_fsid(uuid_d u) override {} uuid_d get_fsid() override { return {}; } uint64_t estimate_objects_overhead(uint64_t num_objects) override { diff --git a/src/test/objectstore/allocsim/ops_replayer.cc b/src/test/objectstore/allocsim/ops_replayer.cc index fd947f5c454..c5908d9f576 100644 --- a/src/test/objectstore/allocsim/ops_replayer.cc +++ b/src/test/objectstore/allocsim/ops_replayer.cc @@ -1,4 +1,5 @@ #include <algorithm> +#include <functional> #include <boost/program_options/value_semantic.hpp> #include <cassert> #include <cctype> @@ -13,26 +14,46 @@ #include <fstream> #include <filesystem> #include <mutex> -#include "include/rados/buffer_fwd.h" -#include "include/rados/librados.hpp" #include <atomic> -#include <fmt/format.h> #include <map> #include <memory> #include <random> #include <string> #include <iostream> #include <vector> +#include <format> + +#include <fmt/format.h> #include <boost/program_options/variables_map.hpp> #include <boost/program_options/parsers.hpp> +#include "include/rados/buffer_fwd.h" +#include "include/rados/librados.hpp" + namespace po = boost::program_options; using namespace std; using namespace ceph; +namespace settings { + +// Returns a function which restricts a value to a specified range by throwing if it is not in range: +// (Note: std::clamp() does not throw.) +auto clamp_or_throw(auto min, auto max) +{ + return [=](auto& x) { + if(std::less<>{}(x, min) or std::greater<>{}(x, max)) { + throw std::out_of_range(fmt::format("value expected between {} and {}, but got {}", min, max, x)); + } + + return x; + }; +} + +} // namespace settings + // compare shared_ptr<string> struct StringPtrCompare { @@ -338,8 +359,8 @@ int main(int argc, char** argv) { // options uint64_t io_depth = 8; - uint64_t nparser_threads = 16; - uint64_t nworker_threads = 16; + int nparser_threads = 16; + int nworker_threads = 16; string file("input.txt"); string ceph_conf_path("./ceph.conf"); string pool("test_pool"); @@ -351,8 +372,8 @@ int main(int argc, char** argv) { ("input-files,i", po::value<vector<string>>()->multitoken(), "List of input files (output of op_scraper.py). Multiple files will be merged and sorted by time order") ("ceph-conf", po::value<string>(&ceph_conf_path)->default_value("ceph.conf"), "Path to ceph conf") ("io-depth", po::value<uint64_t>(&io_depth)->default_value(64), "I/O depth") - ("parser-threads", po::value<uint64_t>(&nparser_threads)->default_value(16), "Number of parser threads") - ("worker-threads", po::value<uint64_t>(&nworker_threads)->default_value(16), "Number of I/O worker threads") + ("parser-threads", po::value<int>(&nparser_threads)->default_value(16)->notifier(settings::clamp_or_throw(1, 256)), "Number of parser threads") + ("worker-threads", po::value<int>(&nworker_threads)->default_value(16)->notifier(settings::clamp_or_throw(1, 256)), "Number of I/O worker threads") ("pool", po::value<string>(&pool)->default_value("test_pool"), "Pool to use for I/O") ("skip-do-ops", po::bool_switch(&skip_do_ops)->default_value(false), "Skip doing operations") ; diff --git a/src/test/pybind/pytest.ini b/src/test/pybind/pytest.ini index dccf2a346dc..97569e88299 100644 --- a/src/test/pybind/pytest.ini +++ b/src/test/pybind/pytest.ini @@ -7,3 +7,4 @@ markers = stats tier watch + wait diff --git a/src/test/pybind/test_rados.py b/src/test/pybind/test_rados.py index cb2a4f96101..25423bd8dcb 100644 --- a/src/test/pybind/test_rados.py +++ b/src/test/pybind/test_rados.py @@ -207,7 +207,7 @@ class TestRados(object): def test_get_fsid(self): fsid = self.rados.get_fsid() - assert re.match('[0-9a-f\-]{36}', fsid, re.I) + assert re.match(r'[0-9a-f\-]{36}', fsid, re.I) def test_blocklist_add(self): self.rados.blocklist_add("1.2.3.4/123", 1) diff --git a/src/test/rgw/test_rgw_iam_policy.cc b/src/test/rgw/test_rgw_iam_policy.cc index 7dadb7812ff..1d13c2aa013 100644 --- a/src/test/rgw/test_rgw_iam_policy.cc +++ b/src/test/rgw/test_rgw_iam_policy.cc @@ -75,6 +75,8 @@ using rgw::IAM::s3GetObjectTagging; using rgw::IAM::s3GetObjectVersion; using rgw::IAM::s3GetObjectVersionTagging; using rgw::IAM::s3GetObjectVersionTorrent; +using rgw::IAM::s3GetObjectAttributes; +using rgw::IAM::s3GetObjectVersionAttributes; using rgw::IAM::s3GetPublicAccessBlock; using rgw::IAM::s3GetReplicationConfiguration; using rgw::IAM::s3ListAllMyBuckets; @@ -419,6 +421,8 @@ TEST_F(PolicyTest, Parse3) { act2[s3GetObjectVersionAcl] = 1; act2[s3GetObjectTorrent] = 1; act2[s3GetObjectVersionTorrent] = 1; + act2[s3GetObjectAttributes] = 1; + act2[s3GetObjectVersionAttributes] = 1; act2[s3GetAccelerateConfiguration] = 1; act2[s3GetBucketAcl] = 1; act2[s3GetBucketOwnershipControls] = 1; @@ -487,6 +491,8 @@ TEST_F(PolicyTest, Eval3) { s3allow[s3GetObjectVersion] = 1; s3allow[s3GetObjectAcl] = 1; s3allow[s3GetObjectVersionAcl] = 1; + s3allow[s3GetObjectAttributes] = 1; + s3allow[s3GetObjectVersionAttributes] = 1; s3allow[s3GetObjectTorrent] = 1; s3allow[s3GetObjectVersionTorrent] = 1; s3allow[s3GetAccelerateConfiguration] = 1; @@ -883,6 +889,8 @@ TEST_F(ManagedPolicyTest, AmazonS3ReadOnlyAccess) act[s3GetObjectVersionAcl] = 1; act[s3GetObjectTorrent] = 1; act[s3GetObjectVersionTorrent] = 1; + act[s3GetObjectAttributes] = 1; + act[s3GetObjectVersionAttributes] = 1; act[s3GetAccelerateConfiguration] = 1; act[s3GetBucketAcl] = 1; act[s3GetBucketOwnershipControls] = 1; |