The OSD's IOPS capacity is used by the mClock scheduler to determine the
quantum of bandwidth allocation for the various operations on the OSD.
Prior to this commit, maybe_override_max_osd_capacity_for_qos() only
checked whether the measured IOPS capacity exceeded the higher threshold
defined by 'osd_mclock_iops_capacity_threshold_[hdd|ssd]' and, if so, fell
back to the last valid or the default IOPS capacity as defined by
osd_mclock_max_capacity_iops_[hdd|ssd].
It's quite possible that the reported IOPS is unrealistically low. This
could be due to transient factors on the underlying device, or it could
indicate bad health of the device. Either way, the safer option is to fall
back to the last valid or the default IOPS setting for that OSD in order
to avoid cluster performance issues (slow or stalled ops) down the line.
Therefore, to handle this case, the commit introduces additional config
options viz.,
- osd_mclock_iops_capacity_low_threshold_hdd - set to 50 IOPS and
- osd_mclock_iops_capacity_low_threshold_ssd - set to 1000 IOPS
If the measured IOPS capacity doesn't fall within the low and high
threshold range, the default or the last valid IOPS capacity is used.
The existing cluster log warning is suitably modified to convey the
reason.
Additionally, for a couple of valgrind related teuthology tests, the
cluster warning is added to the ignorelist since the reported IOPS can
be very low due to slowness.
Fixes: https://tracker.ceph.com/issues/67421
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
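For illustration, a minimal sketch of the range check described above,
using hypothetical function and parameter names (this is not the actual
maybe_override_max_osd_capacity_for_qos() code):

    // Hypothetical sketch: use the measured OSD bench result only if it lies
    // within [low, high]; otherwise fall back to the last valid/default IOPS.
    #include <iostream>

    double pick_iops_capacity(double measured_iops,
                              double low_threshold,   // e.g. 50 (hdd) / 1000 (ssd)
                              double high_threshold,  // e.g. osd_mclock_iops_capacity_threshold_[hdd|ssd]
                              double fallback_iops)   // last valid or default capacity
    {
      if (measured_iops < low_threshold || measured_iops > high_threshold) {
        // The caller would raise the (modified) cluster log warning here.
        std::cerr << "OSD bench result " << measured_iops << " IOPS is outside ["
                  << low_threshold << ", " << high_threshold << "]; using "
                  << fallback_iops << " IOPS instead\n";
        return fallback_iops;
      }
      return measured_iops;
    }

    int main() {
      // 20 IOPS on an HDD falls below an assumed low threshold of 50, so the
      // fallback (illustrative value) is used instead.
      std::cout << pick_iops_capacity(20.0, 50.0, 500.0, 315.0) << "\n";
    }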
Based on tests performed at scale on an HDD-based cluster, it was found
that scheduling with mClock was not optimal with multiple OSD shards. For
example, in the scaled cluster with multiple OSD node failures, the client
throughput was found to be inconsistent across test runs, coupled with
multiple reported slow requests.
However, the same test with a single OSD shard and with multiple worker
threads yielded significantly better results in terms of consistency of
client and recovery throughput across multiple test runs.
For more details see https://tracker.ceph.com/issues/66289.
Therefore, as an interim measure until the issue with multiple OSD shards
(or multiple mClock queues per OSD) is investigated and fixed, the
following change to the default HDD OSD shard configuration is made:
- osd_op_num_shards_hdd = 1 (was 5)
- osd_op_num_threads_per_shard_hdd = 5 (was 1)
The other changes in this commit include:
- Doc change to the OSD and mClock config reference describing
this change.
- OSD troubleshooting entry on the procedure to change the shard
configuration for clusters affected by this issue running on older
releases.
- Add release note for this change.
Fixes: https://tracker.ceph.com/issues/66289
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
# Conflicts:
# doc/rados/troubleshooting/troubleshooting-osd.rst
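As a quick sanity check on the new defaults, the total number of HDD worker
threads stays the same; only the sharding (and hence the number of mClock
queues) changes. A toy calculation, not Ceph code:

    #include <iostream>

    int main() {
      // Old HDD defaults: 5 shards x 1 thread per shard.
      int old_workers = 5 * 1;   // 5 worker threads, 5 mClock queues
      // New HDD defaults: 1 shard x 5 threads per shard.
      int new_workers = 1 * 5;   // 5 worker threads, 1 mClock queue
      std::cout << old_workers << " -> " << new_workers << " worker threads\n";
    }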
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interface to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and this
is logged (see the sketch below).
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
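A rough sketch of the clean-up described above, with an illustrative
stand-in for the config subsystem (the helper and key list are assumptions,
not the actual OSD code):

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // Illustrative stand-in for the config subsystem; only the behaviour
    // needed for this sketch is modelled.
    struct FakeConf {
      std::set<std::string> ephemeral;          // keys set via 'daemon'/'tell'
      void rm_val(const std::string& key) {     // drop an ephemeral override
        if (ephemeral.erase(key)) {
          std::cout << "removed ephemeral override for " << key << "\n";
        }
      }
    };

    // When a built-in mClock profile is (re)applied, remove ephemeral
    // overrides of the profile-managed QoS keys so the built-in values
    // take effect again.
    void drop_ephemeral_qos_overrides(FakeConf& conf,
                                      const std::vector<std::string>& qos_keys) {
      for (const auto& key : qos_keys) {
        conf.rm_val(key);
      }
    }

    int main() {
      FakeConf conf;
      conf.ephemeral = {"osd_mclock_scheduler_client_wgt"};
      drop_ephemeral_qos_overrides(conf, {"osd_mclock_scheduler_client_res",
                                          "osd_mclock_scheduler_client_wgt"});
    }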
Modify the relevant documentation to reflect:
- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Update docs after filestore removal.
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
Document the following:
- New max backfill/recovery defaults for mClock.
- Steps to modify the backfill/recovery defaults.
- Modifying defaults using the new osd_mclock_override_recovery_settings
option (see the sketch below).
- Steps to mitigate unrealistic OSD bench results when setting the OSD capacity.
- New capacity threshold options for ssd/hdd.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
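A minimal sketch of how the new option gates such changes, based on the
behaviour described above; apart from the option name, the function and
values are illustrative:

    #include <iostream>

    // Sketch: with the built-in mClock profiles, user changes to the
    // recovery/backfill limits are honoured only when
    // osd_mclock_override_recovery_settings is enabled; otherwise the
    // profile-managed default is kept.
    unsigned effective_max_backfills(bool override_recovery_settings,
                                     unsigned user_requested,
                                     unsigned profile_default) {
      return override_recovery_settings ? user_requested : profile_default;
    }

    int main() {
      std::cout << effective_max_backfills(false, 10, 1) << "\n";  // 1: change ignored
      std::cout << effective_max_backfills(true, 10, 1) << "\n";   // 10: override honoured
    }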
Create the initial mClock QoS params at the CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that switching to the ‘custom’ profile and then subsequently changing
the QoS params using “config set osd.n …” will be at a higher level, i.e.,
at CONF_MON.
But when switching back to a built-in profile, the new values won’t take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the ‘custom’ profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation to fix a couple of typos relating to the best-effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch between the built-in and custom profile and vice-versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
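A small sketch of the config-level precedence this relies on, assuming only
that CONF_DEFAULT has lower priority than CONF_MON; the types and helper are
illustrative, not the Ceph config implementation:

    #include <iostream>
    #include <map>
    #include <optional>
    #include <string>

    // Higher enum value = higher priority when resolving the effective value.
    enum ConfLevel { CONF_DEFAULT = 0, CONF_MON = 1 };

    // The entry with the highest level wins, which is why a mon-level
    // 'custom' override masks values later set at CONF_DEFAULT by a built-in
    // profile until the mon-level key is removed from the ConfigMonitor store.
    std::optional<std::string>
    effective_value(const std::map<ConfLevel, std::string>& values_by_level) {
      if (values_by_level.empty()) return std::nullopt;
      return values_by_level.rbegin()->second;  // std::map iterates in key order
    }

    int main() {
      std::map<ConfLevel, std::string> v = {{CONF_DEFAULT, "profile value"},
                                            {CONF_MON, "custom value"}};
      std::cout << *effective_value(v) << "\n";  // prints "custom value"
    }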
Improve the documentation around:
- mclock client types.
- Describe mclock config profiles in greater detail.
- Add notes about manually benchmarking OSDs and tuning bluestore throttle
parameters.
- Include a couple of missing mclock configuration options.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Update the steps in the mclock config reference document to manually
override an OSD's max IOPS capacity. Provide information on alternative
ways to override the osd_mclock_max_capacity_iops_[hdd,ssd] options for
an OSD.
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Make the osd_*_sleep options modifiable at runtime and add them
to the set of tracked conf keys. This is to ensure that the sleep
options can be disabled/overridden during OSD bring-up if the mclock
scheduler is employed.
Introduce OSD::maybe_override_options_for_qos():
This method does the following if the mclock scheduler is enabled:
- overrides "recovery_max_active" to a high limit of 1000,
- overrides the "osd_max_backfills" option to a high limit of 1000 and
also sets the corresponding Async local and remote reserver objects
to the same value (1000),
- disables the osd_*_sleep options so that appropriate QoS may be
provided with the mclock scheduler (see the sketch below).
The above method is called in the following scenarios:
- After all the op shards are brought up during OSD initialization.
- In OSD::handle_conf_change() to override any settings related to
QoS that the user intended to change.
Modify the mclock config reference to accurately reflect which options
can be changed when using mclock's "custom" profile, and clean up
some whitespace.
Fixes: https://tracker.ceph.com/issues/50501
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
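A simplified sketch of what the method described above does; the struct,
values, and call site are illustrative, not the actual OSD implementation:

    // Simplified sketch of the override behaviour described above.
    struct QosOptions {
      unsigned recovery_max_active = 0;   // 0 means "use the per-device default"
      unsigned osd_max_backfills = 1;
      double osd_recovery_sleep = 0.1;
      double osd_snap_trim_sleep = 0.0;
    };

    void maybe_override_options_for_qos(bool mclock_enabled, QosOptions& opts) {
      if (!mclock_enabled) {
        return;  // nothing to do for other schedulers
      }
      // Raise the recovery/backfill limits (and, in the real OSD, the
      // corresponding local/remote reserver limits) so these caps no longer
      // throttle work; mClock does the pacing instead.
      opts.recovery_max_active = 1000;
      opts.osd_max_backfills = 1000;
      // Disable sleep-based throttling so mClock can provide the intended QoS.
      opts.osd_recovery_sleep = 0.0;
      opts.osd_snap_trim_sleep = 0.0;
    }

    int main() {
      QosOptions opts;
      // Called after the op shards are brought up, and again on conf changes.
      maybe_override_options_for_qos(true, opts);
    }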
Signed-off-by: Kefu Chai <kchai@redhat.com>
and link to them when appropriate
Signed-off-by: Kefu Chai <kchai@redhat.com>
for defining options
Signed-off-by: Kefu Chai <kchai@redhat.com>
This is my second attempt to rewrite the
second half of the mclock docs. The first attempt
is enshrined in https://github.com/ceph/ceph/pull/40571,
in which I got cute with git and got burned.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
This PR rewrites the first 48 percent of the material provided in
PR#40531, improving its elegance and readability.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>