The OSD's IOPS capacity is used by the mClock scheduler to determine the
quantum of bandwidth allocation for the various operations on the OSD.
Prior to this commit, maybe_override_max_osd_capacity_for_qos() only
checked whether the measured IOPS capacity exceeded the higher threshold
defined by 'osd_mclock_iops_capacity_threshold_[hdd|ssd]' and, if so, fell
back to the last valid or the default IOPS capacity as defined by
osd_mclock_max_capacity_iops_[hdd|ssd].
It's quite possible that the reported IOPS is unrealistically low. This
could be due to transient factors on the underlying device, or it could
indicate bad health of the device. Either way, the safer option is to fall
back to the last valid or the default IOPS setting for that OSD in order
to avoid cluster performance issues (slow or stalled ops) down the line.
Therefore, to handle this case, the commit introduces additional config
options viz.,
- osd_mclock_iops_capacity_low_threshold_hdd - set to 50 IOPS and
- osd_mclock_iops_capacity_low_threshold_ssd - set to 1000 IOPS
If the measured IOPS capacity doesn't fall within the low and high
threshold range, the default or the last valid IOPS capacity is used.
The existing cluster log warning is suitably modified to convey the
reason.
Additionally, for a couple of valgrind related teuthology tests, the
cluster warning is added to the ignorelist since the reported IOPS can
be very low due to slowness.
Fixes: https://tracker.ceph.com/issues/67421
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
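For illustration, a minimal sketch of the range check described above,
using hypothetical function and parameter names (this is not the actual
maybe_override_max_osd_capacity_for_qos() code):

    // Hypothetical sketch: use the measured OSD bench result only if it lies
    // within [low, high]; otherwise fall back to the last valid/default IOPS.
    #include <iostream>

    double pick_iops_capacity(double measured_iops,
                              double low_threshold,   // e.g. 50 (hdd) / 1000 (ssd)
                              double high_threshold,  // e.g. osd_mclock_iops_capacity_threshold_[hdd|ssd]
                              double fallback_iops)   // last valid or default capacity
    {
      if (measured_iops < low_threshold || measured_iops > high_threshold) {
        // The caller would raise the (modified) cluster log warning here.
        std::cerr << "OSD bench result " << measured_iops << " IOPS is outside ["
                  << low_threshold << ", " << high_threshold << "]; using "
                  << fallback_iops << " IOPS instead\n";
        return fallback_iops;
      }
      return measured_iops;
    }

    int main() {
      // 20 IOPS on an HDD falls below an assumed low threshold of 50, so the
      // fallback (illustrative value) is used instead.
      std::cout << pick_iops_capacity(20.0, 50.0, 500.0, 315.0) << "\n";
    }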
Based on tests performed at scale on an HDD-based cluster, it was found
that scheduling with mClock was not optimal with multiple OSD shards. For
example, in the scaled cluster with multiple OSD node failures, the client
throughput was found to be inconsistent across test runs, coupled with
multiple reported slow requests.
However, the same test with a single OSD shard and with multiple worker
threads yielded significantly better results in terms of consistency of
client and recovery throughput across multiple test runs.
For more details see https://tracker.ceph.com/issues/66289.
Therefore, as an interim measure until the issue with multiple OSD shards
(or multiple mClock queues per OSD) is investigated and fixed, the
following change to the default HDD OSD shard configuration is made:
- osd_op_num_shards_hdd = 1 (was 5)
- osd_op_num_threads_per_shard_hdd = 5 (was 1)
The other changes in this commit include:
- Doc change to the OSD and mClock config reference describing
this change.
- OSD troubleshooting entry on the procedure to change the shard
configuration for clusters affected by this issue running on older
releases.
- Add release note for this change.
Fixes: https://tracker.ceph.com/issues/66289
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
# Conflicts:
# doc/rados/troubleshooting/troubleshooting-osd.rst
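As a quick sanity check on the new defaults, the total number of HDD worker
threads stays the same; only the sharding (and hence the number of mClock
queues) changes. A toy calculation, not Ceph code:

    #include <iostream>

    int main() {
      // Old HDD defaults: 5 shards x 1 thread per shard.
      int old_workers = 5 * 1;   // 5 worker threads, 5 mClock queues
      // New HDD defaults: 1 shard x 5 threads per shard.
      int new_workers = 1 * 5;   // 5 worker threads, 1 mClock queue
      std::cout << old_workers << " -> " << new_workers << " worker threads\n";
    }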
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interface to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and this
is logged (see the sketch below).
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
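A rough sketch of the clean-up described above, with an illustrative
stand-in for the config subsystem (the helper and key list are assumptions,
not the actual OSD code):

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // Illustrative stand-in for the config subsystem; only the behaviour
    // needed for this sketch is modelled.
    struct FakeConf {
      std::set<std::string> ephemeral;          // keys set via 'daemon'/'tell'
      void rm_val(const std::string& key) {     // drop an ephemeral override
        if (ephemeral.erase(key)) {
          std::cout << "removed ephemeral override for " << key << "\n";
        }
      }
    };

    // When a built-in mClock profile is (re)applied, remove ephemeral
    // overrides of the profile-managed QoS keys so the built-in values
    // take effect again.
    void drop_ephemeral_qos_overrides(FakeConf& conf,
                                      const std::vector<std::string>& qos_keys) {
      for (const auto& key : qos_keys) {
        conf.rm_val(key);
      }
    }

    int main() {
      FakeConf conf;
      conf.ephemeral = {"osd_mclock_scheduler_client_wgt"};
      drop_ephemeral_qos_overrides(conf, {"osd_mclock_scheduler_client_res",
                                          "osd_mclock_scheduler_client_wgt"});
    }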
Modify the relevant documentation to reflect:
- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Update docs after filestore removal.
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
Document the following:
- New max backfill/recovery defaults for mClock.
- Steps to modify the backfill/recovery defaults.
- Modifying defaults using the new osd_mclock_override_recovery_settings
option (see the sketch below).
- Steps to mitigate unrealistic OSD bench results when setting the OSD capacity.
- New capacity threshold options for ssd/hdd.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
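A minimal sketch of how the new option gates such changes, based on the
behaviour described above; apart from the option name, the function and
values are illustrative:

    #include <iostream>

    // Sketch: with the built-in mClock profiles, user changes to the
    // recovery/backfill limits are honoured only when
    // osd_mclock_override_recovery_settings is enabled; otherwise the
    // profile-managed default is kept.
    unsigned effective_max_backfills(bool override_recovery_settings,
                                     unsigned user_requested,
                                     unsigned profile_default) {
      return override_recovery_settings ? user_requested : profile_default;
    }

    int main() {
      std::cout << effective_max_backfills(false, 10, 1) << "\n";  // 1: change ignored
      std::cout << effective_max_backfills(true, 10, 1) << "\n";   // 10: override honoured
    }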
Create the initial mClock QoS params at the CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that switching to the ‘custom’ profile and then subsequently changing
the QoS params using “config set osd.n …” will be at a higher level, i.e.,
at CONF_MON.
But when switching back to a built-in profile, the new values won’t take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the ‘custom’ profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation to fix a couple of typos relating to the best-effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch between the built-in and custom profile and vice-versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
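A small sketch of the config-level precedence this relies on, assuming only
that CONF_DEFAULT has lower priority than CONF_MON; the types and helper are
illustrative, not the Ceph config implementation:

    #include <iostream>
    #include <map>
    #include <optional>
    #include <string>

    // Higher enum value = higher priority when resolving the effective value.
    enum ConfLevel { CONF_DEFAULT = 0, CONF_MON = 1 };

    // The entry with the highest level wins, which is why a mon-level
    // 'custom' override masks values later set at CONF_DEFAULT by a built-in
    // profile until the mon-level key is removed from the ConfigMonitor store.
    std::optional<std::string>
    effective_value(const std::map<ConfLevel, std::string>& values_by_level) {
      if (values_by_level.empty()) return std::nullopt;
      return values_by_level.rbegin()->second;  // std::map iterates in key order
    }

    int main() {
      std::map<ConfLevel, std::string> v = {{CONF_DEFAULT, "profile value"},
                                            {CONF_MON, "custom value"}};
      std::cout << *effective_value(v) << "\n";  // prints "custom value"
    }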
Improve the documentation around:
- mclock client types.
- Describe mclock config profiles in greater detail.
- Add notes about manually benchmarking OSDs and tuning bluestore throttle
parameters.
- Include a couple of missing mclock configuration options.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Update the steps in the mclock config reference document to manually
override an OSD's max IOPS capacity. Provide information on alternative
ways to override the osd_mclock_max_capacity_iops_[hdd,ssd] options for
an OSD.
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Make the osd_*_sleep options modifiable at runtime and add them
to the set of tracked conf keys. This is to ensure that the sleep
options can be disabled/overridden during OSD bring-up if the mclock
scheduler is employed.
Introduce OSD::maybe_override_options_for_qos():
This method does the following if the mclock scheduler is enabled:
- overrides "recovery_max_active" to a high limit of 1000,
- overrides the "osd_max_backfills" option to a high limit of 1000 and
also sets the corresponding Async local and remote reserver objects
to the same value (1000),
- disables the osd_*_sleep options so that appropriate QoS may be
provided with the mclock scheduler (see the sketch below).
The above method is called in the following scenarios:
- After all the op shards are brought up during OSD initialization.
- In OSD::handle_conf_change() to override any settings related to
QoS that the user intended to change.
Modify the mclock config reference to accurately reflect which options
can be changed when using mclock's "custom" profile, and clean up
some whitespace.
Fixes: https://tracker.ceph.com/issues/50501
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
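A simplified sketch of what the method described above does; the struct,
values, and call site are illustrative, not the actual OSD implementation:

    // Simplified sketch of the override behaviour described above.
    struct QosOptions {
      unsigned recovery_max_active = 0;   // 0 means "use the per-device default"
      unsigned osd_max_backfills = 1;
      double osd_recovery_sleep = 0.1;
      double osd_snap_trim_sleep = 0.0;
    };

    void maybe_override_options_for_qos(bool mclock_enabled, QosOptions& opts) {
      if (!mclock_enabled) {
        return;  // nothing to do for other schedulers
      }
      // Raise the recovery/backfill limits (and, in the real OSD, the
      // corresponding local/remote reserver limits) so these caps no longer
      // throttle work; mClock does the pacing instead.
      opts.recovery_max_active = 1000;
      opts.osd_max_backfills = 1000;
      // Disable sleep-based throttling so mClock can provide the intended QoS.
      opts.osd_recovery_sleep = 0.0;
      opts.osd_snap_trim_sleep = 0.0;
    }

    int main() {
      QosOptions opts;
      // Called after the op shards are brought up, and again on conf changes.
      maybe_override_options_for_qos(true, opts);
    }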
Signed-off-by: Kefu Chai <kchai@redhat.com>
and link to them when appropriate
Signed-off-by: Kefu Chai <kchai@redhat.com>
for defining options
Signed-off-by: Kefu Chai <kchai@redhat.com>
This is my second attempt to rewrite the
second half of the mclock docs. The first attempt
is enshrined in https://github.com/ceph/ceph/pull/40571,
in which I got cute with git and got burned.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
This PR rewrites the first 48 percent of the material provided in
PR#40531, improving its elegance and readability.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>