| Commit message | Author | Age | Files | Lines |
|\
| |
| | |
os/bluestore: Create additional bdev labels when expanding block device.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There was a problem when expansion of the 'block' device crossed the
location of a bdev label copy. The extra label, which did not exist
before the expansion, was not initialized.
This made the test fail.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
|
| |
| |
| |
| | |
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
verify that an operator scrub aborts a reserving scrub of the
same PG.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
a helper function that builds bash dictionaries:
pg to acting set, pg to primary & pg to pool.
Also added are two helper functions that make use of the dictionaries:
count_common_active() to count the number of common OSDs
in the acting set of two PGs, and find_disjoint_but_primary()
to find a PG that is disjoint from the first PG, apart from
possibly having the same primary OSD.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
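The helpers described above are bash functions; the same idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample PG ids and OSD numbers are made up, the first OSD in each acting set is treated as the primary, and "disjoint apart from the primary" is read as "the only OSD the two PGs may share is an OSD that is primary for both".

```python
from typing import Dict, List, Optional

# Hypothetical sample data: PG id -> acting set.
# Assumption: the first OSD of the acting set is the primary.
pg_to_acting: Dict[str, List[int]] = {
    "1.0": [0, 1, 2],
    "1.1": [0, 3, 4],
    "1.2": [1, 2, 5],
}
pg_to_primary: Dict[str, int] = {pg: acting[0] for pg, acting in pg_to_acting.items()}


def count_common_active(pg_a: str, pg_b: str) -> int:
    """Count the OSDs that appear in the acting sets of both PGs."""
    return len(set(pg_to_acting[pg_a]) & set(pg_to_acting[pg_b]))


def find_disjoint_but_primary(pg_a: str) -> Optional[str]:
    """Return the first other PG whose acting set shares no OSD with pg_a,
    except possibly an OSD that is the primary of both PGs."""
    primary = pg_to_primary[pg_a]
    for pg, acting in pg_to_acting.items():
        if pg == pg_a:
            continue
        shared = set(acting) & set(pg_to_acting[pg_a])
        if not shared:
            return pg
        if shared == {primary} and pg_to_primary[pg] == primary:
            return pg
    return None
```

With the sample data, "1.0" and "1.2" share two OSDs, while "1.1" overlaps "1.0" only in their common primary.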
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A bogus change introduced as part of PR#54363 (commit
fbb7d73) changed multiple 'scrub' commands to 'scheduled-scrub'.
In this one instance - that was wrong.
Fixes: https://tracker.ceph.com/issues/69276
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
TEST_backfill_grow fails after finding "num_bytes mismatch" in osd log
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
|
| |/
| |
| |
| |
| |
| |
| |
| | |
Need to ignore "num_bytes mismatch" messages while backfill/recovery
is in progress.
Fixes: https://tracker.ceph.com/issues/68585
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | | |
qa/standalone: bugfix for wait_for_scrub
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
qa/standalone/mon/mon_cluster_log.sh: retry check for log line
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Issue: The test was failing as we were checking for the osd boot
log before it was actually emitted in the log file.
Solution: We retry checking for the desired string in the log file
for a duration of 60s after OSD has come up successfully.
Fixes: https://tracker.ceph.com/issues/67282
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
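The retry described above can be sketched as a small polling loop. This is a Python illustration rather than the test's actual bash code; the function name, its parameters and the polling interval are assumptions, and only the 60-second default comes from the commit message.

```python
import time
from pathlib import Path


def wait_for_log_line(path, needle, timeout=60.0, interval=1.0):
    """Poll the file at `path` until `needle` appears in it.

    Returns True as soon as the text is found, or False once
    `timeout` seconds have passed without a match.
    """
    deadline = time.monotonic() + timeout
    log = Path(path)
    while True:
        # The log file may not exist yet, so check before reading.
        if log.exists() and needle in log.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Checking once before consulting the deadline means a line that is already present is found even with a zero timeout.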
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | | |
osd: add clear_shards_repaired command
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This command will allow us to clear the OSD_TOO_MANY_REPAIRS alert
by setting the shard repair count to 0. This will help in cases where
the alert was a false positive, or a condition that has since cleared
at the disk level. Often, zeroing out the repair count is
better than muting the alert or restarting the OSD.
Fixes: https://tracker.ceph.com/issues/54182
Co-authored-by: David Zafman <dzafman@redhat.com>
Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
osd/scrub: separate shallow vs deep errors storage
Reviewed-by: Samuel Just <sjust@redhat.com>
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ScrubStore is now comprised of two separate data
structures, one for shallow errors and one for deep.
A new test is added to verify the main objective of that
design change: shallow scrubs should not overwrite deep
scrub data.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
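The design objective can be illustrated with a toy model. This sketch is not the ScrubStore implementation: the class, its methods and the error strings are hypothetical, and it assumes that each scrub replaces only the store that matches its own depth.

```python
class ToyScrubStore:
    """Toy model: shallow and deep errors are kept in separate maps."""

    def __init__(self):
        self.shallow_errors = {}  # object name -> list of error strings
        self.deep_errors = {}     # object name -> list of error strings

    def record_scrub(self, deep, errors):
        """Replace the results of the matching scrub depth only."""
        target = self.deep_errors if deep else self.shallow_errors
        target.clear()
        target.update(errors)

    def all_errors(self):
        """Merge both maps for reporting, deep errors first."""
        merged = {obj: list(errs) for obj, errs in self.deep_errors.items()}
        for obj, errs in self.shallow_errors.items():
            merged.setdefault(obj, []).extend(errs)
        return merged
```

In this model a later shallow scrub that finds nothing empties only the shallow map, so errors recorded by an earlier deep scrub remain visible.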
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| | |
That test no longer matches the actual requirements and
implementation of scrubbing.
It was already deactivated in
https://github.com/ceph/ceph/pull/59590. Here it is
fully removed, mainly for the sake of backporting.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To prevent test timeouts.
Also - remove a failing assertion on a specific 'pg query'
output, as it is not central to the test.
Fixes: https://tracker.ceph.com/issues/61385
Fixes: https://tracker.ceph.com/issues/64346
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | | |
test/scrub: only instruct clean PGs to scrub
Reviewed-by: Laura Flores <lflores@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Recent changes to the scrub scheduling mechanism, especially
regarding the 'must_scrub' flag, cause operator scrub commands
issued on a not-clean PG to be rejected - and forgotten.
This commit changes the tests to issue a scrub command only
after the target PG is clean.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
shortening the delay times following various scrub events.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \ \
| |/ /
|/| |
| | |
| | | |
qa: drop XMLSTARLET variable, use xmlstarlet directly
Reviewed-by: Ramana Raja <rraja@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The variable was added in commit 9b6b7c35d03f ("Handle
differently-named xmlstarlet binary for *suse") but this
compatibility business is long outdated:
Mon Oct 13 08:52:37 UTC 2014 - toms@opensuse.org
- SPEC file changes
- Added link from /usr/bin/xml to /usr/bin/xmlstarlet as other
distributions do the same
- Did the same for the manpage
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
To match the modified log message in
OsdScrub::restrictions_on_scrubbing().
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The scrub scheduler no longer "upgrades" shallow scrubs into
deep ones on error, so the tests that check this functionality
are no longer valid.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The conditions for auto-repair scrubs should have been changed
when need_auto lost some of its setters.
Also fix the rescheduling of repair scrubs
when the last scrub ended with errors.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|/ /
| |
| |
| |
| |
| |
| |
| | |
Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep,
as the test is no longer valid (updated code no longer
produces the same logs or the same behavior).
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \
| | |
| | |
| | |
| | | |
qa/standalone: bugfix for latecy repair after scrub
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When a PG repair is requested manually, a deep scrub is executed
first, and DoRecovery() is requeued if there are inconsistent objects.
But repair() in ceph-helpers.sh uses scrub_stamp to determine when the
repair has completed. As a result, the repair may not be complete
before the next test case starts.
Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
qa/standalone/scrub: fix the searched-for text for snaps decode errors
Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
to match the updated error log, as modified by commit ebd8283
Fixes: https://tracker.ceph.com/issues/67228
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \ \ \
| |/ / /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
NitzanMordhai/wip-nitzan-osd-recovery-standalone-test-wait-for-too-full
Test: osd-recovery-space.sh extends the wait time for "recovery toofull"
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The osd-recovery-space test involves writing objects and expecting to receive
the "toofull" flag.
If we don't wait long enough, we might check the "toofull" flag before all objects
have completed writing, and the "toofull" status hasn't been activated yet.
The change will extend the waiting time and will also incorporate additional
checks for the return code from the status wait.
Fixes: https://tracker.ceph.com/issues/44510
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
osd/scrub: no shared scrub-job ownership between PGs and the scrub queue
Reviewed-by: Samuel Just <sjust@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
following changes in scrub code
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
|
|\ \ \ \
| |/ / /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | | |
NitzanMordhai/wip-nitzan-standalone-pg-split-merge-daemon-commands
test: ceph daemon command with asok path
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pg-split-merge uses the ceph daemon command to check the merge,
but it does not use the asok path, so the check does not
return the correct output. Change the command to use the asok path.
Fixes: https://tracker.ceph.com/issues/65737
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
|
|\ \ \
| | | |
| | | | |
qa/standalone/mon: Fix mkfs test & TEST_LOG failures
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Problem:
In TEST_journald_cluster_log_level
we are currently hitting insufficient-permission
errors with the `journalctl` command.
Solution:
Run the journalctl commands with `sudo`.
Fixes: https://tracker.ceph.com/issues/63784
Signed-off-by: Kamoltat <ksirivad@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Problem:
I don't know why we are grepping for and evaluating an
INF log before setting mon_cluster_log_level to `info`.
Solution:
Set mon_cluster_log_level to `info`
before the evaluation.
Fixes: https://tracker.ceph.com/issues/63784
Signed-off-by: Kamoltat <ksirivad@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fixes: https://tracker.ceph.com/issues/63784
Signed-off-by: Kamoltat <ksirivad@redhat.com>
|
|\ \ \ \
| |/ / /
|/| | |
| | | |
| | | | |
osd: EC Partial Stripe Reads (Retry of #23138 and #52746)
Reviewed-by: Samuel Just <sjust@redhat.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is supposed to fix:
```
2024-05-15T01:19:55.945 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: rados_get td/test-erasure-eio pool-jerasure obj-size-81362-1-10 fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:104: rados_get: local dir=td/test-erasure-eio
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:105: rados_get: local poolname=pool-jerasure
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:106: rados_get: local objname=obj-size-81362-1-10
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:107: rados_get: local expect=fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:112: rados_get: '[' fail = fail ']'
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:114: rados_get: rados --pool pool-jerasure get obj-size-81362-1-10 td/test-erasure-eio/COPY
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:115: rados_get: return
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:323: TEST_rados_get_bad_size_shard_1: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:41: run: return 1
```
(https://pulpito.ceph.com/rzarzynski-2024-05-14_22:09:16-rados-wip-osd-ec-partial-reads-distro-default-smithi/7706517/)
The failing scenario exercised a behavior that was genuinely
changed by the introduction of partial reads. Before, regardless
of the read size, the OSD always read the entire stripe and
checked it for errors.
In this test the first 4 KB is read from an EC pool with
m=2 k=1 while errors have been injected into shards 1 and 2.
Handling the first 4 KB doesn't really require the damaged
shards but, because of the full-stripe alignment, EIO was
returned. This is no longer the case.
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
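Why the first 4 KB no longer needs the damaged shards can be shown with a small calculation. The sketch below is only an illustration, not the OSD's code: it assumes a simple layout in which logical chunks of `chunk_size` bytes are placed round-robin on data shards 0..k-1 and parity shards are numbered k..k+m-1. The function name and the 4096-byte chunk size are assumptions.

```python
def data_shards_for_read(offset, length, k, chunk_size):
    """Return the set of data shard indices that hold the requested bytes.

    Assumes logical chunk number c is stored on data shard c % k.
    """
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # A range that spans k or more chunks touches every data shard.
    if last_chunk - first_chunk + 1 >= k:
        return set(range(k))
    return {chunk % k for chunk in range(first_chunk, last_chunk + 1)}


# Scenario from the commit message: k=1, m=2, errors injected on shards 1 and 2.
damaged = {1, 2}
needed = data_shards_for_read(offset=0, length=4096, k=1, chunk_size=4096)
can_skip_damaged_shards = needed.isdisjoint(damaged)
```

Under these assumptions the read touches only shard 0, so a partial read has no reason to return EIO, whereas a full-stripe read would also have checked the damaged shards.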
|
|\ \ \
| |/ /
|/| |
| | |
| | | |
mon, qa: suites override ec profiles with --yes_i_really_mean_it; monitors accept that
Reviewed-by: Laura Flores <lflores@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a fallout from 629ba7bd349d48cdaa6d094751e7cfce651ba2bc.
The problem has been nailed down by Laura Flores.
Fixes: https://tracker.ceph.com/issues/65183
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We do not control the verbosity of the LogEntry
that is logged to stderr, graylog and
journald. This floods /var/log with log messages,
causing the filesystem to fill up quickly.
We also have separate config variables, namely
mon_cluster_log_file_level and mon_cluster_log_to_syslog_level,
to control verbosity at the cluster log file and
syslog level respectively. Add a generic cluster log
level config variable which controls cluster log
verbosity for all external entities.
Additionally, this patch addresses the regression of the
`mon_cluster_log_file_level` option, which doesn't take effect
because of the refactoring of LogMonitor::update_from_paxos
(commit 7c84e06).
Fixes: https://tracker.ceph.com/issues/57061
Fixes: https://tracker.ceph.com/issues/57049
Signed-off-by: Prashant D <pdhange@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
jianwei1216/fix_osd_pg_stat_report_interval_max_cmain
fix: resolve inconsistent judgment of osd_pg_stat_report_interval_max
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
osd_pg_stat_report_max was previously used as either a max time in seconds
or a max number of epochs. Instead, separate into two configs and adjust
PeeringState::prepare_stats_for_publish to check both.
Additionally, this commit removes a superfluous check in
PeeringState::Active::react(const AdvMap&) and calls publish_stats_to_osd
unconditionally as with other callers in PeeringState.
Fixes: https://tracker.ceph.com/issues/63520
Signed-off-by: zhangjianwei2 <zhangjianwei2@cmss.chinamobile.com>
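The idea of replacing one dual-purpose limit with two separate ones can be sketched as below. This is a Python illustration, not the logic of PeeringState::prepare_stats_for_publish: the function name, the parameter names and the exact comparison rules are assumptions.

```python
def should_publish(now, last_report_time, epoch, last_report_epoch,
                   max_seconds, max_epochs):
    """Decide whether unchanged stats are due for another report.

    Two independent limits are checked: one expressed in seconds,
    one expressed in OSD map epochs. Exceeding either triggers a report.
    """
    too_old_in_time = (now - last_report_time) >= max_seconds
    too_old_in_epochs = (epoch - last_report_epoch) >= max_epochs
    return too_old_in_time or too_old_in_epochs
```

Keeping the two limits as separate parameters avoids comparing a value in seconds against an epoch count, which is the kind of inconsistency the commit describes.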
|