path: root/qa/standalone (follow)
Commit message  Author  Age  Files  Lines
* Merge pull request #60363 from aclamk/wip-aclamk-fix-bluefs-bdev-expandAdam Kupczyk27 hours1-1/+1
|\ | | | | os/bluestore: Create additional bdev labels when expanding block device.
| * qa/standalone/bluefs: Fix CBT bluefs-bdev-expandAdam Kupczyk2024-10-161-1/+1
| | | | | | | | | | | | | | | | | | There was a problem when the expansion of the 'block' device crossed the location of a bdev label copy. The extra label, which did not exist before and now exists, was not initialized. This caused the test to fail. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
* | qa/scrub: more delay when waiting for noscrub to take effectRonen Friedman4 days1-4/+3
| | | | | | | | Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | qa/scrub: change 'bin/ceph' to 'ceph'Ronen Friedman4 days3-39/+39
| | | | | | | | Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | qa/standalone/scrub: osd-scrub-test.sh - test operator overridesRonen Friedman2024-12-311-0/+231
| | | | | | | | | | | | | | verify that an operator scrub aborts a reserving scrub of the same PG. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | qa/standalone/scrub: add build_pg_dicts()Ronen Friedman2024-12-311-2/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | a helper function that builds bash dictionaries: pg to acting set, pg to primary & pg to pool. Also added are two helper functions that make use of the dictionaries: count_common_active() to count the number of common OSDs in the acting set of two PGs, and find_disjoint_but_primary() to find a PG that is disjoint from the first PG, apart from possibly having the same primary OSD. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
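The dictionary approach described in this commit message can be illustrated with a bash associative array. This is a minimal sketch, not the helpers from the commit: the map is filled by hand with made-up PG ids and OSD numbers (the real build_pg_dicts() presumably derives them from cluster output), and the function name `count_common_osds` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical data: PG id -> space-separated acting set (OSD ids).
# In a real test these entries would be built from the cluster state.
declare -A pg_acting=(
  ["1.0"]="0 1 2"
  ["1.1"]="2 3 4"
  ["1.2"]="0 2 5"
)

# Print the number of OSDs that appear in the acting sets of both PGs.
count_common_osds() {
  local pg_a=$1 pg_b=$2 common=0 osd_a osd_b
  for osd_a in ${pg_acting[$pg_a]}; do
    for osd_b in ${pg_acting[$pg_b]}; do
      if [[ "$osd_a" == "$osd_b" ]]; then
        common=$((common + 1))
      fi
    done
  done
  echo "$common"
}

count_common_osds 1.0 1.1   # prints 1 (only OSD 2 is shared)
count_common_osds 1.0 1.2   # prints 2 (OSDs 0 and 2 are shared)
```

A result of 0 would indicate two PGs with fully disjoint acting sets, which is the kind of query a helper such as find_disjoint_but_primary() needs. Associative arrays require bash 4 or later.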
* | qa/standalone/scrub: fix TEST_periodic_scrub_replicatedRonen Friedman2024-12-171-1/+1
| | | | | | | | | | | | | | | | | | | | A bogus change introduced as part of PR#54363 (commit fbb7d73) changed multiple 'scrub' commands to 'scheduled-scrub'. In this one instance - that was wrong. Fixes: https://tracker.ceph.com/issues/69276 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | Merge pull request #60636 from mohit84/issue_68585Yuri Weinstein2024-11-221-1/+8
|\ \ | | | | | | | | | | | | | | | | | | TEST_backfill_grow fails after finding "num_bytes mismatch" in osd log Reviewed-by: Ronen Friedman <rfriedma@redhat.com> Reviewed-by: Samuel Just <sjust@redhat.com>
| * | TEST_backfill_grow fails after finding "num_bytes mismatch" in osd logMohit Agrawal2024-11-061-1/+8
| |/ | | | | | | | | | | | | | | Need to ignore "num_bytes mismatch" messages while backfill/recovery is in progress. Fixes: https://tracker.ceph.com/issues/68585 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* | Merge pull request #59524 from liangmingyuanneo/wip-standalone-test-pg-repairYuri Weinstein2024-11-221-4/+3
|\ \ | | | | | | | | | | | | | | | qa/standalone: bugfix for wait_for_scrub Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
| * | qa/standalone: bugfix for wait_for_scrubliangmingyuan2024-09-271-4/+3
| | | | | | | | | | | | Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
* | | Merge pull request #60071 from shraddhaag/fix-mon-cluster-log-testYuri Weinstein2024-11-061-4/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | qa/standalone/mon/mon_cluster_log.sh: retry check for log line Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
| * | | qa/standalone/mon/mon_cluster_log.sh: retry check for log lineShraddha Agrawal2024-10-101-4/+12
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The test was failing as we were checking for the osd boot log before it was actually emitted in the log file. Solution: We retry checking for the desired string in the log file for a duration of 60s after OSD has come up successfully. Fixes: https://tracker.ceph.com/issues/67282 Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com> Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
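The retry described in this commit message can be sketched as a small polling loop. This is an illustration under assumptions, not the code from mon_cluster_log.sh: the function name `wait_for_log_line`, its arguments, and the one-second polling interval are all hypothetical.

```bash
# wait_for_log_line FILE PATTERN [TIMEOUT_SECONDS]
# Hypothetical helper: poll FILE once per second until PATTERN appears,
# giving up after TIMEOUT_SECONDS (default 60).
# Returns 0 if the pattern was found, non-zero otherwise.
wait_for_log_line() {
  local file=$1 pattern=$2 timeout=${3:-60} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -q -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # One last check once the timeout has elapsed.
  grep -q -- "$pattern" "$file" 2>/dev/null
}
```

A test would call it after the OSD is up, for example `wait_for_log_line "$log_file" "$expected" 60 || return 1`, where `log_file` and `expected` are placeholders. Polling rather than checking once avoids the race between the OSD coming up and the log line being written.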
* | | Merge pull request #54954 from diffs/mainYuri Weinstein2024-10-301-1/+13
|\ \ \ | |_|/ |/| | | | | | | | | | | osd: add clear_shards_repaired command Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
| * | osd: add clear_shards_repaired commandDanWritesCode2024-03-041-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This command will allow us to clear the OSD_TOO_MANY_REPAIRS alert by setting the shard repair count to 0. This will help in cases where the alert was a false positive, or a condition that has since cleared at the disk level. Often, zeroing out the repair count is better than muting the alert or restarting the OSD. Fixes: https://tracker.ceph.com/issues/54182 Co-authored-by: David Zafman <dzafman@redhat.com> Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
* | | Merge pull request #59942 from ronen-fr/wip-rf-store2-stepsRonen Friedman2024-10-141-1/+248
|\ \ \ | | | | | | | | | | | | | | | | | | | | osd/scrub: separate shallow vs deep errors storage Reviewed-by: Samuel Just <sjust@redhat.com>
| * | | qa/standalone/scrub: test new ScrubStore implementationRonen Friedman2024-10-101-1/+248
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | The ScrubStore is now comprised of two separate data structures, one for shallow errors and one for deep. A new test is added to verify the main objective of that design change: shallow scrubs should not overwrite deep scrub data. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* / | qa/standalone/scrub: remove TEST_recovery_scrub_2Ronen Friedman2024-10-101-140/+0
|/ / | | | | | | | | | | | | | | | | | | That test no longer matches the actual requirements and implementation of scrubbing. It was already deactivated in https://github.com/ceph/ceph/pull/59590. Here it is fully removed, mainly for the sake of backporting. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | qa/standalone/scrub: increase status updates frequencyRonen Friedman2024-09-241-8/+9
| | | | | | | | | | | | | | | | | | | | | | To prevent test timeouts. Also - remove a failing assertion on a specific 'pg query' output, as it is not central to the test. Fixes: https://tracker.ceph.com/issues/61385 Fixes: https://tracker.ceph.com/issues/64346 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | Merge pull request #59437 from ronen-fr/wip-rf-early-commandRonen Friedman2024-09-221-6/+53
|\ \ | | | | | | | | | | | | | | | test/scrub: only instruct clean PGs to scrub Reviewed-by: Laura Flores <lflores@redhat.com>
| * | test/scrub: only instruct clean PGs to scrubRonen Friedman2024-09-101-6/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes to the scrub scheduling mechanism, especially regarding the 'must_scrub' flag, cause operator scrub commands issued on a not-clean PG to be rejected - and forgotten. This commit changes the tests to issue a scrub command only after the target PG is clean. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | test/osd: fix 'recovery scrub' standalone testRonen Friedman2024-09-041-3/+16
| | | | | | | | | | | | Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | test/osd/scrub: set new scrub-related config options to test valuesRonen Friedman2024-09-041-0/+3
| | | | | | | | | | | | | | | | | | shortening the delay times following various scrub events. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | Merge pull request #59433 from idryomov/wip-drop-xmlstarlet-variableIlya Dryomov2024-08-272-10/+1
|\ \ \ | |/ / |/| | | | | | | | qa: drop XMLSTARLET variable, use xmlstarlet directly Reviewed-by: Ramana Raja <rraja@redhat.com>
| * | qa: drop XMLSTARLET variable, use xmlstarlet directlyIlya Dryomov2024-08-252-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The variable was added in commit 9b6b7c35d03f ("Handle differently-named xmlstarlet binary for *suse") but this compatibility business is long outdated: Mon Oct 13 08:52:37 UTC 2014 - toms@opensuse.org - SPEC file changes - Added link from /usr/bin/xml to /usr/bin/xmlstarlet as other distributions do the same - Did the same for the manpage Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | test/osd/scrub: fix searched-for log stringRonen Friedman2024-08-251-6/+6
| | | | | | | | | | | | | | | | | | | | | To match the modified log message in OsdScrub::restrictions_on_scrubbing(). Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | test/osd/scrub: disable tests for deleted scrub functionalityRonen Friedman2024-08-251-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | The scrub scheduler no longer "upgrades" shallow scrubs into deep ones on error, so the tests that check this functionality are no longer valid. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | osd/scrub: fix the conditions for auto-repair scrubsRonen Friedman2024-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conditions for auto-repair scrubs should have been changed when need_auto lost some of its setters. Also fix the rescheduling of repair scrubs when the last scrub ended with errors. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | qa/standalone/scrub: disable scrub_extended_sleep testRonen Friedman2024-08-251-1/+3
|/ / | | | | | | | | | | | | | | Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep, as the test is no longer valid (updated code no longer produces the same logs or the same behavior). Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | Merge pull request #57888 from liangmingyuanneo/wip-standalone-test-pg-repairRonen Friedman2024-08-141-0/+1
|\ \ | | | | | | | | | | | | qa/standalone: bugfix for latecy repair after scrub Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
| * | qa/standalone: bugfix for latecy repair after scrubliangmingyuan2024-06-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pg repair is called manually, a deep-scrub is executed first, and DoRecovery() is requeued if there are inconsistent objects. But repair() in ceph-helpers.sh uses scrub_stamp to determine when the repair completes. As a result, the repair may not be completed before the next test case starts. Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
* | | Merge pull request #58931 from ronen-fr/wip-rf-repair-decodeRonen Friedman2024-07-311-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | qa/standalone/scrub: fix the searched-for text for snaps decode errors Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
| * | | qa/standalone/scrub: fix the searched-for text for snaps decode errorsRonen Friedman2024-07-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to match the updated error log, as modified by commit ebd8283 Fixes: https://tracker.ceph.com/issues/67228 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | | Merge pull request #57193 from NitzanMordhai/wip-nitzan-osd-recovery-standalone-test-wait-for-too-fullYuri Weinstein2024-07-301-1/+9
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test: osd-recovery-space.sh extends the wait time for "recovery toofull" Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com> Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
| * | | Test: osd-recovery-space.sh extends the wait time for "recovery toofull".Nitzan Mordechai2024-05-011-1/+9
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The osd-recovery-space test involves writing objects and expecting to receive the "toofull" flag. If we don't wait long enough, we might check the "toofull" flag before all objects have completed writing, and the "toofull" status hasn't been activated yet. The change will extend the waiting time and will also incorporate additional checks for the return code from the status wait. Fixes: https://tracker.ceph.com/issues/44510 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
* | | Merge pull request #58173 from ronen-fr/wip-rf-targets-j13Ronen Friedman2024-07-182-8/+17
|\ \ \ | | | | | | | | | | | | | | | | | | | | osd/scrub: no shared scrub-job ownership between PGs and the scrub queue Reviewed-by: Samuel Just <sjust@redhat.com>
| * | | qa/standalone/scrub: fix osd-scrub-test.shRonen Friedman2024-07-162-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | following changes in scrub code Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* | | | Merge pull request #57201 from NitzanMordhai/wip-nitzan-standalone-pg-split-merge-daemon-commandsYuri Weinstein2024-07-171-2/+2
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | test: ceph daemon command with asok path Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
| * | | test: ceph daemon command with asok pathNitzan Mordechai2024-05-011-2/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | pg-split-merge uses the ceph daemon command to check the merge, but it doesn't use the asok path, which causes the check not to return the correct output. Change the command to use the asok path. Fixes: https://tracker.ceph.com/issues/65737 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
* | | Merge pull request #58324 from kamoltat/wip-ksirivad-fix-63784Laura Flores2024-07-122-8/+7
|\ \ \ | | | | | | | | qa/standalone/mon: Fix mkfs test & TEST_LOG failures
| * | | qa/standlone/mon/mon-cluster-log.sh: TEST_journald_cluster_log_levelKamoltat2024-07-011-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In TEST_journald_cluster_log_level: we are currently facing insufficient permissions issues with `journalctl` command Solution: `super user do` on journalctl commands Fixes: https://tracker.ceph.com/issues/63784 Signed-off-by: Kamoltat <ksirivad@redhat.com>
| * | | qa/standalone/mon/mon-cluster-log: TEST_cluster_log_levelKamoltat2024-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: I don't know why we are grepping and evaluating an INF log before setting mon_cluster_log_level to `info`. Solution: set mon_cluster_log_level to info before evaluation. Fixes: https://tracker.ceph.com/issues/63784 Signed-off-by: Kamoltat <ksirivad@redhat.com>
| * | | qa/standalone/mon/mkfs.sh: remove $MON_DIR correctlyKamoltat2024-06-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fixes: https://tracker.ceph.com/issues/63784 Signed-off-by: Kamoltat <ksirivad@redhat.com>
* | | | Merge pull request #55196 from rzarzynski/wip-osd-ec-partial-readsRadoslaw Zarzynski2024-07-101-6/+26
|\ \ \ \ | |/ / / |/| | | | | | | | | | | osd: EC Partial Stripe Reads (Retry of #23138 and #52746) Reviewed-by: Samuel Just <sjust@redhat.com>
| * | | qa: test-erasure-eio.sh honors the EC partial read supportRadosław Zarzyński2024-06-201-6/+26
| |/ / This is supposed to fix:
```
2024-05-15T01:19:55.945 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: rados_get td/test-erasure-eio pool-jerasure obj-size-81362-1-10 fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:104: rados_get: local dir=td/test-erasure-eio
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:105: rados_get: local poolname=pool-jerasure
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:106: rados_get: local objname=obj-size-81362-1-10
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:107: rados_get: local expect=fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:112: rados_get: '[' fail = fail ']'
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:114: rados_get: rados --pool pool-jerasure get obj-size-81362-1-10 td/test-erasure-eio/COPY
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:115: rados_get: return
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:323: TEST_rados_get_bad_size_shard_1: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:41: run: return 1
```
(https://pulpito.ceph.com/rzarzynski-2024-05-14_22:09:16-rados-wip-osd-ec-partial-reads-distro-default-smithi/7706517/)
The failed scenario exercised a behavior that was genuinely changed by the introduction of partial reads. Before, regardless of the read size, the OSD always read the entire stripe and checked it for errors. In this test, the first 4 KB is read from an EC pool with m=2 k=1 while errors have been injected into shards 1 and 2. Handling the first 4 KB doesn't really require the damaged shards but, because of the full-stripe alignment, EIO was returned. This is no longer the case. Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
* | | Merge pull request #56531 from rzarzynski/wip-bug-65183Yuri Weinstein2024-06-251-2/+3
|\ \ \ | |/ / |/| | | | | | | | mon, qa: suites override ec profiles with --yes_i_really_mean_it; monitors accept that Reviewed-by: Laura Flores <lflores@redhat.com>
| * | qa: tests override ec profiles with --yes-i-really-mean-itRadosław Zarzyński2024-04-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | This fixes a fallout from 629ba7bd349d48cdaa6d094751e7cfce651ba2bc. The problem has been nailed down by Laura Flores. Fixes: https://tracker.ceph.com/issues/65183 Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
* | | mon/LogMonitor: Use generic cluster log level configPrashant D2024-03-132-0/+221
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not control the verbosity of the LogEntry that gets logged to stderr, graylog and journald. This floods /var/log with logs, causing the filesystem to fill up quickly. We also have separate config variables, namely mon_cluster_log_file_level and mon_cluster_log_to_syslog_level, to control verbosity at the cluster log file and syslog level respectively. Add a generic cluster log level config variable which controls cluster log verbosity for all external entities. Additionally, this patch addresses the regression of the `mon_cluster_log_file_level` option, which doesn't take effect because of the code refactoring of LogMonitor::update_from_paxos (commit: 7c84e06). Fixes: https://tracker.ceph.com/issues/57061 Fixes: https://tracker.ceph.com/issues/57049 Signed-off-by: Prashant D <pdhange@redhat.com>
* | Merge pull request #54491 from jianwei1216/fix_osd_pg_stat_report_interval_max_cmainYuri Weinstein2024-01-242-6/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | fix: resolve inconsistent judgment of osd_pg_stat_report_interval_max Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
| * | osd: distinguish between osd_pg_stat_report_max_(epoch|seconds)zhangjianwei22023-11-232-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | osd_pg_stat_report_max was previously used as either a max time in seconds or a max number of epochs. Instead, separate into two configs and adjust PeeringState::prepare_stats_for_publish to check both. Additionally, this commit removes a superfluous check in PeeringState::Active::react(const AdvMap&) and calls publish_stats_to_osd unconditionally as with other callers in PeeringState. Fixes: https://tracker.ceph.com/issues/63520 Signed-off-by: zhangjianwei2 <zhangjianwei2@cmss.chinamobile.com>