path: root/src/mds
Commit message (Author, Age, Files, Lines)
* Merge PR #60996 into main (Venky Shankar, 3 days, 4 files, -4/+10)
|\
| |   * refs/pull/60996/head:
| |       mds/SimpleLock: add is_xlocked_by()
| |       mds/SimpleLock: add has_xlock_by()
| |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * mds/SimpleLock: add is_xlocked_by() (Max Kellermann, 2024-12-09, 3 files, -2/+5)
| |   This eliminates more reference counter manipulations.
| |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
| * mds/SimpleLock: add has_xlock_by() (Max Kellermann, 2024-12-09, 2 files, -2/+5)
| |   This replaces get_xlock_by() in cases where only a not-nullptr check is
| |   needed; this eliminates costly implicit reference counter manipulations.
| |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
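The saving described in this commit can be illustrated outside Ceph. The following is a minimal sketch, assuming the lock owner is held through a reference-counted pointer; `std::shared_ptr`, `Lock`, and `Owner` are hypothetical stand-ins, and the member signatures are assumptions rather than the real `SimpleLock` interface. A getter that returns the smart pointer by value copies it and touches the reference count, while a boolean predicate does not.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the object that holds the exclusive lock.
struct Owner { int id = 0; };

// Hypothetical, simplified lock; not the real SimpleLock class.
class Lock {
  std::shared_ptr<Owner> xlock_by;

public:
  void set_xlock_by(std::shared_ptr<Owner> o) { xlock_by = std::move(o); }

  // Returns a copy of the smart pointer: the reference count is incremented
  // here and decremented again when the caller's copy is destroyed.
  std::shared_ptr<Owner> get_xlock_by() const { return xlock_by; }

  // Answers "is there an owner?" without touching the reference count.
  bool has_xlock_by() const noexcept { return xlock_by != nullptr; }

  // Compares identity without copying the smart pointer.
  bool is_xlocked_by(const std::shared_ptr<Owner>& who) const noexcept {
    return xlock_by == who;
  }
};
```

A caller that only needs a yes/no answer would then write `if (lock.has_xlock_by())` instead of `if (lock.get_xlock_by())`, which avoids creating a temporary smart pointer.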
* | Merge PR #61250 into main (Venky Shankar, 3 days, 4 files, -5/+16)
|\ \
| | |   * refs/pull/61250/head:
| | |       mds: avoid acquiring the wrlock twice for a single request
| | |       mds: add 'mds_allow_async_dirops' opt to allow/disable async dirop
| | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | mds: avoid acquiring the wrlock twice for a single request (Xiubo Li, 2025-01-07, 2 files, -4/+10)
| | |   If the current request has a lock cache attached, the lock cache must
| | |   already have acquired the wrlock of the filelock. Currently,
| | |   path_traverse() acquires the wrlock a second time and can deadlock
| | |   against itself.
| | |   Fixes: https://tracker.ceph.com/issues/65607
| | |   Signed-off-by: Xiubo Li <xiubli@redhat.com>
| | |   Signed-off-by: Sunnatillo <sunnat.samadov@est.tech>
| * | mds: add 'mds_allow_async_dirops' opt to allow/disable async dirop (Xiubo Li, 2025-01-07, 3 files, -4/+9)
| | |   The lock cache is buggy and we need to disable it as a workaround.
| | |   Fixes: https://tracker.ceph.com/issues/65607
| | |   Signed-off-by: Xiubo Li <xiubli@redhat.com>
* | | Merge pull request #60889 from anoopcs9/fix-invalid-access-mds (Milind Changire, 14 days, 1 file, -2/+2)
|\ \ \
| |/ /
|/| |
| | |   mds: Fix invalid access of mdr->dn[0].back()
| * | mds: Fix invalid access of mdr->dn[0].back() (Anoop C S, 2024-11-29, 1 file, -2/+2)
| |/
| |   See https://github.com/ceph/ceph/pull/31534 for a similar fix.
| |   Fixes: https://tracker.ceph.com/issues/69059
| |   Signed-off-by: Anoop C S <anoopcs@cryptolab.net>
* | Merge PR #55616 into main (Venky Shankar, 2024-12-27, 5 files, -1/+78)
|\ \
| | |   * refs/pull/55616/head:
| | |       PendingReleaseNotes: add note for replay completion warning
| | |       qa: test to verify `MDS_ESTIMATED_REPLAY_TIME` warning
| | |       doc: add a note for `MDS_ESTIMATED_REPLAY_TIME` MDS warning
| | |       mds: emit warning for estinated replay time
| | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| | |   Reviewed-by: Milind Changire <mchangir@redhat.com>
| * | mds: emit warning for estinated replay time (Venky Shankar, 2024-11-29, 5 files, -1/+78)
| |/
| |   If replay runs for more than 30 seconds, emit a warning with the
| |   estimated replay completion time.
| |   Fixes: https://tracker.ceph.com/issues/61863
| |   Signed-off-by: Manish M Yathnalli <myathnal@redhat.com>
| |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | Merge PR #60653 into main (Venky Shankar, 2024-12-26, 2 files, -14/+0)
|\ \
| |/
|/|
| |   * refs/pull/60653/head:
| |       mds: do not process client metrics message with fast dispatch
| |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| |   Reviewed-by: Jos Collin <jcollin@redhat.com>
| |   Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
| * mds: do not process client metrics message with fast dispatch (Venky Shankar, 2024-11-07, 2 files, -14/+0)
| |   So that the MDS can transition to up:active faster, without spending
| |   much time processing metrics messages during reconnect.
| |   Fixes: http://tracker.ceph.com/issues/68865
| |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | mds: account for header size during omap commit (Venky Shankar, 2024-11-18, 1 file, -0/+4)
| |   fnode_t is set in the omap header during the directory commit
| |   operation, which isn't accounted for when tracking operation size.
| |   Fixes: http://tracker.ceph.com/issues/67597
| |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | mds: client is evicted when an export subtree task is interrupted (Zhansong Gao, 2024-11-13, 4 files, -6/+40)
| |   The importer force-opens some sessions provided by the exporter, but the
| |   client does not know about the new sessions until the exporter notifies
| |   it, and the notifications cannot be sent if the exporter is interrupted.
| |   The client does not regularly renew sessions that it does not know
| |   about, so the client will be evicted by the importer after
| |   `session_autoclose` seconds (300 seconds by default). The sessions that
| |   are force-opened in the importer need to be closed when the import
| |   process is reversed.
| |   Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
* | mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking (Zhansong Gao, 2024-11-13, 1 file, -1/+1)
| |   The related sessions in the importer are in the importing state
| |   (`Session::is_importing` returns true) when the state of the importer is
| |   `acking`. If the exporter restarts at this time,
| |   `Migrator::import_reverse`, called by `MDCache::handle_resolve`, should
| |   reverse the process and clear the importing state, but because of a bug
| |   it does not. As a result, these sessions are not cleared when the client
| |   is unmounted (evicted or timed out) until the MDS is restarted.
| |   The bug in `import_reverse` is that it contains the code to handle state
| |   `IMPORT_ACKING`, but that code is never executed because the state is
| |   changed to `IMPORT_ABORTING` at the beginning. Move
| |   `stat.state = IMPORT_ABORTING` to the end of import_reverse so that it
| |   can handle the state `IMPORT_ACKING`.
| |   Fixes: https://tracker.ceph.com/issues/61459
| |   Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
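The ordering bug described above can be reduced to a few lines. This is a minimal sketch under stated assumptions: `ImportStat`, the two-value enum, and the `importing_sessions` counter are hypothetical simplifications, not the real `Migrator` data structures. It only shows why overwriting the state first turns the `IMPORT_ACKING` branch into dead code.

```cpp
#include <cassert>

// Hypothetical, simplified subset of the import states.
enum ImportState { IMPORT_ACKING, IMPORT_ABORTING };

// Hypothetical per-import record; importing_sessions stands in for the
// sessions that are still marked as importing.
struct ImportStat {
  ImportState state;
  int importing_sessions;
};

// Buggy ordering: the state is overwritten before it is inspected, so the
// IMPORT_ACKING cleanup below can never run.
void import_reverse_buggy(ImportStat& stat) {
  stat.state = IMPORT_ABORTING;
  if (stat.state == IMPORT_ACKING) {
    stat.importing_sessions = 0;  // unreachable
  }
}

// Fixed ordering: handle the original state first, then mark as aborting.
void import_reverse_fixed(ImportStat& stat) {
  if (stat.state == IMPORT_ACKING) {
    stat.importing_sessions = 0;  // importing state is cleared
  }
  stat.state = IMPORT_ABORTING;
}
```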
* | mds: the assert should be before the journal entry submit otherwise it's racy (Zhansong Gao, 2024-11-13, 1 file, -1/+1)
| |   Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
* | mds: add `importing_count` to session dump (Zhansong Gao, 2024-11-13, 1 file, -0/+1)
| |   Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
* | Merge PR #60464 into main (Patrick Donnelly, 2024-11-13, 9 files, -11/+16)
|\ \
| | |   * refs/pull/60464/head:
| | |       mds: add or update MDS thread names
| | |       log: cache recent threads up to a day
| | |       common: cache pthread names
| | |       log: concatenate thread names and print once per thread
| | |   Reviewed-by: Milind Changire <mchangir@redhat.com>
| * | mds: add or update MDS thread names (Patrick Donnelly, 2024-10-25, 9 files, -11/+16)
| | |   To be consistent and sensical.
| | |   Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
| * | common: cache pthread names (Patrick Donnelly, 2024-10-25, 1 file, -1/+1)
| | |   This provides common ceph entrypoints for the pthread_[gs]name
| | |   functions which will also cache a thread_local copy. This also removes
| | |   the pthread_t parameter which precipitated the bug i50743.
| | |   Obviously, the overall goal here is to avoid system calls.
| | |   See-also: https://tracker.ceph.com/issues/50743
| | |   Fixes: 0be8d01c9ddde0d7d24edd34dc75f6cfc861b5ba
| | |   Fixes: https://tracker.ceph.com/issues/68691
| | |   Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
* | | Merge PR #60381 into main (Patrick Donnelly, 2024-11-13, 3 files, -18/+31)
|\ \ \
| | | |   * refs/pull/60381/head:
| | | |       doc: remove refrences to `mds_log_major_segment_event_ratio`
| | | |       mds: start a new major segment after reaching minor segment threshold
| | | |       mds: make parts of mdlog reusable to be used by beacon
| | | |   Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
| | | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| * | | mds: start a new major segment after reaching minor segment threshold (Venky Shankar, 2024-10-23, 2 files, -11/+13)
| | | |   Credit goes to Patrick (@batrick) for identifying this.
| | | |   When there is a huge number of subtree exports (as in the export
| | | |   thrashing test), the MDS logs an EExport event. The EExport event is
| | | |   relatively large, which causes the MDS to start new minor log
| | | |   segments frequently. Moreover, the MDS logs a major segment (boundary)
| | | |   only after a certain number of events have been logged. This causes a
| | | |   large number of (minor) events to build up and delays trimming of
| | | |   expired segments, since the journal expire position is updated on
| | | |   segment boundaries.
| | | |   To mitigate this issue, the MDS now starts a major segment after a
| | | |   configured number of minor segments have been logged. This threshold
| | | |   is configurable through the `mds_log_minor_segments_per_major_segment`
| | | |   MDS config option (defaults to 16).
| | | |   Fixes: https://tracker.ceph.com/issues/66948
| | | |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
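The threshold rule in this commit can be sketched as a small counter. This is only an illustration under assumptions: `SegmentPolicy` and `next_segment_is_major()` are hypothetical names, not MDLog code; the only value taken from the commit message is the default of 16 minor segments per major segment.

```cpp
#include <cstdint>

// Hypothetical helper: decides whether the next log segment should be a
// major segment, based on how many minor segments were started since the
// last major one.
class SegmentPolicy {
  std::uint64_t minor_per_major;        // threshold (the commit's default is 16)
  std::uint64_t minor_since_major = 0;  // minor segments since the last major

public:
  explicit SegmentPolicy(std::uint64_t threshold = 16)
    : minor_per_major(threshold) {}

  // Called each time a new segment is about to start. Returns true when
  // the threshold has been reached and resets the counter.
  bool next_segment_is_major() {
    if (minor_since_major >= minor_per_major) {
      minor_since_major = 0;
      return true;
    }
    ++minor_since_major;
    return false;
  }
};
```

Counting segments rather than events bounds the distance between major boundaries even when individual events are large, which is the situation the commit message describes.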
| * | | mds: make parts of mdlog reusable to be used by beacon (Venky Shankar, 2024-10-23, 3 files, -7/+18)
| |/ /
| | |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | | Merge PR #60325 into main (Patrick Donnelly, 2024-11-13, 1 file, -0/+1)
|\ \ \
| | | |   * refs/pull/60325/head:
| | | |       mds/Beacon: wake up the thread in shutdown()
| | | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| | | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | | mds/Beacon: wake up the thread in shutdown() (Max Kellermann, 2024-10-29, 1 file, -0/+1)
| | | |   This eliminates the `wait_for()` delay and speeds up MDS shutdown.
| | | |   Fixes: https://tracker.ceph.com/issues/68759
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
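The pattern behind this one-line change can be shown with standard C++ primitives. This is a minimal sketch, assuming a periodic worker that sleeps in `wait_for()`; `Ticker` is a hypothetical class, not the real `Beacon`. Without the `notify_all()` call in `shutdown()`, the join would have to wait for the current interval to expire.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical periodic worker; not the real Beacon class.
class Ticker {
  std::mutex mtx;
  std::condition_variable cv;
  bool finished = false;
  std::thread worker;  // declared last so the members above exist first

public:
  explicit Ticker(std::chrono::milliseconds interval) {
    worker = std::thread([this, interval] {
      std::unique_lock<std::mutex> l(mtx);
      while (!finished) {
        // Periodic work would go here.
        cv.wait_for(l, interval);  // sleeps until the timeout or a notify
      }
    });
  }

  void shutdown() {
    {
      std::lock_guard<std::mutex> l(mtx);
      finished = true;
    }
    cv.notify_all();  // wake the worker now instead of waiting for the timeout
    if (worker.joinable())
      worker.join();
  }

  ~Ticker() { shutdown(); }
};
```

Because the flag is set under the same mutex the worker holds while checking it, the worker either sees `finished` before sleeping or is woken by the notification.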
* | | | Merge PR #60283 into main (Patrick Donnelly, 2024-11-13, 3 files, -8/+10)
|\ \ \ \
| | | | |   * refs/pull/60283/head:
| | | | |       mds: add issue_seq to all cap messages
| | | | |       include/ceph_fs: correct ceph_mds_cap_peer field name
| | | | |       include/ceph_fs: correct ceph_mds_cap_item field name
| | | | |       messages/MClientCaps: use correct ceph_seq_t for cap sequence types
| | | | |       messages/MClientCaps: dump issue_seq for debugging
| | | | |       mds: remove dead code
| | | | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | | | mds: add issue_seq to all cap messages (Patrick Donnelly, 2024-10-13, 2 files, -5/+9)
| | | | |   Right now only the clients tell the MDS what they believe the
| | | | |   issue_seq to be. The clients are expected to figure out issue_seq
| | | | |   updates at
| | | | |   Fixes: https://tracker.ceph.com/issues/68515
| | | | |   Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
| * | | | include/ceph_fs: correct ceph_mds_cap_item field name (Patrick Donnelly, 2024-10-10, 1 file, -1/+1)
| | | | |   Originally, the last_sent sequence from the MDS was sent by the
| | | | |   client during bulk cap release, but shortly afterwards this was
| | | | |   changed to last_issue, which is the sequence number with which the
| | | | |   cap was originally "issued" by the MDS rank (and which may be
| | | | |   updated after import of caps).
| | | | |   Fixes: 6208f57f487ac170df24a9018f1cc87a5ac8b4b3
| | | | |   Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
| * | | | mds: remove dead code (Patrick Donnelly, 2024-10-10, 1 file, -2/+0)
| | | | |   A const getter already exists.
| | | | |   Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* | | | | Merge PR #60226 into main (Patrick Donnelly, 2024-11-13, 1 file, -24/+24)
|\ \ \ \ \
| |_|_|_|/
|/| | | |
| | | | |   * refs/pull/60226/head:
| | | | |       mds/QuiesceDbEncoding: add `inline` to work around linker error
| | | | |   Reviewed-by: Rishabh Dave <ridave@redhat.com>
| | | | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| * | | | mds/QuiesceDbEncoding: add `inline` to work around linker error (Max Kellermann, 2024-10-10, 1 file, -24/+24)
| |/ / /
| | | |   Fixes duplicate symbols because every source that includes this
| | | |   header contains a copy:
| | | |     mold: error: duplicate symbol: lib/libceph-common.a(Message.cc.o): lib/libmds.a(MDSRankQuiesce.cc.o): encode(QuiesceDbVersion const&, ceph::buffer::v15_2_0::list&, unsigned long)
| | | |     mold: error: duplicate symbol: lib/libceph-common.a(Message.cc.o): lib/libmds.a(MDSRankQuiesce.cc.o): encode(QuiesceState const&, ceph::buffer::v15_2_0::list&, unsigned long)
| | | |   (and many more)
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
* | | | Merge PR #60236 into main (Venky Shankar, 2024-11-04, 1 file, -1/+4)
|\ \ \ \
| | | | |   * refs/pull/60236/head:
| | | | |       MDS/CDir: return as early as possible from CDir::should_split_fast()
| | | | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| | | | |   Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
| | | | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | | | MDS/CDir: return as early as possible from CDir::should_split_fast() (Max Kellermann, 2024-10-18, 1 file, -1/+4)
| |/ / /
| | | |   All we want to know is whether we're above the `fast_limit`; if
| | | |   we're above that, we don't need to know how many exactly. By
| | | |   returning early instead of iterating over all entries, a lot of CPU
| | | |   time can be saved.
| | | |   In a microbenchmark where `fio` was used to create thousands of
| | | |   files, the CPU usage of `CDir::should_split_fast()` went from 6% to
| | | |   less than 1% in the `perf report`.
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
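The early-return idea can be illustrated with a generic container. This is a minimal sketch, assuming the check counts only some of the entries; `Entry`, its `is_null` flag, and `exceeds_limit()` are hypothetical and do not mirror the real `CDir` code. The loop stops as soon as the count passes `fast_limit` instead of counting every entry and comparing afterwards.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical directory entry; only non-null entries count toward the limit.
struct Entry {
  bool is_null;
};

// Returns true as soon as more than fast_limit entries have been counted.
// The answer is the same as counting everything first, but the loop can
// stop after at most fast_limit + 1 counted entries.
bool exceeds_limit(const std::vector<Entry>& items, std::size_t fast_limit) {
  std::size_t effective = 0;
  for (const Entry& e : items) {
    if (e.is_null)
      continue;
    if (++effective > fast_limit)
      return true;  // early exit: the exact total is not needed
  }
  return false;
}
```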
* | | | Merge pull request #50105 from zhsgao/mds_export_state (Venky Shankar, 2024-10-30, 6 files, -21/+143)
|\ \ \ \
| |_|/ /
|/| | |
| | | |   mds: add an asok command to dump export states
| | | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | | mds: add an asok command to dump export states (Zhansong Gao, 2024-10-08, 6 files, -21/+143)
| | | |   A task to export a subtree may be blocked; use this command to find
| | | |   out what's going on.
| | | |   Fixes: https://tracker.ceph.com/issues/58835
| | | |   Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
* | | | mds: remove obsolete comments (Patrick Donnelly, 2024-10-23, 1 file, -9/+0)
| |_|/
|/| |
| | |   An mdr is a smart pointer now. The reference is now codified.
| | |   Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* | | Merge PR #60214 into main (Patrick Donnelly, 2024-10-22, 3 files, -136/+122)
|\ \ \
| | | |   * refs/pull/60214/head:
| | | |       mds/MDCache: use `auto`
| | | |       mds/CDir: use the erase() return value
| | | |       mds/MDCache: remove unnecessary empty() check
| | | |       mds/MDCache: use the erase() return value
| | | |       mds/MDCache: pass iterator by value
| | | |   Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
| * | | mds/MDCache: use `auto` (Max Kellermann, 2024-10-09, 1 file, -99/+97)
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
| * | | mds/CDir: use the erase() return value (Max Kellermann, 2024-10-09, 1 file, -5/+2)
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
| * | | mds/MDCache: remove unnecessary empty() check (Max Kellermann, 2024-10-09, 1 file, -7/+5)
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
| * | | mds/MDCache: use the erase() return value (Max Kellermann, 2024-10-09, 2 files, -26/+19)
| | | |   When erasing items from a linked list while iterating it, it is good
| | | |   practice (and safer and sometimes faster) to use the erase() return
| | | |   value instead of incrementing the iterator.
| | | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
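The erase-while-iterating idiom mentioned here looks like this with a standard container. It is a generic sketch, not taken from MDCache; `remove_even()` is a hypothetical example function. `std::list::erase()` returns the iterator following the removed element, so the loop never touches an invalidated iterator.

```cpp
#include <cstddef>
#include <list>

// Removes every even value while iterating and returns how many were removed.
std::size_t remove_even(std::list<int>& values) {
  std::size_t removed = 0;
  for (auto it = values.begin(); it != values.end(); ) {
    if (*it % 2 == 0) {
      it = values.erase(it);  // erase() hands back the next valid iterator
      ++removed;
    } else {
      ++it;  // advance manually only when nothing was erased
    }
  }
  return removed;
}
```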
| * | | mds/MDCache: pass iterator by value (Max Kellermann, 2024-10-09, 2 files, -2/+2)
| | |/
| |/|
| | |   An iterator is just a pointer, and passing it by reference means we
| | |   pass a pointer to a pointer, which is useless overhead.
| | |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
* / | mds/Beacon: set a thread name (Max Kellermann, 2024-10-15, 1 file, -0/+2)
|/ /
| |   Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
* | Merge PR #60032 into main (Patrick Donnelly, 2024-10-09, 1 file, -2/+5)
|\ \
| | |   * refs/pull/60032/head:
| | |       mds: do not dump empty bufptr
| | |   Reviewed-by: Jos Collin <jcollin@redhat.com>
| | |   Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
| * | mds: do not dump empty bufptr (Patrick Donnelly, 2024-09-27, 1 file, -2/+5)
| |/
| |   Fixes: https://tracker.ceph.com/issues/68243
| |   Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* | Merge PR #59874 into main (Venky Shankar, 2024-09-30, 1 file, -1/+1)
|\ \
| | |   * refs/pull/59874/head:
| | |       mds: invalid id for client eviction is to be treated as success
| | |   Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
| * | mds: invalid id for client eviction is to be treated as success (Venky Shankar, 2024-09-23, 1 file, -1/+1)
| | |   Introduced-by: 0ef5941a2e79
| | |   Fixes: http://tracker.ceph.com/issues/68132
| | |   Signed-off-by: Venky Shankar <vshankar@redhat.com>
* | | Merge PR #52623 into main (Venky Shankar, 2024-09-30, 13 files, -12/+135)
|\ \ \
| |_|/
|/| |
| | |   * refs/pull/52623/head:
| | |       ceph-dencoder: MDS - Add missing types
| | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>
| * | ceph-dencoder: MDS - Add missing types (NitzanMordhai, 2024-04-10, 13 files, -12/+135)
| | |   Currently, ceph-dencoder lacks certain mds types, preventing us from
| | |   accurately checking the ceph corpus for encode-decode mismatches.
| | |   This pull request aims to address this issue by adding the missing
| | |   types to ceph-dencoder.
| | |   To successfully incorporate these types into ceph-dencoder, we need to
| | |   introduce the necessary `dump` and `generate_test_instances` functions
| | |   that were missing in some types. These functions are essential for
| | |   proper encode and decode of the added types.
| | |   This PR will enhance the functionality of ceph-dencoder by including
| | |   the missing types, enabling a comprehensive analysis of encode-decode
| | |   consistency. With the addition of these types, we can ensure the
| | |   robustness and correctness of the ceph corpus.
| | |   This update will significantly contribute to improving the overall
| | |   reliability and accuracy of ceph-dencoder. It allows for a more
| | |   comprehensive assessment of the encode-decode behavior, leading to
| | |   enhanced data integrity and stability within the ceph ecosystem.
| | |   Fixes: https://tracker.ceph.com/issues/61788
| | |   Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
* | | Merge PR #58936 into main (Patrick Donnelly, 2024-09-26, 8 files, -129/+150)
|\ \ \
| | | |   * refs/pull/58936/head:
| | | |       mds: do not duplicate journaler write heads
| | | |       mds: use Journaler getters
| | | |       osdc: properly acquire locks for getters
| | | |       osdc: add print method for Journaler::Header
| | | |       mds: do not trim segments after open file table commit
| | | |       mds: delay expiry if LogSegment is ahead of committed oft seq
| | | |       mds: do not write journal head twice on trim
| | | |       mds: simplify and explain expiry finisher ctx
| | | |       mds: add mds_lock asserts for journal flush
| | | |       mds: skip second wait_for_safe
| | | |       mds: trim only to the LogSegment created for flush
| | | |       mds: allow passing explicit seq to trim to
| | | |       mds: quiet unhelpful debug message
| | | |       mds: add C_IO_Wrapper completion debugging
| | | |       mds: add dout for new segment
| | | |   Reviewed-by: Venky Shankar <vshankar@redhat.com>