.. _cephfs-administration:

CephFS Administrative commands
==============================

File Systems
------------

.. note:: The names of the file systems, metadata pools, and data pools can
          only have characters in the set [a-zA-Z0-9\_-.].
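
As an illustration of the character set in the note above, the following
minimal Python sketch checks a candidate name before it is passed to a command
such as ``ceph fs new``. The helper ``is_valid_name`` is hypothetical and not
part of Ceph. It checks only the character set and assumes nothing about other
restrictions (for example, length) that Ceph may enforce.

```python
import re

# Characters permitted by the note: letters, digits, underscore, hyphen, period.
# The hyphen is placed last in the class so it is taken literally.
_NAME_RE = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_name(name: str) -> bool:
    """Return True if `name` is non-empty and uses only permitted characters."""
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("cephfs_data-01.ssd", "my fs", "pool/one"):
        print(candidate, is_valid_name(candidate))
```

Names containing spaces or slashes are rejected by this check, because neither
character is in the permitted set.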

These commands operate on the CephFS file systems in your Ceph cluster.
Note that by default only one file system is permitted: to enable
creation of multiple file systems use ``ceph fs flag set enable_multiple true``.

::

    ceph fs new <file system name> <metadata pool name> <data pool name>

This command creates a new file system. The file system name and metadata pool
name are self-explanatory. The specified data pool is the default data pool and
cannot be changed once set. Each file system has its own set of MDS daemons
assigned to ranks, so ensure that you have sufficient standby daemons available
to accommodate the new file system.

::

    ceph fs ls

List all file systems by name.

::

    ceph fs lsflags <file system name>

List all the flags set on a file system.

::

    ceph fs dump [epoch]

This dumps the FSMap at the given epoch (default: current) which includes all
file system settings, MDS daemons and the ranks they hold, and the list of
standby MDS daemons.


::

    ceph fs rm <file system name> [--yes-i-really-mean-it]

Destroy a CephFS file system. This wipes information about the state of the
file system from the FSMap. The metadata pool and data pools are untouched and
must be destroyed separately.

::

    ceph fs get <file system name>

Get information about the named file system, including settings and ranks. This
is a subset of the same information from the ``ceph fs dump`` command.

::

    ceph fs set <file system name> <var> <val> [--yes-i-really-mean-it]

Change a setting on a file system. These settings are specific to the named
file system and do not affect other file systems. The confirmation flag is
needed only when changing ``max_mds`` while the cluster is unhealthy.

.. note:: The confirmation flag (``--yes-i-really-mean-it``) is mandatory when
   modifying the file system setting ``max_mds`` while the cluster is
   unhealthy. It was added as a precaution to tell users that modifying
   ``max_mds`` during troubleshooting or recovery might not help. Instead, it
   might further destabilize the cluster.

::

    ceph fs add_data_pool <file system name> <pool name/id>

Add a data pool to the file system. This pool can be used for file layouts
as an alternate location to store file data.

::

    ceph fs rm_data_pool <file system name> <pool name/id>

This command removes the specified pool from the list of data pools for the
file system.  If any files have layouts for the removed data pool, the file
data will become unavailable. The default data pool (the one specified when
the file system was created) cannot be removed.

::

    ceph fs rename <file system name> <new file system name> [--yes-i-really-mean-it]

Rename a Ceph file system. This also changes the application tags on the data
pools and metadata pool of the file system to the new file system name.
The CephX IDs authorized to the old file system name need to be reauthorized
to the new name. Any ongoing operations of the clients using these IDs may be
disrupted. Mirroring is expected to be disabled on the file system.

::

    ceph fs swap <fs1-name> <fs1_id> <fs2-name> <fs2_id> [--swap-fscids=yes|no] [--yes-i-really-mean-it]

Swaps the names of two Ceph file systems and updates the application tags on
all pools of both file systems accordingly. Certain tools that track the FSCIDs
of the file systems, in addition to their names, might be confused by this
operation. For this reason, the mandatory option ``--swap-fscids`` must be used
to indicate whether or not the FSCIDs are to be swapped as well.

.. note:: FSCID stands for "File System Cluster ID".

Before the swap, mirroring should be disabled on both file systems (because
the cephfs-mirror daemon uses the FSCID internally, and changing it while the
daemon is running could result in undefined behaviour), both file systems
should be offline, and the file system flag ``refuse_client_sessions`` must be
set on both file systems.

The function of this API is to facilitate disaster recovery where a new file
system reconstructed from the previous one is ready to take over for the
possibly damaged file system. Instead of two ``fs rename`` operations, the
operator can use a swap so there is no FSMap epoch where the primary (or
production) named file system does not exist. This is important when Ceph is
monitored by automatic storage operators like Rook, which try to reconcile
the storage system continuously. Such an operator may attempt to recreate the
file system as soon as it is seen to not exist.

After the swap, CephX credentials may need to be reauthorized if the existing
mounts should "follow" the old file system to its new name. Generally, for
disaster recovery, it is desirable for the existing mounts to continue using
the same file system name. Any active mounts of either file system
must remount. Existing unflushed operations will be lost. When it is judged
that one of the swapped file systems is ready for clients, run::

    ceph fs set <fs> joinable true
    ceph fs set <fs> refuse_client_sessions false

Keep in mind that one of the swapped file systems may be left offline for
future analysis if doing a disaster recovery swap.


Settings
--------

::

    ceph fs set <fs name> max_file_size <size in bytes>

CephFS has a configurable maximum file size, which is 1TB by default.
You may wish to set this limit higher if you expect to store large files
in CephFS. It is a 64-bit field.

Setting ``max_file_size`` to 0 does not disable the limit. It
simply limits clients to creating only empty files.
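
Because the setting takes a size in bytes, it can help to compute the value
rather than type it by hand. The sketch below is a hypothetical helper, not
part of Ceph. It assumes that a terabyte here means 2**40 bytes and that the
file system is named ``cephfs``. It only builds the command string and does
not run it.

```python
TIB = 2**40  # bytes per tebibyte; assumed to be the unit meant by "1TB"


def max_file_size_command(fs_name: str, tebibytes: int) -> str:
    """Build a `ceph fs set ... max_file_size` command for a limit in TiB."""
    size_bytes = tebibytes * TIB
    # The setting is a 64-bit field, so reject values that cannot fit.
    if not 0 <= size_bytes < 2**64:
        raise ValueError(f"{size_bytes} does not fit in a 64-bit field")
    return f"ceph fs set {fs_name} max_file_size {size_bytes}"


if __name__ == "__main__":
    # Raise the limit to 4 TiB on a file system assumed to be named "cephfs".
    print(max_file_size_command("cephfs", 4))
```

For 4 TiB the helper emits a byte count of 4398046511104.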


Maximum file sizes and performance
----------------------------------

CephFS enforces the maximum file size limit at the point of appending to
files or setting their size. It does not affect how anything is stored.

When users create a file of an enormous size (without necessarily
writing any data to it), some operations (such as deletes) cause the MDS
to perform a large number of operations to check whether any of the RADOS
objects within the range that could exist (according to the file size)
actually exist.

The ``max_file_size`` setting prevents users from creating files that
appear to be, e.g., exabytes in size, which would cause load on the MDS as it
tries to enumerate the objects during operations like stats or deletes.
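
To get a feel for the scale involved, the following rough sketch estimates how
many RADOS objects could exist for a file of a given apparent size. It is a
simplified model, not how the MDS actually computes anything: it assumes a
4 MiB object size and ignores striping, and the function name is hypothetical.

```python
def potential_objects(file_size: int, object_size: int = 4 * 2**20) -> int:
    """Number of fixed-size objects needed to cover `file_size` bytes.

    The 4 MiB default is an assumption about the file layout; striping and
    other layout details are ignored.
    """
    if file_size < 0 or object_size <= 0:
        raise ValueError("sizes must be non-negative and object_size positive")
    # Ceiling division without floating point.
    return -(-file_size // object_size)


if __name__ == "__main__":
    print(potential_objects(2**40))  # a file that appears to be 1 TiB
    print(potential_objects(2**60))  # a file that appears to be 1 EiB
```

Under these assumptions a 1 TiB file spans 2**18 (262144) potential objects,
while a 1 EiB file spans 2**38, which is roughly a million times more work for
any operation that has to check each one.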


Taking the cluster down
-----------------------

To take a CephFS cluster down, set the ``down`` flag:

::

    ceph fs set <fs_name> down true

To bring the cluster back online:

::

    ceph fs set <fs_name> down false

This will also restore the previous value of ``max_mds``. MDS daemons are brought
down in a way such that journals are flushed to the metadata pool and all
client I/O is stopped.


Taking the cluster down rapidly for deletion or disaster recovery
-----------------------------------------------------------------

To allow rapidly deleting a file system (for testing) or to quickly bring the
file system and MDS daemons down, use the ``ceph fs fail`` command:

::

    ceph fs fail <fs_name> [--yes-i-really-mean-it]

.. note:: The confirmation flag is optional because it is required only
   when the MDS is active and has the health warning ``MDS_TRIM`` or
   ``MDS_CACHE_OVERSIZED``.

This command sets a file system flag to prevent standbys from
activating on the file system (the ``joinable`` flag).

The same steps can also be performed manually, starting with the flag:

::

    ceph fs set <fs_name> joinable false

Then the operator can fail all of the ranks, which causes the MDS daemons to
respawn as standbys. The file system will be left in a degraded state.

::

    # For all ranks, 0-N:
    ceph mds fail <fs_name>:<n> [--yes-i-really-mean-it]

.. note:: The confirmation flag is optional because it is required only
   when the MDS is active and has the health warning ``MDS_TRIM`` or
   ``MDS_CACHE_OVERSIZED``.

Once all ranks are inactive, the file system may also be deleted or left in
this state for other purposes (perhaps disaster recovery).
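
The manual procedure above can be sketched as a small script that lists the
commands in order. This is only an illustration: the helper name is
hypothetical, the highest rank must be supplied by the operator (for example,
after inspecting ``ceph fs get``), and the sketch prints the commands rather
than executing them.

```python
from typing import List


def manual_fail_commands(fs_name: str, highest_rank: int) -> List[str]:
    """List the commands for the manual procedure, in the order to run them."""
    if highest_rank < 0:
        raise ValueError("highest_rank must be 0 or greater")
    # First prevent standbys from taking over the ranks about to be failed.
    commands = [f"ceph fs set {fs_name} joinable false"]
    # Then fail every rank from 0 up to and including highest_rank.
    commands += [
        f"ceph mds fail {fs_name}:{rank}" for rank in range(highest_rank + 1)
    ]
    return commands


if __name__ == "__main__":
    for line in manual_fail_commands("cephfs", 1):
        print(line)
```

The confirmation flag is left out of the generated commands, because it is
needed only in the situations described in the note above.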

To bring the cluster back up, simply set the joinable flag:

::

    ceph fs set <fs_name> joinable true


Daemons
-------

Most commands manipulating MDSs take a ``<role>`` argument which can take one
of three forms:

::

    <fs_name>:<rank>
    <fs_id>:<rank>
    <rank>
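
The three forms can be told apart mechanically. The following sketch is a
hypothetical parser, not Ceph's own implementation. It assumes that a purely
numeric prefix is a file system ID, so a file system whose name consists only
of digits would be misread by it.

```python
from typing import NamedTuple, Optional


class Role(NamedTuple):
    fs_name: Optional[str]  # set for the <fs_name>:<rank> form
    fs_id: Optional[int]    # set for the <fs_id>:<rank> form
    rank: int


def parse_role(text: str) -> Role:
    """Split a role string into its file system part and its rank."""
    prefix, sep, rank_text = text.rpartition(":")
    if not rank_text.isdecimal():
        raise ValueError(f"rank must be a non-negative integer: {text!r}")
    rank = int(rank_text)
    if not sep:
        # Bare <rank> form: no file system given.
        return Role(None, None, rank)
    if not prefix:
        raise ValueError(f"missing file system before ':' in {text!r}")
    if prefix.isdecimal():
        # Assumption: an all-digit prefix is an ID rather than a name.
        return Role(None, int(prefix), rank)
    return Role(prefix, None, rank)


if __name__ == "__main__":
    for example in ("cephfs:0", "1:2", "3"):
        print(example, parse_role(example))
```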

Commands to manipulate MDS daemons:

::

    ceph mds fail <gid/name/role>

Mark an MDS daemon as failed.  This is equivalent to what the cluster
would do if an MDS daemon had failed to send a message to the Monitor
for ``mds_beacon_grace`` seconds.  If the daemon was active and a suitable
standby is available, using ``ceph mds fail`` will force a failover to the
standby.

If the MDS daemon was in reality still running, then using ``ceph mds fail``
will cause the daemon to restart.  If it was active and a standby was
available, then the "failed" daemon will return as a standby.


::

    ceph tell mds.<daemon name> command ...

Send a command to the MDS daemon(s). Use ``mds.*`` to send a command to all
daemons. Use ``ceph tell mds.* help`` to learn available commands.

::

    ceph mds metadata <gid/name/role>

Get metadata about the given MDS known to the Monitors.

::

    ceph mds repaired <role>

Mark the file system rank as repaired. Contrary to what the name suggests,
this command does not change an MDS; it manipulates the file system rank that
has been marked damaged.

::

    ceph mds last-seen <name>

Learn when the MDS named ``name`` was last in the FSMap. The JSON output
includes the epoch in which the MDS was last seen. Historical information is
limited by the following ``mon`` configuration:


.. confval:: mon_fsmap_prune_threshold


Required Client Features
------------------------

It is sometimes desirable to set features that clients must support to talk to
CephFS. Clients without those features may disrupt other clients or behave in
surprising ways. Or, you may want to require newer features to prevent older
and possibly buggy clients from connecting.

Commands to manipulate required client features of a file system:

::

    ceph fs required_client_features <fs name> add reply_encoding
    ceph fs required_client_features <fs name> rm reply_encoding

To list all CephFS features:

::

    ceph fs feature ls

Clients that are missing newly added features will be evicted automatically.

Here are the current CephFS features and the first release in which each appeared:

+----------------------------+--------------+-----------------+
| Feature                    | Ceph release | Upstream Kernel |
+============================+==============+=================+
| jewel                      | jewel        | 4.5             |
+----------------------------+--------------+-----------------+
| kraken                     | kraken       | 4.13            |
+----------------------------+--------------+-----------------+
| luminous                   | luminous     | 4.13            |
+----------------------------+--------------+-----------------+
| mimic                      | mimic        | 4.19            |
+----------------------------+--------------+-----------------+
| reply_encoding             | nautilus     | 5.1             |
+----------------------------+--------------+-----------------+
| reclaim_client             | nautilus     | N/A             |
+----------------------------+--------------+-----------------+
| lazy_caps_wanted           | nautilus     | 5.1             |
+----------------------------+--------------+-----------------+
| multi_reconnect            | nautilus     | 5.1             |
+----------------------------+--------------+-----------------+
| deleg_ino                  | octopus      | 5.6             |
+----------------------------+--------------+-----------------+
| metric_collect             | pacific      | N/A             |
+----------------------------+--------------+-----------------+
| alternate_name             | pacific      | 6.5             |
+----------------------------+--------------+-----------------+
| notify_session_state       | quincy       | 5.19            |
+----------------------------+--------------+-----------------+
| op_getvxattr               | quincy       | 6.0             |
+----------------------------+--------------+-----------------+
| 32bits_retry_fwd           | reef         | 6.6             |
+----------------------------+--------------+-----------------+
| new_snaprealm_info         | reef         | UNKNOWN         |
+----------------------------+--------------+-----------------+
| has_owner_uidgid           | reef         | 6.6             |
+----------------------------+--------------+-----------------+
| client_mds_auth_caps       | squid+bp     | PLANNED         |
+----------------------------+--------------+-----------------+
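
When deciding which features to require, it can be useful to know which ones a
given upstream kernel client is expected to support. The sketch below copies
the kernel column of the table into a dictionary and compares versions as
integer tuples, so that 5.19 correctly sorts after 5.6. The helper names are
hypothetical, rows marked N/A, UNKNOWN, or PLANNED are treated as having no
kernel version, and distribution kernels with backports may differ from
upstream.

```python
from typing import Dict, List, Optional, Tuple

# Feature -> first upstream kernel, taken from the table above.
# None stands for the rows marked N/A, UNKNOWN, or PLANNED.
FEATURE_KERNELS: Dict[str, Optional[str]] = {
    "jewel": "4.5", "kraken": "4.13", "luminous": "4.13", "mimic": "4.19",
    "reply_encoding": "5.1", "reclaim_client": None,
    "lazy_caps_wanted": "5.1", "multi_reconnect": "5.1",
    "deleg_ino": "5.6", "metric_collect": None, "alternate_name": "6.5",
    "notify_session_state": "5.19", "op_getvxattr": "6.0",
    "32bits_retry_fwd": "6.6", "new_snaprealm_info": None,
    "has_owner_uidgid": "6.6", "client_mds_auth_caps": None,
}


def _version(text: str) -> Tuple[int, ...]:
    """Turn '5.19' into (5, 19) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def features_for_kernel(kernel: str) -> List[str]:
    """Features whose first upstream kernel is at or below `kernel`."""
    target = _version(kernel)
    return [
        name
        for name, first in FEATURE_KERNELS.items()
        if first is not None and _version(first) <= target
    ]


if __name__ == "__main__":
    print(features_for_kernel("5.6"))
```

Comparing the versions as plain decimal numbers would wrongly place 5.19
before 5.6, which is why the sketch uses tuples.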

..
    Comment: use `git describe --tags --abbrev=0 <commit>` to lookup release


CephFS Feature Descriptions


::

    reply_encoding

The MDS encodes request replies in an extensible format if the client supports
this feature.


::

    reclaim_client

The MDS allows a new client to reclaim the state of another (dead) client.
This feature is used by NFS-Ganesha.


::

    lazy_caps_wanted

When a stale client resumes, if the client supports this feature, the MDS only
needs to re-issue caps that are explicitly wanted.


::

    multi_reconnect

When an MDS fails over, the client sends reconnect messages to the MDS to
reestablish cache states. If the MDS supports this feature, the client can
split a large reconnect message into multiple smaller ones.


::

    deleg_ino

The MDS delegates inode numbers to the client if the client supports this
feature. Having delegated inode numbers is a prerequisite for the client to do
async file creation.


::

    metric_collect

Clients can send performance metrics to the MDS if the MDS supports this feature.

::

    alternate_name

Clients can set and understand "alternate names" for directory entries. This is
to be used for encrypted file name support.

::

    client_mds_auth_caps

To effectively implement ``root_squash`` in a client's ``mds`` caps, the client
must understand that it is enforcing ``root_squash`` and other cap metadata.
Clients without this feature are in danger of dropping updates to files.  It is
recommended to set this feature bit.


Global settings
---------------


::

    ceph fs flag set <flag name> <flag val> [<confirmation string>]

Sets a global CephFS flag (i.e. not specific to a particular file system).
Currently, the only flag setting is ``enable_multiple``, which allows having
multiple CephFS file systems.

Some flags require you to confirm your intentions with ``--yes-i-really-mean-it``
or a similar string that they will prompt you with. Consider these actions
carefully before proceeding; confirmation is required only for especially
dangerous activities.

.. _advanced-cephfs-admin-settings:

Advanced
--------

These commands are not required in normal operation, and exist
for use in exceptional circumstances.  Incorrect use of these
commands may cause serious problems, such as an inaccessible
file system.

::

    ceph mds rmfailed

This removes a rank from the failed set.

::

    ceph fs reset <file system name>

This command resets the file system state to defaults, except for the name and
pools. Non-zero ranks are saved in the stopped set.


::

    ceph fs new <file system name> <metadata pool name> <data pool name> --fscid <fscid> --force

This command creates a file system with a specific **fscid** (file system cluster ID).
You may want to do this when an application expects the file system's ID to be
stable after it has been recovered, e.g., after monitor databases are lost and
rebuilt. Consequently, file system IDs don't always keep increasing with newer
file systems.