path: root/mdmon.h
Commit message | Author | Age | Files | Lines
* mdmon: delegate removal to managemon | Mariusz Tkaczyk | 2024-11-04 | 1 | -2/+8
| Starting from [1], the kernel requires a suspend lock on the member drive removal path. This causes a deadlock with external management, because the monitor thread may be blocked on suspend and unable to switch the array to active, for example if a badblock is reported at that time.
| Removal is a blocking action now, so it must be delegated to the managemon thread, but we must ensure that the monitor updates the metadata first, just after detecting faulty. This patch adds the appropriate support. The monitor thread detects "faulty" and updates the metadata. After that, it asks the manager thread to remove the device.
| The manager must be careful, because closing descriptors used by select() may lead to an abort with D_FORTIFY_SOURCE=2. First, it must ensure that the device descriptors are not used by the monitor.
| There is an unlimited number of remove retries, and recovery is blocked until all failed drives are removed. This is safe because a "faulty" device is no longer used by MD.
| The issue will also be mitigated by an optimization on the badblock recording path in the kernel. It will check that the device is not failed before a badblock is recorded, but relying on this is not ideologically correct. Userspace must keep compatibility with the kernel, and since this is a blocking action, we must treat it as a blocking action.
| [1] kernel commit cfa078c8b80d ("md: use new apis to suspend array for adding/removing rdev from state_store()")
| Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
* mdstat: Rework mdstat external arrays handling | Mariusz Tkaczyk | 2024-07-30 | 1 | -1/+1
| To avoid repeating mdstat_read() in IncrementalRemove(), a new function, mdstat_find_by_member_name(), has been proposed. With that, IncrementalRemove() handles its own copy of the mdstat content and there is no need to repeat the read for external stop.
| Additionally, it proposes a few helpers to avoid repeating mdstat_ent->metadata_version checks across the code.
| Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
* mdadm: Introduce new array state 'broken' for raid0/linear | Guilherme G. Piccoli | 2019-09-30 | 1 | -1/+1
| Currently, if an md raid0/linear array gets one or more members removed while mounted, the kernel keeps showing state 'clean' in the 'array_state' sysfs attribute. Despite udev signaling that the member device is gone, 'mdadm' cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.
| Nothing else hints that something is wrong (except that the removed devices don't show properly in the output of the mdadm 'detail' command). There is no other property to be checked, and if the user is not performing reads/writes to the array, even the kernel log is quiet and doesn't give a clue about the missing member.
| This patch is the mdadm counterpart of the new kernel array state 'broken'. The 'broken' state mimics the state 'clean' in every aspect, being useful only to distinguish whether an array has some member missing. All necessary paths in mdadm were changed to deal with the 'broken' state, and in case the tool runs on a kernel that is not updated, it'll work normally, i.e., it doesn't require the 'broken' state in order to work.
| Also, this patch changes the way the array state is shown in the 'detail' command (for raid0/linear only): it now takes the 'array_state' sysfs attribute into account instead of relying only on the MD_SB_CLEAN flag.
| Cc: Jes Sorensen <jes.sorensen@gmail.com>
| Cc: NeilBrown <neilb@suse.de>
| Cc: Song Liu <songliubraving@fb.com>
| Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
| Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdmon: get safe mode delay file descriptor early | Tomasz Majchrzak | 2017-10-04 | 1 | -0/+1
| After switch root, a new mdmon is started. It sends the initrd mdmon a signal to terminate. The initrd mdmon receives it and switches the safe mode delay to 1 ms in order to get the array to a clean state and flush the last version of the metadata. The problem is that the sysfs filesystem is not available to the initrd mdmon after switch root, so the original safe mode delay is unchanged. The delay is set to a few seconds; if there is a lot of traffic on the filesystem, the initrd mdmon doesn't terminate for a long time (no clean state). There are 2 instances of mdmon. The initrd mdmon flushes metadata when the array goes to a clean state, but this metadata might already be outdated.
| Use the file descriptor obtained on mdmon start to change the safe mode delay.
| Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
| Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Fix some issues found by clang | NeilBrown | 2016-10-07 | 1 | -1/+1
| The clang compiler complained about each of these.
| The mdmon.h error will only affect 'far' RAID10 arrays using intel or DDF metadata, and there is no such thing.
| The mdopen.c error will cause a problem if there are no free md device numbers in the first 512. That is fairly unlikely.
| The restripe.c error would only affect the 'test_stripe' command, and probably doesn't change its behaviour.
| The super-intel.c fix is purely cosmetic.
| Signed-off-by: NeilBrown <neilb@suse.com>
| Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Change way of printing name of a process | Pawel Baldysiak | 2015-02-12 | 1 | -2/+1
| Sometimes mdadm prints messages with the wrong name "mdmon", and vice versa. This patch solves the problem by changing the method of determining the process name. Now "Name" is set in a const at the start of the program; previously it was hardcoded as a #define.
| Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
| Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
| Signed-off-by: NeilBrown <neilb@suse.de>
* Fix is_resync_complete for RAID10 | NeilBrown | 2013-07-31 | 1 | -3/+17
| For RAID10, 'sync' numbers go up to the array size rather than the component size. is_resync_complete() needs to allow for this.
| Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
| Signed-off-by: NeilBrown <neilb@suse.de>
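The RAID10 distinction in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from mdmon.h: `struct array_sketch` and `resync_complete_sketch()` are hypothetical names, and the layout decoding (near copies in the low byte, far copies in the next byte) is assumed from the usual md RAID10 layout encoding.

```c
#include <stdint.h>

/* Hypothetical stand-in for the array description; not the real struct mdinfo. */
struct array_sketch {
	int level;               /* RAID level: 1, 4, 5, 6 or 10 */
	int layout;              /* RAID10 only: near | (far << 8), an assumed encoding */
	int raid_disks;          /* number of member devices */
	uint64_t component_size; /* sectors per member device */
	uint64_t resync_start;   /* sectors known to be in sync */
};

/* Returns 1 once resync_start has reached the end of the range that 'sync' counts over. */
static int resync_complete_sketch(const struct array_sketch *a)
{
	uint64_t sync_size = 0; /* other levels: treated as having nothing to resync */

	switch (a->level) {
	case 1:
	case 4:
	case 5:
	case 6:
		/* 'sync' counts up to the size of one component */
		sync_size = a->component_size;
		break;
	case 10: {
		/* 'sync' counts up to the array size: total space divided by copies */
		int ncopies = (a->layout & 0xff) * ((a->layout >> 8) & 0xff);

		if (ncopies == 0)
			return 0; /* malformed layout: never report completion */
		sync_size = a->component_size * (uint64_t)a->raid_disks / (uint64_t)ncopies;
		break;
	}
	}
	return a->resync_start >= sync_size;
}
```

With 4 devices of 1000 sectors and 2 near copies, the sketch treats the resync as complete only at 2000 sectors, whereas a component-size comparison would report completion at 1000.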
* Remove lots of unnecessary white space. | NeilBrown | 2013-06-19 | 1 | -3/+0
| Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-)
| Signed-off-by: NeilBrown <neilb@suse.de>
* pr_err for mdmon. | NeilBrown | 2013-05-21 | 1 | -0/+3
| Signed-off-by: NeilBrown <neilb@suse.de>
* Discard devnum in favour of devnm | NeilBrown | 2013-02-21 | 1 | -2/+0
| We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices.
| But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string).
| e.g. md0, md_d2, md_home
| Signed-off-by: NeilBrown <neilb@suse.de>
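To make the old numeric scheme concrete, here is a small sketch that renders a legacy devnum as a devnm string. The helper name `devnum_to_devnm_sketch()` is hypothetical, and the exact mapping of negative numbers (-1 to md_d0, -2 to md_d1, and so on) is an assumption; the commit message only says that negative values denote md_d%d devices.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a legacy devnum into a 32-char devnm buffer.
 * Assumption: negative numbers map as -1 -> md_d0, -2 -> md_d1, ...
 */
static void devnum_to_devnm_sketch(int devnum, char devnm[32])
{
	if (devnum >= 0)
		snprintf(devnm, 32, "md%d", devnum);
	else
		snprintf(devnm, 32, "md_d%d", -1 - devnum);
}
```

A name such as md_home has no numeric equivalent at all, which is why a string identifier is needed once md_%s names are allowed.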
* FIX: Mdmon crashes after changing RAID level from 1 to 0 | Lukasz Dorau | 2011-09-06 | 1 | -0/+1
| Description of the bug: Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover).
| Cause of the bug: The managemon marks an active_array for removal from monitoring by assigning NULL to a->container (in the "manage_member" function). Sometimes (during stress tests) this happens right when the monitor is in the "read_and_act" function and the a->container pointer is in use. This causes the monitor to crash.
| Solution: The active array has to be marked for removal in some way other than setting a pointer to NULL while it may be in use. A new field "to_remove" was added to the "active_array" structure. It is used in the managemon to mark a container for removal (instead of the old assignment: a->container = NULL), and the monitor checks it to determine whether the array should be removed. The field "to_remove" should be checked in some other places too, to avoid managing an array which is going to be removed.
| Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
| Signed-off-by: NeilBrown <neilb@suse.de>
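The idea behind "to_remove" can be sketched as below. The struct and function names are hypothetical stand-ins rather than the real mdmon definitions, and the sketch is single-threaded: it deliberately leaves out whatever synchronisation the real manager and monitor threads rely on.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real mdmon structures. */
struct container_sketch {
	const char *name;
};

struct active_array_sketch {
	struct container_sketch *container; /* stays valid while the monitor may use it */
	int to_remove;                      /* set by the manager, checked by the monitor */
};

/* Manager side: request removal without clearing a pointer that may be in use. */
static void manager_mark_for_removal(struct active_array_sketch *a)
{
	a->to_remove = 1;
}

/* Monitor side: returns 1 if the array should still be monitored, 0 if it should be dropped. */
static int monitor_should_process(const struct active_array_sketch *a)
{
	return !a->to_remove && a->container != NULL;
}
```

The point of the design is that a->container remains dereferenceable after the manager has asked for removal, so code that is already in the middle of using it does not trip over a NULL pointer.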
* mdmon: when a reshape is detected, add any newly added devices to the array. | NeilBrown | 2010-12-15 | 1 | -0/+1
| When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored.
| Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: periodically checkpoint recovery | Dan Williams | 2010-05-15 | 1 | -0/+9
| The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start.
| Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: insist on creating .pid file at startup. | NeilBrown | 2010-02-08 | 1 | -3/+0
| Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made.
| This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable.
| This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM.
| Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: allow pid to be stored in different directory. | NeilBrown | 2010-02-04 | 1 | -1/+1
| /var/run probably doesn't persist from early boot. So if necessary, store it in /lib/init/rw or somewhere else that does persist.
| Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: cleanup resync_start | Dan Williams | 2009-12-14 | 1 | -5/+2
| We don't need to sprinkle reads of this attribute all over the place, just once at the entry of read_and_act(). Also, the mdinfo structure for the array already has a 'resync_start' member, so just reuse that.
| Finally, rename get_resync_start() to read_resync_start to make it consistent with the other sysfs accessors in monitor.c.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Update copyright dates and remove references to @cse.unsw.edu.au | NeilBrown | 2009-06-02 | 1 | -2/+2
| Also removed 'paper' addresses.
| Signed-off-by: NeilBrown <neilb@suse.de>
* update copyright headers | Dan Williams | 2008-10-28 | 1 | -0/+20
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: wait after trying to kill | Dan Williams | 2008-10-15 | 1 | -0/+1
| Now that mdmon handles SIGTERM, if another monitor wants to take over it should wait until all managed arrays are clean. So make WaitClean() available to mdmon and teach try_kill_monitor() to wait on each subarray in the container.
| ...since we may be communicating with a dying process, we need to block SIGPIPE earlier.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: terminate clean | Dan Williams | 2008-10-15 | 1 | -0/+1
| We generally don't want mdmon to be terminated, but if a SIGTERM gets through, try to leave the monitored arrays in a clean state, block attempts to mark the array dirty, and stop servicing the socket.
| When we are killed by SIGTERM, don't remove the pidfile; let that be cleaned up by the next monitor.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: protect against CONFIG_LBD=n | Dan Williams | 2008-10-15 | 1 | -0/+11
| md/resync_start reports different terminal values depending on kernel configuration (~0UL versus ~0ULL). Make detection of the resync-complete state more robust by comparing against array size.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
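The difference between the two detection strategies described here can be sketched as follows. Both function names are hypothetical, and `size` stands for whichever size the resync position counts against; the sketch assumes the ~0UL case corresponds to a 32-bit terminal value.

```c
#include <stdint.h>

/* Fragile: only recognises kernels that report the 64-bit terminal value (~0ULL). */
static int resync_done_by_sentinel(uint64_t resync_start)
{
	return resync_start == UINT64_MAX;
}

/* More robust: any position at or beyond the size means nothing is left to resync. */
static int resync_done_by_size(uint64_t resync_start, uint64_t size)
{
	return resync_start >= size;
}
```

A 32-bit terminal value such as 0xffffffff fails the sentinel test but passes the size comparison for any array smaller than that, and the 64-bit terminal value passes the size comparison as well.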
* sysfs: dprintf when we fail to write a sysfs file | Dan Williams | 2008-10-15 | 1 | -8/+0
| When arrays do not startup correctly it would be nice to know why. Need to move the dprintf definition to mdadm.h
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: recreate socket/pid file on SIGHUP | Dan Williams | 2008-09-16 | 1 | -0/+3
| Allow mdmon to start while /var/run/mdadm is readonly. Later a SIGHUP can trigger mdmon to drop its pid and socket once /var/run/mdadm is writable.
| Of course one needs the pid to send a HUP, that can be stored in a distribution specific rw-init directory... For now, rely on a killall -HUP mdmon to get the files dumped.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Allow an externally managed array to be marked readonly | NeilBrown | 2008-08-19 | 1 | -0/+1
| If the metadata_version is -mdXXX/whatever rather than /mdXXX/whatever then the array is readonly and should be left alone by mdmon.
| Signed-off-by: NeilBrown <neilb@suse.de>
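The readonly convention in this commit message amounts to looking at the first character of the subarray reference. A minimal sketch, assuming the caller has already isolated the "/mdXXX/whatever" or "-mdXXX/whatever" text from metadata_version; the function name is hypothetical and is not an mdadm API.

```c
#include <stddef.h>

/*
 * Hypothetical helper: `subarray_ref` is the "/mdXXX/whatever" or
 * "-mdXXX/whatever" part of metadata_version. A leading '-' marks the
 * array as readonly, meaning mdmon should leave it alone.
 */
static int external_array_is_readonly(const char *subarray_ref)
{
	return subarray_ref != NULL && subarray_ref[0] == '-';
}
```

For example, "-md127/0" would be reported as readonly and "/md127/0" as normally managed.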
* mdmon: ping will wait for manage_mon to catch up. | NeilBrown | 2008-07-18 | 1 | -0/+1
| When a 'ping' (empty message) is sent to mdmon, we wait for 'monitor' to do a full loop to make sure it has caught up with anything that needs doing. This allows synchronisation between mdadm and mdmon.
| Maybe monitor should signal managemon rather than managemon polling...
| Signed-off-by: Neil Brown <neilb@suse.de>
* Make sure resync_start is initialised properly and maintained properly | Neil Brown | 2008-07-18 | 1 | -0/+1
| Signed-off-by: Neil Brown <neilb@suse.de>
* Create arrays via metadata-update | Neil Brown | 2008-07-12 | 1 | -6/+0
| Support creating arrays inside an active ddf container by sending a metadata update over a pipe to mdmon.
* Remove mgr_pipe for communicating from manage to monitor. | Neil Brown | 2008-07-12 | 1 | -0/+1
| Data is being passed in shared memory, so the pipe is only being used as a wakeup. This can more easily be done with a thread-signal.
* Hide subordinate superswitch structures. | Neil Brown | 2008-07-12 | 1 | -2/+0
| Only one superswitch should be externally visible for each general type. Others which handle different flavours (e.g. container/data-array) should be internal only.
* mdmon: add debug print statements for profiling mdmon | Dan Williams | 2008-06-17 | 1 | -0/+7
| For development only, as console output can block, leading to monitor deadlocks in low-memory situations.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Support adding a spare to a degraded array. | Neil Brown | 2008-06-12 | 1 | -0/+2
| When signalled by the monitor, the manager will find spares and add them to the array and initiate a recovery.
* Allow passing metadata update to the monitor. | Neil Brown | 2008-06-12 | 1 | -1/+18
| Code in manager can now just call queue_metadata_update with a (freeable) buf holding the update, and it will get passed to the monitor and written out.
* Discard get_sync_pos. We should be using get_resync_start. | Neil Brown | 2008-05-27 | 1 | -2/+0
| "sync_complete" just tracks the current resync/recover/check/whatever pass. "resync_start" tracks which parts of the array are known to be in-sync (modulo active writes). So it is what we need to use to update the metadata.
| Also we cannot call it when the array has stopped, as the value is no longer available then. We must call it when the resync completes. Possibly also call it periodically if the array is quiescent.
* Exit when there are no more arrays to manage. | Neil Brown | 2008-05-27 | 1 | -0/+3
* Discard 'array_list' in mdmon | Neil Brown | 2008-05-27 | 1 | -1/+0
| The container has an ->arrays field that we should be using.
* add infrastructure to receive higher order commands, like remove_device | Dan Williams | 2008-05-15 | 1 | -0/+1
| From: Dan Williams <dan.j.williams@intel.com>
| Each md_message encapsulates a single command. A command includes an 'action' member which describes what data, if any, comes after the action. Communication with the monitor involves updating the active_cmd pointer and then writing to mgr_pipe. Pass/fail status is returned via mon_pipe.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* start resync when transitioning from initial readonly state | Dan Williams | 2008-05-15 | 1 | -0/+2
| From: Dan Williams <dan.j.williams@intel.com>
| mdadm handles setting resync_start; the monitor uses this value to determine whether to set the 'active' or 'readauto' state.
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Merge mdmon | Neil Brown | 2008-05-15 | 1 | -0/+41