path: root/Monitor.c
Commit message | Author | Age | Files | Lines
* Monitor/msg: Don't print error message if mdmon doesn't run | Mariusz Tkaczyk | 2017-11-21 | 1 | -4/+5
  Commit 4515fb28a53a ("Add detail information when can not connect monitor") was added to warn about failed connection to monitor in WaitClean function (see link below). Mdmon runs for IMSM containers when they have array with redundancy so if mdmon doesn't run, mdadm prints this error. This is misleading and unnecessary. Just print it in WaitClean function. The sock in WaitClean is deprecated so it is removed.
  Link: https://bugzilla.redhat.com/show_bug.cgi?id=1375002
  Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Check redundancy for arrays | Mariusz Tkaczyk | 2017-10-02 | 1 | -4/+4
  GET_MISMATCH option doesn't exist for RAID arrays without redundancy so sysfs_read fails if this information is requested. Set options according to the device using information from /proc/mdstat.
  Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Include containers in spare migration | Mariusz Tkaczyk | 2017-08-16 | 1 | -1/+1
  Spare migration doesn't work for external metadata. mdadm skips a container with spare device because it is inactive. It used to work because GET_ARRAY_INFO ioctl returned valid structure for a container and mdadm treated such response as active container. Current implementation checks it in sysfs where container is shown as inactive. Adapt sysfs implementation to work the same way as ioctl.
  Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: containers don't have the same sysfs properties as arrays | Mariusz Tkaczyk | 2017-08-16 | 1 | -18/+28
  GET_MISMATCH option doesn't exist for containers so sysfs_read fails if this information is requested. Set options according to the device using information from /proc/mdstat.
  Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: don't assume mdadm parameter is a block device | Tomasz Majchrzak | 2017-07-10 | 1 | -2/+11
  If symlink (e.g. /dev/md/raid) is passed as a parameter to mdadm --wait, it fails as it's not able to find a corresponding entry in /proc/mdstat output. Get parameter file major:minor and look for block device name in sysfs.
  This commit is partial revert of commit 9e04ac1c43e6 ("mdadm/util: unify stat checking blkdev into function").
  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
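The lookup described in this commit can be sketched roughly as follows. This is not the mdadm code: the helper name blkdev_sysfs_path() is hypothetical, and the sketch only builds the /sys/dev/block/MAJOR:MINOR path that a caller could then resolve to a kernel device name. Because stat() follows symlinks, a path such as /dev/md/raid yields the numbers of the underlying block device.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a user-supplied path (possibly a symlink)
 * to the sysfs entry of the block device it refers to.
 * Returns 0 on success, -1 if the path is missing or not a block device. */
static int blkdev_sysfs_path(const char *arg, char *out, size_t len)
{
    struct stat stb;

    /* stat() follows symlinks, so st_rdev describes the real device */
    if (stat(arg, &stb) != 0 || !S_ISBLK(stb.st_mode))
        return -1;

    snprintf(out, len, "/sys/dev/block/%u:%u",
             major(stb.st_rdev), minor(stb.st_rdev));
    return 0;
}
```

A caller would pass the --wait argument and, on success, read the resulting sysfs link to find the name that appears in /proc/mdstat.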
* Get failed disk count from array state | Tomasz Majchrzak | 2017-06-05 | 1 | -2/+2
  Recent commit has changed the way failed disks are counted. It breaks recovery for external metadata arrays as failed disks are not part of the array and have no corresponding entries in sysfs (they are only reported for containers) so degraded arrays show no failed disks.
  Recent commit overwrites GET_DEGRADED result prior to GET_STATE and it is not set again if GET_STATE has not been requested. As GET_STATE provides the same information as GET_DEGRADED, the latter is not needed anymore. Remove GET_DEGRADED option and replace it with GET_STATE option.
  Don't count number of failed disks looking at sysfs entries but calculate it at the end. Do it only for arrays as containers report no disks, just spares.
  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: Fixup more broken logical operator formatting | Jes Sorensen | 2017-05-16 | 1 | -2/+2
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Fixup a pile of whitespace issues | Jes Sorensen | 2017-05-11 | 1 | -55/+55
  No code was hurt in this event
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: mailfrom is initialized correctly | Jes Sorensen | 2017-05-11 | 1 | -1/+1
  Remove gratuitous variable initialization.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Not much point declaring mdlist in both forks of the if() statement | Jes Sorensen | 2017-05-11 | 1 | -2/+3
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Use working_disks from sysfs | Jes Sorensen | 2017-05-09 | 1 | -2/+2
  sysfs now provides working_disks information, so let's use it too.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Get nr_disks, active_disks and spare_disks from sysfs | Jes Sorensen | 2017-05-09 | 1 | -7/+7
  This leaves working_disks and utime missing before we can eliminate check_array()'s call to md_get_array_info()
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Get array_disks from sysfs | Jes Sorensen | 2017-05-09 | 1 | -2/+2
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Get 'failed_disks' from sysfs | Jes Sorensen | 2017-05-09 | 1 | -3/+4
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Obtain RAID level from sysfs | Jes Sorensen | 2017-05-09 | 1 | -3/+3
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Read sysfs entry earlier | Jes Sorensen | 2017-05-09 | 1 | -6/+10
  This will allow us to pull additional info from sysfs, such as level and device info.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Declare mdinfo instance globally | Jes Sorensen | 2017-05-09 | 1 | -2/+2
  We can pull in more information from sysfs earlier, so move sra to the top.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Reduce duplicated error handling | Jes Sorensen | 2017-05-09 | 1 | -24/+15
  Avoid closing fd in multiple places, and duplicating the error message for when a device disappeared.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor/check_array: Centralize exit path | Jes Sorensen | 2017-05-09 | 1 | -10/+14
  Improve exit handling to make it easier to share error handling and free sysfs entries later.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Add sector size as spare selection criterion | Alexey Obitotskiy | 2017-05-09 | 1 | -0/+8
  Add sector size as new spare selection criterion. Assume that 0 means there is no requirement for the sector size in the array. Skip disks with unsuitable sector size when looking for a spare to move across containers.
  Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
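As a minimal sketch of the rule above, assuming a criteria structure with a minimum size and a sector size: the struct layout and the helper name disk_fits() are illustrative assumptions, not taken from the commit. The point shown is only that a sector size of 0 disables the sector-size check.

```c
/* Illustrative criteria structure; field names are assumptions. */
struct spare_criteria {
    unsigned long long min_size;  /* smallest acceptable disk size */
    unsigned int sector_size;     /* 0 means "no requirement" */
};

/* Hypothetical helper: returns 1 if a candidate spare satisfies the
 * criteria, 0 if it must be skipped. */
static int disk_fits(const struct spare_criteria *sc,
                     unsigned long long size, unsigned int sector_size)
{
    if (size < sc->min_size)
        return 0;
    /* Only enforce the sector size when the array states one. */
    if (sc->sector_size != 0 && sc->sector_size != sector_size)
        return 0;
    return 1;
}
```

With criteria {1000, 0} any sufficiently large disk is accepted; with {1000, 4096} a 512-byte-sector disk is skipped regardless of its size.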
* Allow more spare selection criteria | Alexey Obitotskiy | 2017-05-09 | 1 | -14/+16
  Disks can be moved across containers in order to be used as a spare drive for rebuild. At the moment the only requirement checked for such disk is its size (if it matches donor expectations). In order to introduce more criteria rename corresponding superswitch method to more generic name and move function parameter to a structure. This change is a big edit but it doesn't introduce any changes in code logic, it just updates function naming and parameters.
  Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Code is 80 characters per line | Jes Sorensen | 2017-05-08 | 1 | -34/+27
  Fix up some lines that are too long for no reason, and some that have silly line breaks.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Monitor: Use md_array_active() instead of manually fiddling in sysfs | Jes Sorensen | 2017-05-08 | 1 | -28/+11
  This removes a pile of clutter that can easily be handled with a simple check of array_state.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/util: unify stat checking blkdev into function | Zhilong Liu | 2017-05-05 | 1 | -12/+4
  Declare function stat_is_blkdev() to consolidate the repeated stat checks for block devices. It returns 'true/1' when devname is a block device and 'false/0' when it isn't. The devname parameter is required; *rdev is optional: if the dev_t *rdev pointer is non-NULL, the device number is assigned through it, and if it is NULL it is ignored.
  Signed-off-by: Zhilong Liu <zlliu@suse.com>
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
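A function with the behaviour described here might look like the sketch below. It follows the description in the commit message only; the error wording and the use of plain fprintf() instead of mdadm's own error helper are assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sketch: returns 1 if devname is a block device, 0 otherwise.
 * If rdev is non-NULL, the device number is stored through it. */
static int stat_is_blkdev(const char *devname, dev_t *rdev)
{
    struct stat stb;

    if (stat(devname, &stb) != 0) {
        fprintf(stderr, "stat failed for %s: %s\n",
                devname, strerror(errno));
        return 0;
    }
    if (!S_ISBLK(stb.st_mode)) {
        fprintf(stderr, "%s is not a block device.\n", devname);
        return 0;
    }
    if (rdev)
        *rdev = stb.st_rdev;  /* optional output parameter */
    return 1;
}
```

Callers that only need the yes/no answer pass NULL for rdev; callers that go on to look the device up by number pass a pointer to a dev_t.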
* Retire mdassemble | Jes Sorensen | 2017-04-11 | 1 | -3/+0
  mdassemble doesn't handle container based arrays, no support for sysfs, etc. It has not been actively maintained for years, so time to send it off to retirement.
  Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* sysfs: Make sysfs_init() return an error code | Jes Sorensen | 2017-03-30 | 1 | -1/+3
  Rather than have the caller inspect the returned content, return an error code from sysfs_init(). In addition make all callers actually check it.
  Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* util: Introduce md_get_disk_info() | Jes Sorensen | 2017-03-29 | 1 | -1/+1
  This removes all the inline ioctl calls for GET_DISK_INFO, allowing us to switch to sysfs in one place, and improves type checking.
  Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* util: Introduce md_get_array_info() | Jes Sorensen | 2017-03-29 | 1 | -3/+4
  Remove most direct ioctl calls for GET_ARRAY_INFO, except for one, which will be addressed in the next patch.
  This is the start of the effort to clean up the use of ioctl calls and introduce a more structured API, which will use sysfs and fall back to ioctl for backup.
  Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* mdadm/Monitor: Fix NULL pointer dereference when stat2devnm return NULL | Zhilong Liu | 2017-03-28 | 1 | -1/+7
  Wait(): stat2devnm() returns NULL for non block devices. Check the pointer is valid before dereferencing it. This can happen when --wait is given a file that is not a block device, such as the 'f' and 'd' file types, causing a core dump. For example: ./mdadm --wait /dev/md/
  Reviewed-by: NeilBrown <neilb@suse.com>
  Signed-off-by: Zhilong Liu <zlliu@suse.com>
  Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Monitor: release /proc/mdstat fd when no arrays present | Tomasz Majchrzak | 2016-07-21 | 1 | -0/+2
  If md kernel module is reloaded, /proc/mdstat cannot be accessed ("cat: /proc/mdstat: No such file or directory"). The reason is mdadm monitor still holds a file descriptor to previous /proc/mdstat instance. It leads to really confusing outcome of the following operations - mdadm seems to run without errors, however some udev rules don't get executed and new array doesn't work.
  Add a check if lseek was successful as it fails if md kernel module has been unloaded - close a file descriptor then. The problem is mdadm monitor doesn't always do it before next operation takes place. To prevent it monitor always releases /proc/mdstat descriptor when there are no arrays to be monitored, just in case driver unload happens in a moment.
  Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
  Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
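The lseek check mentioned in this commit can be illustrated with a small sketch. The helper name rewind_or_release() and the convention of storing -1 in the cached descriptor are assumptions made for the example; the real change lives in mdadm's /proc/mdstat reader.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: rewind a cached descriptor before re-reading it.
 * If the seek fails (the commit says this happens once the md module has
 * been unloaded), close the descriptor and mark it invalid so the next
 * read reopens the file. Returns 0 on success, -1 if released. */
static int rewind_or_release(int *fd)
{
    if (*fd < 0)
        return -1;
    if (lseek(*fd, 0L, SEEK_SET) == (off_t)-1) {
        close(*fd);
        *fd = -1;
        return -1;
    }
    return 0;
}
```

The same release path can be taken unconditionally when there is nothing left to monitor, which is the second half of the fix described above.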
* Monitor: Use sysfs_free() to free object returned by sysfs_read() | Jes Sorensen | 2016-06-10 | 1 | -1/+1
  We should always use sysfs_free() to release sysfs_* allocated objects.
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Fix some type comparison problems | Xiao Ni | 2016-02-08 | 1 | -1/+1
  As 26714713cd2bad9e0bf7f4669f6cc4659ceaab6c said, 32 bit signed timestamps will overflow in the year 2038. It already changed the utime and ctime in struct mdu_array_info_s from int to unsigned int. So we need to change the values that are compared with them to unsigned int too.
  Signed-off-by: Xiao Ni <xni@redhat.com>
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Monitor: don't Wait forever on a 'frozen' array. | NeilBrown | 2015-07-06 | 1 | -2/+10
  If Wait() finds the array resync is 'frozen', then wait a little while to avoid races, but don't wait forever.
  Signed-off-by: NeilBrown <neilb@suse.com>
* mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL | Sergey Vidishev | 2015-05-20 | 1 | -1/+9
  Function add_new_arrays() expects get_md_name() to return a pointer to devname, but get_md_name() may also return NULL. So check the pointer before using it in add_new_arrays().
  Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: use the "space protocol" for "Wrong-Level". | NeilBrown | 2015-04-08 | 1 | -1/+1
  "Wrong-Level" is a reason, not a component device, so it should start with a space to indicate this to alert().
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Obey "space protocol" when writing to syslog. | NeilBrown | 2015-04-08 | 1 | -1/+5
  "alert" treats the "disc" arg differently if it starts with a space. At least it does for sending email. It doesn't for writing to syslog.
  Make this consistent and obey the 'space protocol' when writing to syslog.
  Signed-off-by: NeilBrown <neilb@suse.de>
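The "space protocol" can be sketched as a formatter that distinguishes the three cases for the "disc" argument. The function name format_alert() and the exact message wording are assumptions for illustration; only the rule itself (a leading space marks a reason rather than a component device) comes from the commit messages.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for the log text of an event.
 * disc == NULL       : no extra detail
 * disc starts with ' ': disc is a reason, appended after a colon
 * otherwise           : disc names a component device */
static int format_alert(char *buf, size_t len, const char *event,
                        const char *dev, const char *disc)
{
    if (disc && disc[0] != ' ')
        return snprintf(buf, len,
                        "%s event detected on md device %s, component device %s",
                        event, dev, disc);
    if (disc)
        /* the reason already carries its leading space */
        return snprintf(buf, len, "%s event detected on md device %s:%s",
                        event, dev, disc);
    return snprintf(buf, len, "%s event detected on md device %s",
                    event, dev);
}
```

Passing " Wrong-Level" therefore produces a message ending in ": Wrong-Level" instead of describing it as a component device.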
* Don't break long strings onto multiple lines. | NeilBrown | 2015-02-12 | 1 | -23/+10
  It is best to keep strings all together so that they are easier to search for in the source code.
  If a string is so long that it looks ugly on one line, then maybe it should be broken into multiple lines for display too.
  Only strings which contain a newline can be broken into multiple lines:
      "It is OK to\n"
      "break this string\n"
  Signed-off-by: NeilBrown <neilb@suse.de>
* Change way of printing name of a process | Pawel Baldysiak | 2015-02-12 | 1 | -2/+2
  Sometimes mdadm prints messages with wrong name "mdmon", and vice versa. This patch solves this problem by changing method of determining process name. Now "Name" will be set in const at start of a program, previously was hardcoded as #define.
  Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
  Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: fix for regression with container devices | Artur Paszkiewicz | 2015-02-11 | 1 | -4/+10
  This patch fixes 2 problems introduced by commit 9a518d8: not closing a file descriptor and ignoring container devices. Array state is always "inactive" for containers, so we make sure that the device is not a container by reading also the "level" sysfs entry.
  Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
  Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't open md array that doesn't exist. | NeilBrown | 2014-11-25 | 1 | -1/+22
  Opening a block-special-device for an array that doesn't exist causes that array to be instantiated (as an empty array).
  Races at array shutdown can cause the array to spontaneously re-appear if some daemon notices a 'change' event and goes to investigate.
  Teach "mdadm --monitor" to avoid this race by checking the "array_state" before opening the device.
  Reported-by: Francis Moreau <francis.moro@gmail.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
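A rough sketch of the "check array_state first" idea follows. The function name array_seems_present(), the configurable sysfs root (added so the example can be exercised outside /sys/block), and the choice to treat only a missing file or the "clear" state as "not present" are all assumptions; the commit message does not say which states the real check rejects.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pre-check: returns 1 if <sysfs_block>/<devnm>/md/array_state
 * is readable and does not report "clear", 0 otherwise. A caller would only
 * open /dev/<devnm> when this returns 1, so that the open cannot
 * instantiate an array that has already gone away. */
static int array_seems_present(const char *sysfs_block, const char *devnm)
{
    char path[256];
    char buf[32];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "%s/%s/md/array_state", sysfs_block, devnm);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;  /* no sysfs entry: leave the block device alone */
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return 0;
    buf[n] = '\0';
    return strncmp(buf, "clear", 5) != 0;
}
```

On a live system the first argument would be "/sys/block"; reading sysfs does not create the array, which is what makes this check safe where opening the device node is not.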
* Monitor: Stop monitoring devices that have disappeared. | NeilBrown | 2014-08-14 | 1 | -6/+18
  If we are only monitoring a device because we found it in /proc/mdstat, and it has been gone for 5 checks, forget about it completely.
  Signed-off-by: NeilBrown <neilb@suse.de>
* New function: sysfs_wait | NeilBrown | 2013-07-01 | 1 | -8/+2
  We have several places that wait for activity on a sysfs file. Combine most of these into a single 'sysfs_wait' function.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Remove lots of unnecessary white space. | NeilBrown | 2013-06-19 | 1 | -7/+5
  Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-)
  Signed-off-by: NeilBrown <neilb@suse.de>
* Wait: also wait if an action is about to start. | NeilBrown | 2013-05-01 | 1 | -0/+13
  If a sync/recover action is about to start but hasn't actually begun yet, /proc/mdstat won't show it, but md/sync_action will (it checks MD_RECOVERY_NEEDED).
  So when /proc/mdstat seems to say nothing is happening, double check with md/sync_action.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Discard devnum in favour of devnm | NeilBrown | 2013-02-21 | 1 | -46/+45
  We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices.
  But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string).
  eg.
      md0
      md_d2
      md_home
  Signed-off-by: NeilBrown <neilb@suse.de>
* Allow --wait to wait for delayed resync. | NeilBrown | 2012-11-21 | 1 | -1/+1
  If a resync is delayed, then e->percent will be negative but not RESYNC_NONE. In that case we still want to wait.
  Reported-by: Ross Boylan <ross@biostat.ucsf.edu>
  Signed-off-by: NeilBrown <neilb@suse.de>
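The distinction drawn in this commit can be shown with two small predicates. The numeric values of RESYNC_NONE and RESYNC_DELAYED and both function names are assumptions for the sketch; what matters is that "negative" and "equal to RESYNC_NONE" are not the same test.

```c
/* Assumed sentinel values for the percent field; the real definitions
 * live in mdadm's headers. */
#define RESYNC_NONE    (-1)
#define RESYNC_DELAYED (-2)

/* Too broad: any negative percent is taken to mean "nothing to wait
 * for", so a delayed resync ends the wait early. */
static int done_if_negative(int percent)
{
    return percent < 0;
}

/* Narrower test matching the commit's intent: only RESYNC_NONE means
 * there is nothing left to wait for. */
static int done_if_none(int percent)
{
    return percent == RESYNC_NONE;
}
```

With the narrower test a delayed resync (negative, but not RESYNC_NONE) keeps --wait waiting, as does any ordinary progress value such as 40.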
* Monitor: don't complain about non-monitorable arrays in mdadm.conf | NeilBrown | 2012-10-24 | 1 | -1/+3
  If we are asked to monitor a RAID0 or Linear - which cannot be monitored - we complain with "Device Disappeared .... Wrong-Level".
  However if the RAID0 or Linear is being requested because it is in mdadm.conf then the message is inappropriate and confusing.
  So track which arrays are added from the config file, and suppress that message in that case.
  Reported-by: "Johnson Yan" <johnson_yan@usish.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Change Monitor to take a struct context | NeilBrown | 2012-07-09 | 1 | -13/+14
  Signed-off-by: NeilBrown <neilb@suse.de>
* Remove scattered checks for malloc success. | NeilBrown | 2012-07-09 | 1 | -15/+9
  malloc should never fail, and if it does it is unlikely that anything else useful can be done. Best approach is to abort and let some super-daemon restart.
  So define xmalloc, xcalloc, xrealloc, xstrdup which don't fail but just print a message and exit. Then use those removing all the tests for failure.
  Also replace all "malloc;memset" sequences with 'xcalloc'.
  Signed-off-by: NeilBrown <neilb@suse.de>
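Wrappers of the kind described above are a common C idiom; a minimal sketch of three of them follows. The message text and the exit status are illustrative assumptions, and xrealloc is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print a message and exit; the exit status here is an arbitrary choice. */
static void alloc_failed(void)
{
    fprintf(stderr, "memory allocation failure - aborting\n");
    exit(4);
}

/* Like malloc(), but never returns NULL: exits instead. */
static void *xmalloc(size_t len)
{
    void *rv = malloc(len);
    if (rv == NULL)
        alloc_failed();
    return rv;
}

/* Like calloc(); also replaces "malloc; memset(0)" sequences. */
static void *xcalloc(size_t num, size_t size)
{
    void *rv = calloc(num, size);
    if (rv == NULL)
        alloc_failed();
    return rv;
}

/* Like strdup(), built on xmalloc() so it shares the failure path. */
static char *xstrdup(const char *str)
{
    size_t len = strlen(str) + 1;
    char *rv = xmalloc(len);
    memcpy(rv, str, len);
    return rv;
}
```

Call sites then drop their NULL checks entirely, for example `char *name = xstrdup(devnm);` followed directly by use of name.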
* Introduce pr_err for printing error messages. | NeilBrown | 2012-07-09 | 1 | -12/+12
  'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
  cont_err() is also available.
  Signed-off-by: NeilBrown <neilb@suse.de>
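One way such a helper could be written is sketched below. This is an assumption about the shape of the macro, not mdadm's definition: it uses a portable C99 variadic macro with two fprintf() calls, and a constant Name set to "mdadm" for the example.

```c
#include <stdio.h>

/* Program name used as the message prefix (value assumed for the sketch). */
static const char Name[] = "mdadm";

/* Sketch of a pr_err()-style macro: prints "<Name>: " and then the
 * formatted message to stderr. The expression evaluates to the return
 * value of the second fprintf(), i.e. the length of the message part. */
#define pr_err(...) \
    (fprintf(stderr, "%s: ", Name), fprintf(stderr, __VA_ARGS__))
```

A call such as `pr_err("cannot open %s\n", devname);` then replaces the longer `fprintf(stderr, Name ": cannot open %s\n", devname);` form quoted in the commit message.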