summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* xfs: stats are no longer dependent on CONFIG_PROC_FSDave Chinner2015-10-182-1/+3
| | | | | | | | | So we need to fix the makefile to understand this, otherwise build errors with CONFIG_PROC_FS=n occur. Reported-and-tested-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: simplify /proc teardown & error handlingEric Sandeen2015-10-121-19/+7
| | | | | | | | | | | | remove_proc_subtree() was added in 3.9, and can be used to simplify our procfile creation error handling and cleanup, removing the nested gotos. It simply removes fs/xfs and everything created under it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: per-filesystem stats counter implementationBill O'Donnell2015-10-1221-118/+131
| | | | | | | | | | | | | | | | | | | | | | This patch modifies the stats counting macros and the callers to those macros to properly increment, decrement, and add-to the xfs stats counts. The counts for global and per-fs stats are correctly advanced, and cleared by writing a "1" to the corresponding clear file. global counts: /sys/fs/xfs/stats/stats per-fs counts: /sys/fs/xfs/sda*/stats/stats global clear: /sys/fs/xfs/stats/stats_clear per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around stats structures and macros. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: per-filesystem stats in sysfsBill O'Donnell2015-10-123-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements per-filesystem stats objects in sysfs. It depends on the application of the previous patch series that develops the infrastructure to support both xfs global stats and xfs per-fs stats in sysfs. Stats objects are instantiated when an xfs filesystem is mounted and deleted on unmount. With this patch, the stats directory is created and populated with the familiar stats and stats_clear files. Example: /sys/fs/xfs/sda9/stats/stats /sys/fs/xfs/sda9/stats/stats_clear With this patch, the individual counts within the new per-fs stats file(s) remain at zero. Functions that use the the macros to increment, decrement, and add-to the per-fs stats counts will be covered in a separate new patch to follow this one. Note that the counts within the global stats file (/sys/fs/xfs/stats/stats) advance normally and can be cleared as it was prior to this patch. [dchinner: move setup/teardown to xfs_fs_{fill|put}_super() so it is down before/after any path that uses the per-mount stats. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: pass xfsstats structures to handlers and macrosBill O'Donnell2015-10-116-37/+63
| | | | | | | | | | | | | | | | | | | | This patch is the next step toward per-fs xfs stats. The patch makes the show and clear routines able to handle any stats structure associated with a kobject. Instead of a single global xfsstats structure, add kobject and a pointer to a per-cpu struct xfsstats. Modify the macros that manipulate the stats accordingly: XFS_STATS_INC, XFS_STATS_DEC, and XFS_STATS_ADD now access xfsstats->xs_stats. The sysfs functions need to get from the kobject back to the xfsstats structure which contains it, and pass the pointer to the ->xs_stats percpu structure into the show & clear routines. Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: consolidate sysfs opsBill O'Donnell2015-10-111-119/+63
| | | | | | | | | | As a part of the series to move xfs global stats from procfs to sysfs, this patch consolidates the sysfs ops functions and removes redundancy. Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: remove unused procfs codeBill O'Donnell2015-10-111-74/+0
| | | | | | | | | | As a part of the work to move xfs global stats from procfs to sysfs, this patch removes the now unused procfs code that was xfs stat specific. Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: create symlink proc/fs/xfs/stat to sys/fs/xfs/statsBill O'Donnell2015-10-111-2/+3
| | | | | | | | | | As a part of the work to move xfs global stats from procfs to sysfs, this patch creates the symlink from proc/fs/xfs/stat to sys/fs/xfs/stats. Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: create global stats and stats_clear in sysfsBill O'Donnell2015-10-116-17/+178
| | | | | | | | | | | Currently, xfs global stats are in procfs. This patch introduces (replicates) the global stats in sysfs. Additionally a stats_clear file is introduced in sysfs. Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2015-09-202-1/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - a boot regression (since v4.2) fix for some ARM configurations from Tyler - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device from Jeff. These are tagged for -stable. - a pair of locking fixes from Axel that are hidden from lockdep since they involve device_lock(). The "btt" one is tagged for -stable, the other only applies to the new "pfn" mechanism in v4.3. - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: mm: fix type cast in __pfn_to_phys() pmem: add proper fencing to pmem_rw_page() libnvdimm: pfn_devs: Fix locking in namespace_store libnvdimm: btt_devs: Fix locking in namespace_store blockdev: don't set S_DAX for misaligned partitions dax: fix O_DIRECT I/O to the last block of a blockdev
| * blockdev: don't set S_DAX for misaligned partitionsJeff Moyer2015-09-161-0/+7
| | | | | | | | | | | | | | | | | | | | | | The dax code doesn't currently support misaligned partitions, so disable O_DIRECT via dax until such time as that support materializes. Cc: <stable@vger.kernel.org> Suggested-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dax: fix O_DIRECT I/O to the last block of a blockdevJeff Moyer2015-09-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bbab37ddc20b (block: Add support for DAX reads/writes to block devices) caused a regression in mkfs.xfs. That utility sets the block size of the device to the logical block size using the BLKBSZSET ioctl, and then issues a single sector read from the last sector of the device. This results in the dax_io code trying to do a page-sized read from 512 bytes from the end of the device. The result is -ERANGE being returned to userspace. The fix is to align the block to the page size before calling get_block. Thanks to willy for simplifying my original patch. Cc: <stable@vger.kernel.org> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Linda Knippers <linda.knippers@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | fs-writeback: unplug before cond_resched in writeback_sb_inodesChris Mason2015-09-201-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes, which increases the merge rate when relatively contiguous small files are written by the filesystem. It helps both on flash and spindles. For an fs_mark workload creating 4K files in parallel across 8 drives, this commit improves performance ~9% more by unplugging before calling cond_resched(). cond_resched() doesn't trigger an implicit unplug, so explicitly getting the IO down to the device before scheduling reduces latencies for anyone waiting on clean pages. It also cuts down on how often we use kblockd to unplug, which means less work bouncing from one workqueue to another. Many more details about how we got here: https://lkml.org/lkml/2015/9/11/570 Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | userfaultfd: add missing mmput() in error pathEric Biggers2015-09-181-1/+3
| | | | | | | | | | | | | | | | | | This fixes a memleak if anon_inode_getfile() fails in userfaultfd(). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2015-09-142-1/+10
|\ \ | |/ |/| | | | | | | | | | | | | Pull CIFS fixes from Steve French: "Two small cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] mount option sec=none not displayed properly in /proc/mounts CIFS: fix type confusion in copy offload ioctl
| * [CIFS] mount option sec=none not displayed properly in /proc/mountsSteve French2015-09-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | When the user specifies "sec=none" in a cifs mount, we set sec_type as unspecified (and set a flag and the username will be null) rather than setting sectype as "none" so cifs_show_security was not properly displaying it in cifs /proc/mounts entries. Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
| * CIFS: fix type confusion in copy offload ioctlJann Horn2015-09-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This might lead to local privilege escalation (code execution as kernel) for systems where the following conditions are met: - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled - a cifs filesystem is mounted where: - the mount option "vers" was used and set to a value >=2.0 - the attacker has write access to at least one file on the filesystem To attack this, an attacker would have to guess the target_tcon pointer (but guessing wrong doesn't cause a crash, it just returns an error code) and win a narrow race. CC: Stable <stable@vger.kernel.org> Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Steve French <smfrench@gmail.com>
* | Merge branch 'writeback-plugging'Linus Torvalds2015-09-121-3/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up the writeback plugging introduced in commit d353d7587d02 ("writeback: plug writeback at a high level") that then caused problems due to the unplug happening with a spinlock held. * writeback-plugging: writeback: plug writeback in wb_writeback() and writeback_inodes_wb() Revert "writeback: plug writeback at a high level"
| * | writeback: plug writeback in wb_writeback() and writeback_inodes_wb()Linus Torvalds2015-09-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had to revert the pluggin in writeback_sb_inodes() because the wb->list_lock is held, but we could easily plug at a higher level before taking that lock, and unplug after releasing it. This does that. Chris will run performance numbers, just to verify that this approach is comparable to the alternative (we could just drop and re-take the lock around the blk_finish_plug() rather than these two commits. I'd have preferred waiting for actual performance numbers before picking one approach over the other, but I don't want to release rc1 with the known "sleeping function called from invalid context" issue, so I'll pick this cleanup version for now. But if the numbers show that we really want to plug just at the writeback_sb_inodes() level, and we should just play ugly games with the spinlock, we'll switch to that. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Revert "writeback: plug writeback at a high level"Linus Torvalds2015-09-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit d353d7587d02116b9732d5c06615aed75a4d3a47. Doing the block layer plug/unplug inside writeback_sb_inodes() is broken, because that function is actually called with a spinlock held: wb->list_lock, as pointed out by Chris Mason. Chris suggested just dropping and re-taking the spinlock around the blk_finish_plug() call (the plgging itself can happen under the spinlock), and that would technically work, but is just disgusting. We do something fairly similar - but not quite as disgusting because we at least have a better reason for it - in writeback_single_inode(), so it's not like the caller can depend on the lock being held over the call, but in this case there just isn't any good reason for that "release and re-take the lock" pattern. [ In general, we should really strive to avoid the "release and retake" pattern for locks, because in the general case it can easily cause subtle bugs when the caller caches any state around the call that might be invalidated by dropping the lock even just temporarily. ] But in this case, the plugging should be easy to just move up to the callers before the spinlock is taken, which should even improve the effectiveness of the plug. So there is really no good reason to play games with locking here. I'll send off a test-patch so that Dave Chinner can verify that that plug movement works. In the meantime this just reverts the problematic commit and adds a comment to the function so that we hopefully don't make this mistake again. Reported-by: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-09-123-42/+37
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fourth patch-bomb from Andrew Morton: - sys_membarier syscall - seq_file interface changes - a few misc fixups * emailed patches from Andrew Morton <akpm@linux-foundation.org>: revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each" mm/early_ioremap: add explicit #include of asm/early_ioremap.h fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void selftests: enhance membarrier syscall test selftests: add membarrier syscall test sys_membarrier(): system-wide memory barrier (generic, x86) MODSIGN: fix a compilation warning in extract-cert
| * | revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"Andrew Morton2015-09-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert commit f83c7b5e9fd6 ("ocfs2/dlm: use list_for_each_entry instead of list_for_each"). list_for_each_entry() will dereference its `pos' argument, which can be NULL in dlm_process_recovery_data(). Reported-by: Julia Lawall <julia.lawall@lip6.fr> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to voidJoe Perches2015-09-122-40/+33
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The seq_<foo> function return values were frequently misused. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") All uses of these return values have been removed, so convert the return types to void. Miscellanea: o Move seq_put_decimal_<type> and seq_escape prototypes closer the other seq_vprintf prototypes o Reorder seq_putc and seq_puts to return early on overflow o Add argument names to seq_vprintf and seq_printf o Update the seq_escape kernel-doc o Convert a couple of leading spaces to tabs in seq_escape Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus-4.3' of ↵Linus Torvalds2015-09-119-96/+82
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs cleanups and fixes from Chris Mason: "These are small cleanups, and also some fixes for our async worker thread initialization. I was having some trouble testing these, but it ended up being a combination of changing around my test servers and a shiny new schedule while atomic from the new start/finish_plug in writeback_sb_inodes(). That one only hits on btrfs raid5/6 or MD raid10, and if I wasn't changing a bunch of things in my test setup at once it would have been really clear. Fix for writeback_sb_inodes() on the way as well" * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: cleanup: remove unnecessary check before btrfs_free_path is called btrfs: async_thread: Fix workqueue 'max_active' value when initializing btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks and scrub_chunk btrfs: Update out-of-date "skip parity stripe" comment
| * | Btrfs: cleanup: remove unnecessary check before btrfs_free_path is calledTsutomu Itoh2015-08-313-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | We need not check path before btrfs_free_path() is called because path is checked in btrfs_free_path(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | btrfs: async_thread: Fix workqueue 'max_active' value when initializingQu Wenruo2015-08-312-24/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At initializing time, for threshold-able workqueue, it's max_active of kernel workqueue should be 1 and grow if it hits threshold. But due to the bad naming, there is both 'max_active' for kernel workqueue and btrfs workqueue. So wrong value is given at workqueue initialization. This patch fixes it, and to avoid further misunderstanding, change the member name of btrfs_workqueue to 'current_active' and 'limit_active'. Also corresponding comment is added for readability. Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | btrfs: Add raid56 support for updatingZhao Lei2015-08-313-39/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | num_tolerated_disk_barrier_failures in btrfs_balance Code for updating fs_info->num_tolerated_disk_barrier_failures in btrfs_balance() lacks raid56 support. Reason: Above code was wroten in 2012-08-01, together with btrfs_calc_num_tolerated_disk_barrier_failures()'s first version. Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated later to support raid56, but code in btrfs_balance() was not updated together. Fix: Merge above similar code to a common function: btrfs_get_num_tolerated_disk_barrier_failures() and make it support both case. It can fix this bug with a bonus of cleanup, and make these code never in above no-sync state from now on. Suggested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failuresZhao Lei2015-08-311-40/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1: Use ARRAY_SIZE(types) to replace a static-value variant: int num_types = 4; 2: Use 'continue' on condition to reduce one level tab if (!XXX) { code; ... } -> if (XXX) continue; code; ... 3: Put setting 'num_tolerated_disk_barrier_failures = 2' to (num_tolerated_disk_barrier_failures > 2) condition to make make logic neat. if (num_tolerated_disk_barrier_failures > 0 && XXX) num_tolerated_disk_barrier_failures = 0; else if (num_tolerated_disk_barrier_failures > 1) { if (XXX) num_tolerated_disk_barrier_failures = 1; else if (XXX) num_tolerated_disk_barrier_failures = 2; -> if (num_tolerated_disk_barrier_failures > 0 && XXX) num_tolerated_disk_barrier_failures = 0; if (num_tolerated_disk_barrier_failures > 1 && XXX) num_tolerated_disk_barrier_failures = ; if (num_tolerated_disk_barrier_failures > 2 && XXX) num_tolerated_disk_barrier_failures = 2; 4: Remove comment of: num_mirrors - 1: if RAID1 or RAID10 is configured and more than 2 mirrors are used. which is not fit with code. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | btrfs: Remove noused chunk_tree and chunk_objectid from ↵Zhao Lei2015-08-311-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | scrub_enumerate_chunks and scrub_chunk These variables are not used from introduced version, remove them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | btrfs: Update out-of-date "skip parity stripe" commentZhao Lei2015-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | Because btrfs support scrub raid56 parity stripe now. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2015-09-117-29/+67
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph update from Sage Weil: "There are a few fixes for snapshot behavior with CephFS and support for the new keepalive protocol from Zheng, a libceph fix that affects both RBD and CephFS, a few bug fixes and cleanups for RBD from Ilya, and several small fixes and cleanups from Jianpeng and others" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: improve readahead for file holes ceph: get inode size for each append write libceph: check data_len in ->alloc_msg() libceph: use keepalive2 to verify the mon session is alive rbd: plug rbd_dev->header.object_prefix memory leak rbd: fix double free on rbd_dev->header_name libceph: set 'exists' flag for newly up osd ceph: cleanup use of ceph_msg_get ceph: no need to get parent inode in ceph_open ceph: remove the useless judgement ceph: remove redundant test of head->safe and silence static analysis warnings ceph: fix queuing inode to mdsdir's snaprealm libceph: rename con_work() to ceph_con_workfn() libceph: Avoid holding the zero page on ceph_msgr_slab_init errors libceph: remove the unused macro AES_KEY_SIZE ceph: invalidate dirty pages after forced umount ceph: EIO all operations after forced umount
| * | | ceph: improve readahead for file holesYan, Zheng2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When readahead encounters file holes, osd reply returns error -ENOENT, finish_read() skips adding pages to the the page cache. So readahead does not work for file holes. The fix is adding zero pages to the page cache when -ENOENT is returned. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: get inode size for each append writeYan, Zheng2015-09-091-0/+6
| | | | | | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: cleanup use of ceph_msg_getJianpeng Ma2015-09-081-2/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: no need to get parent inode in ceph_openJianpeng Ma2015-09-081-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | parent inode is needed in creating new inode case. For ceph_open, the target inode already exists. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: remove the useless judgementJianpeng Ma2015-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | err != 0 is already handled. So skip this. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: remove redundant test of head->safe and silence static analysis warningsBrad Hubbard2015-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: fix queuing inode to mdsdir's snaprealmYan, Zheng2015-09-081-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During MDS failovers, MClientSnap message may cause kclient to move some inodes from root directory's snaprealm to mdsdir's snaprealm and queue snapshots for these inodes. For a FS has never created any snapshot, both root directory's snaprealm and mdsdir's snaprealm share the same snapshot contexts (both are ceph_empty_snapc). This confuses ceph_put_wrbuffer_cap_refs(), make it unable to distinguish snapshot buffers from head buffers. The fix is do not use ceph_empty_snapc as snaprealm's cached context. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: invalidate dirty pages after forced umountYan, Zheng2015-09-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After forced umount, ceph_writepages_start() skips flushing dirty pages. To make sure inode's reference count get dropped to zero, we need to invalidate dirty pages. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | ceph: EIO all operations after forced umountYan, Zheng2015-09-085-12/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes try_get_cap_refs() and __do_request() check if the file system was forced umount, and return -EIO if it was. This patch also adds a helper function to drops dirty caps and wakes up blocking operation. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | | | Merge tag 'gfs2-merge-window' of ↵Linus Torvalds2015-09-1111-285/+212
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "Here is a list of patches we've accumulated for GFS2 for the current upstream merge window. This time we've only got six patches, many of which are very small: - three cleanups from Andreas Gruenbacher, including a nice cleanup of the sequence file code for the sbstats debugfs file. - a patch from Ben Hutchings that changes statistics variables from signed to unsigned. - two patches from me that increase GFS2's glock scalability by switching from a conventional hash table to rhashtable" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: A minor "sbstats" cleanup gfs2: Fix a typo in a comment gfs2: Make statistics unsigned, suitable for use with do_div() GFS2: Use resizable hash table for glocks GFS2: Move glock superblock pointer to field gl_name gfs2: Simplify the seq file code for "sbstats"
| * | | gfs2: A minor "sbstats" cleanupAndreas Gruenbacher2015-09-031-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems cleaner to avoid the temporary value here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | | gfs2: Fix a typo in a commentAndreas Gruenbacher2015-09-031-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | | gfs2: Make statistics unsigned, suitable for use with do_div()Ben Hutchings2015-09-034-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | None of these statistics can meaningfully be negative, and the numerator for do_div() must have the type u64. The generic implementation of do_div() used on some 32-bit architectures asserts that, resulting in a compiler error in gfs2_rgrp_congested(). Fixes: 0166b197c2ed ("GFS2: Average in only non-zero round-trip times ...") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | | GFS2: Use resizable hash table for glocksBob Peterson2015-09-032-166/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the glock hash table from a normal hash table to a resizable hash table, which scales better. This also simplifies a lot of code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | GFS2: Move glock superblock pointer to field gl_nameBob Peterson2015-09-0311-74/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | gfs2: Simplify the seq file code for "sbstats"Andreas Gruenbacher2015-09-031-20/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't use struct gfs2_glock_iter as the helper data structure for iterating through "sbstats"; we are not iterating through glocks here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* | | | Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-blockLinus Torvalds2015-09-112-96/+66
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull blk-cg updates from Jens Axboe: "A bit later in the cycle, but this has been in the block tree for a a while. This is basically four patchsets from Tejun, that improve our buffered cgroup writeback. It was dependent on the other cgroup changes, but they went in earlier in this cycle. Series 1 is set of 5 patches that has cgroup writeback updates: - bdi_writeback iteration fix which could lead to some wb's being skipped or repeated during e.g. sync under memory pressure. - Simplification of wb work wait mechanism. - Writeback tracepoints updated to report cgroup. Series 2 is is a set of updates for the CFQ cgroup writeback handling: cfq has always charged all async IOs to the root cgroup. It didn't have much choice as writeback didn't know about cgroups and there was no way to tell who to blame for a given writeback IO. writeback finally grew support for cgroups and now tags each writeback IO with the appropriate cgroup to charge it against. This patchset updates cfq so that it follows the blkcg each bio is tagged with. Async cfq_queues are now shared across cfq_group, which is per-cgroup, instead of per-request_queue cfq_data. This makes all IOs follow the weight based IO resource distribution implemented by cfq. - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff. - Other misc review points addressed, acks added and rebased. Series 3 is the blkcg policy cleanup patches: This patchset contains assorted cleanups for blkcg_policy methods and blk[c]g_policy_data handling. - alloc/free added for blkg_policy_data. exit dropped. - alloc/free added for blkcg_policy_data. - blk-throttle's async percpu allocation is replaced with direct allocation. - all methods now take blk[c]g_policy_data instead of blkcg_gq or blkcg. And finally, series 4 is a set of patches cleaning up the blkcg stats handling: blkcg's stats have always been somwhat of a mess. This patchset tries to improve the situation a bit. - The following patches added to consolidate blkcg entry point and blkg creation. This is in itself is an improvement and helps colllecting common stats on bio issue. - per-blkg stats now accounted on bio issue rather than request completion so that bio based and request based drivers can behave the same way. The issue was spotted by Vivek. - cfq-iosched implements custom recursive stats and blk-throttle implements custom per-cpu stats. This patchset make blkcg core support both by default. - cfq-iosched and blk-throttle keep track of the same stats multiple times. Unify them" * 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits) blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/ blkcg: implement interface for the unified hierarchy blkcg: misc preparations for unified hierarchy interface blkcg: separate out tg_conf_updated() from tg_set_conf() blkcg: move body parsing from blkg_conf_prep() to its callers blkcg: mark existing cftypes as legacy blkcg: rename subsystem name from blkio to io blkcg: refine error codes returned during blkcg configuration blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device() blkcg: reduce stack usage of blkg_rwstat_recursive_sum() blkcg: remove cfqg_stats->sectors blkcg: move io_service_bytes and io_serviced stats into blkcg_gq blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq blkcg: make blkcg_[rw]stat per-cpu blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it blkcg: consolidate blkg creation in blkcg_bio_issue_check() blk-throttle: improve queue bypass handling blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup() blkcg: inline [__]blkg_lookup() ...
| * | | | writeback: update writeback tracepoints to report cgroupTejun Heo2015-08-191-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following tracepoints are updated to report the cgroup used during cgroup writeback. * writeback_write_inode[_start] * writeback_queue * writeback_exec * writeback_start * writeback_written * writeback_wait * writeback_nowork * writeback_wake_background * wbc_writepage * writeback_queue_io * bdi_dirty_ratelimit * balance_dirty_pages * writeback_sb_inodes_requeue * writeback_single_inode[_start] Note that writeback_bdi_register is separated out from writeback_class as reporting cgroup doesn't make sense to it. Tracepoints which take bdi are updated to take bdi_writeback instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | kernfs: implement kernfs_path_len()Tejun Heo2015-08-191-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a function to determine the path length of a kernfs node. This for now will be used by writeback tracepoint updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>