summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* afs: Mark afs_net::ws_cell as __rcu and set using rcu functionsDavid Howells2018-05-233-6/+6
| | | | | | | | | | | | | | | | | The afs_net::ws_cell member is sometimes used under RCU conditions from within an seq-readlock. It isn't, however, marked __rcu and it isn't set using the proper RCU barrier-imposing functions. Fix this by annotating it with __rcu and using appropriate barriers to make sure accesses are correctly ordered. Without this, the code can produce the following warning: >> fs/afs/proc.c:151:24: sparse: incompatible types in comparison expression (different address spaces) Fixes: f044c8847bb6 ("afs: Lay the groundwork for supporting network namespaces") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()David Howells2018-05-231-42/+55
| | | | | | | | | | | | | | Sparse doesn't appear able to handle the conditionally-taken locks in xdr_decode_AFSFetchStatus(), even though the lock and unlock are both contingent on the same unvarying function argument. Deal with this by interpolating a wrapper function that takes the lock if needed and calls xdr_decode_AFSFetchStatus() on two separate branches, one with the lock held and one without. This allows Sparse to work out the locking. Signed-off-by: David Howells <dhowells@redhat.com>
* proc: Add a way to make network proc files writableDavid Howells2018-05-183-0/+118
| | | | | | | | | | | | Provide two extra functions, proc_create_net_data_write() and proc_create_net_single_write() that act like their non-write versions but also set a write method in the proc_dir_entry struct. An internal simple write function is provided that will copy its buffer and hand it to the pde->write() method if available (or give an error if not). The buffer may be modified by the write method. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.David Howells2018-05-181-192/+160
| | | | | | Rearrange fs/afs/proc.c to get rid of all the remaining predeclarations. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Rearrange fs/afs/proc.c to move the show routines upDavid Howells2018-05-181-75/+75
| | | | | | | Rearrange fs/afs/proc.c to move the show routines up to the top of each block so the order is show, iteration, ops, file ops, fops. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Rearrange fs/afs/proc.c by moving fops and open functions downDavid Howells2018-05-181-44/+27
| | | | | | | Rearrange fs/afs/proc.c by moving fops and open functions down so as to remove predeclarations. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Move /proc management functions to the end of the fileDavid Howells2018-05-181-81/+79
| | | | | | | | In fs/afs/proc.c, move functions that create and remove /proc files to the end of the source file as a first stage in getting rid of all the forward declarations. Signed-off-by: David Howells <dhowells@redhat.com>
* proc: update SIZEOF_PDE_INLINE_NAME for the new pde fieldsChristoph Hellwig2018-05-161-2/+2
| | | | | | | This makes Alexey happy and Al groan. Based on a patch from Alexey Dobriyan. Signed-off-by: Christoph Hellwig <hch@lst.de>
* tty: replace ->proc_fops with ->proc_showChristoph Hellwig2018-05-161-3/+3
| | | | | | | | Just set up the show callback in the tty_operations, and use proc_create_single_data to create the file without additional boilerplace code. Signed-off-by: Christoph Hellwig <hch@lst.de>
* jfs: simplify procfs codeChristoph Hellwig2018-05-166-99/+24
| | | | | | | | Use remove_proc_subtree to remove the whole subtree on cleanup, and unwind the registration loop into individual calls. Switch to use proc_create_seq where applicable. Signed-off-by: Christoph Hellwig <hch@lst.de>
* ext4: simplify procfs codeChristoph Hellwig2018-05-163-66/+14
| | | | | | | | Use remove_proc_subtree to remove the whole subtree on cleanup, and unwind the registration loop into individual calls. Switch to use proc_create_seq where applicable. Signed-off-by: Christoph Hellwig <hch@lst.de>
* afs: simplify procfs codeChristoph Hellwig2018-05-161-119/+15
| | | | | | | | Use remove_proc_subtree to remove the whole subtree on cleanup, and unwind the registration loop into individual calls. Switch to use proc_create_seq where applicable. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce proc_create_net_singleChristoph Hellwig2018-05-161-18/+31
| | | | | | | | | Variant of proc_create_data that directly take a seq_file show callback and deals with network namespaces in ->open and ->release. All callers of proc_create + single_open_net converted over, and single_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce proc_create_net{,_data}Christoph Hellwig2018-05-162-60/+44
| | | | | | | | | Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce proc_create_single{,_data}Christoph Hellwig2018-05-1616-191/+55
| | | | | | | | | Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce proc_create_seq_privateChristoph Hellwig2018-05-163-17/+9
| | | | | | | | | | Variant of proc_create_data that directly take a struct seq_operations argument + a private state size and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce proc_create_seq{,_data}Christoph Hellwig2018-05-1611-102/+44
| | | | | | | | | Variants of proc_create{,_data} that directly take a struct seq_operations argument and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: add a proc_create_reg helperChristoph Hellwig2018-05-162-19/+27
| | | | | | | Common code for creating a regular file. Factor out of proc_create_data, to be reused by other functions soon. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: simplify proc_register calling conventionsChristoph Hellwig2018-05-162-26/+20
| | | | | | | | Return registered entry on success, return NULL on failure and free the passed in entry. Also expose it in internal.h as we'll start using it in proc_net.c soon. Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: don't detour through seq->private to get the inodeChristoph Hellwig2018-05-161-14/+6
| | | | Signed-off-by: Christoph Hellwig <hch@lst.de>
* proc: introduce a proc_pid_ns helperChristoph Hellwig2018-05-164-20/+13
| | | | | | | | Factor out retrieving the per-sb pid namespaces from the sb private data into an easier to understand helper. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2018-05-134-43/+57
|\ | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Some small SMB3 fixes for 4.17-rc5, some for stable" * tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: directory sync should not return an error cifs: smb2ops: Fix listxattr() when there are no EAs cifs: smbd: Enable signing with smbdirect cifs: Allocate validate negotiation request through kmalloc
| * smb3: directory sync should not return an errorSteve French2018-05-111-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | As with NFS, which ignores sync on directory handles, fsync on a directory handle is a noop for CIFS/SMB3. Do not return an error on it. It breaks some database apps otherwise. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * cifs: smb2ops: Fix listxattr() when there are no EAsPaulo Alcantara2018-05-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per listxattr(2): On success, a nonnegative number is returned indicating the size of the extended attribute name list. On failure, -1 is returned and errno is set appropriately. In SMB1, when the server returns an empty EA list through a listxattr(), it will correctly return 0 as there are no EAs for the given file. However, in SMB2+, it returns -ENODATA in listxattr() which is wrong since the request and response were sent successfully, although there's no actual EA for the given file. This patch fixes listxattr() for SMB2+ by returning 0 in cifs_listxattr() when the server returns an empty list of EAs. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: smbd: Enable signing with smbdirectLong Li2018-05-092-13/+0
| | | | | | | | | | | | | | | | | | | | Now signing is supported with RDMA transport. Remove the code that disabled it. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * cifs: Allocate validate negotiation request through kmallocLong Li2018-05-091-30/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data buffer allocated on the stack can't be DMA'ed, ib_dma_map_page will return an invalid DMA address for a buffer on stack. Even worse, this incorrect address can't be detected by ib_dma_mapping_error. Sending data from this address to hardware will not fail, but the remote peer will get junk data. Fix this by allocating the request on the heap in smb3_validate_negotiate. Changes in v2: Removed duplicated code on freeing buffers on function exit. (Thanks to Parav Pandit <parav@mellanox.com>) Fixed typo in the patch title. Changes in v3: Added "Fixes" to the patch. Changed several sizeof() to use *pointer in place of struct. Changes in v4: Added detailed comments on the failure through RDMA. Allocate request buffer using GPF_NOFS. Fixed possible memory leak. Changes in v5: Removed variable ret for checking return value. Changed to use pneg_inbuf->Dialects[0] to calculate unused space in pneg_inbuf. Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Tom Talpey <ttalpey@microsoft.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2018-05-122-9/+28
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: rbtree: include rcu.h scripts/faddr2line: fix error when addr2line output contains discriminator ocfs2: take inode cluster lock before moving reflinked inode from orphan dir mm, oom: fix concurrent munlock and oom reaper unmap, v3 mm: migrate: fix double call of radix_tree_replace_slot() proc/kcore: don't bounds check against address 0 mm: don't show nr_indirectly_reclaimable in /proc/vmstat mm: sections are not offlined during memory hotremove z3fold: fix reclaim lock-ups init: fix false positives in W+X checking lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit() KASAN: prohibit KASAN+STRUCTLEAK combination MAINTAINERS: update Shuah's email address
| * | ocfs2: take inode cluster lock before moving reflinked inode from orphan dirAshish Samant2018-05-121-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reflinking an inode, we create a new inode in orphan directory, then take EX lock on it, reflink the original inode to orphan inode and release EX lock. Once the lock is released another node could request it in EX mode from ocfs2_recover_orphans() which causes downconvert of the lock, on this node, to NL mode. Later we attempt to initialize security acl for the orphan inode and move it to the reflink destination. However, while doing this we dont take EX lock on the inode. This could potentially cause problems because we could be starting transaction, accessing journal and modifying metadata of the inode while holding NL lock and with another node holding EX lock on the inode. Fix this by taking orphan inode cluster lock in EX mode before initializing security and moving orphan inode to reflink destination. Use the __tracker variant while taking inode lock to avoid recursive locking in the ocfs2_init_security_and_acl() call chain. Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | proc/kcore: don't bounds check against address 0Laura Abbott2018-05-121-7/+16
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing kcore code checks for bad addresses against __va(0) with the assumption that this is the lowest address on the system. This may not hold true on some systems (e.g. arm64) and produce overflows and crashes. Switch to using other functions to validate the address range. It's currently only seen on arm64 and it's not clear if anyone wants to use that particular combination on a stable release. So this is not urgent for stable. Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Tested-by: Dave Anderson <anderson@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexey Dobriyan <adobriyan@gmail.com>a Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ceph: fix iov_iter issues in ceph_direct_read_write()Ilya Dryomov2018-05-101-78/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dio_get_pagev_size() and dio_get_pages_alloc() introduced in commit b5b98989dc7e ("ceph: combine as many iovec as possile into one OSD request") assume that the passed iov_iter is ITER_IOVEC. This isn't the case with splice where it ends up poking into the guts of ITER_BVEC or ITER_PIPE iterators, causing lockups and crashes easily reproduced with generic/095. Rather than trying to figure out gap alignment and stuff pages into a page vector, add a helper for going from iov_iter to a bio_vec array and make use of the new CEPH_OSD_DATA_TYPE_BVECS code. Fixes: b5b98989dc7e ("ceph: combine as many iovec as possile into one OSD request") Link: http://tracker.ceph.com/issues/18130 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Tested-by: Luis Henriques <lhenriques@suse.com>
* | ceph: fix rsize/wsize capping in ceph_direct_read_write()Ilya Dryomov2018-05-101-5/+5
|/ | | | | | | | | | | | | rsize/wsize cap should be applied before ceph_osdc_new_request() is called. Otherwise, if the size is limited by the cap instead of the stripe unit, ceph_osdc_new_request() would setup an extent op that is bigger than what dio_get_pages_alloc() would pin and add to the page vector, triggering asserts in the messenger. Cc: stable@vger.kernel.org Fixes: 95cca2b44e54 ("ceph: limit osd write size") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-05-051-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma fixes from Doug Ledford: "This is our first pull request of the rc cycle. It's not that it's been overly quiet, we were just waiting on a few things before sending this off. For instance, the 6 patch series from Intel for the hfi1 driver had actually been pulled in on Tuesday for a Wednesday pull request, only to have Jason notice something I missed, so we held off for some testing, and then on Thursday had to respin the series because the very first patch needed a minor fix (unnecessary cast is all). There is a sizable hns patch series in here, as well as a reasonably largish hfi1 patch series, then all of the lines of uapi updates are just the change to the new official Linux-OpenIB SPDX tag (a bunch of our files had what amounts to a BSD-2-Clause + MIT Warranty statement as their license as a result of the initial code submission years ago, and the SPDX folks decided it was unique enough to warrant a unique tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4 and core/cache/cma updates to round out the bunch. None of it was overly large by itself, but in the 2 1/2 weeks we've been collecting patches, it has added up :-/. As best I can tell, it's been through 0day (I got a notice about my last for-next push, but not for my for-rc push, but Jason seems to think that failure messages are prioritized and success messages not so much). It's also been through linux-next. And yes, we did notice in the context portion of the CMA query gid fix patch that there is a dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage and remove it anywhere we can. Summary: - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off) - SPDX license tag cleanups (new tag Linux-OpenIB) - RoCE GID fixes related to default GIDs - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch), mlx4, mlx5, and hfi1 (medium batch)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits) RDMA/cma: Do not query GID during QP state transition to RTR IB/mlx4: Fix integer overflow when calculating optimal MTT size IB/hfi1: Fix memory leak in exception path in get_irq_affinity() IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used IB/hfi1: Fix loss of BECN with AHG IB/hfi1 Use correct type for num_user_context IB/hfi1: Fix handling of FECN marked multicast packet IB/core: Make ib_mad_client_id atomic iw_cxgb4: Atomically flush per QP HW CQEs IB/uverbs: Fix kernel crash during MR deregistration flow IB/uverbs: Prevent reregistration of DM_MR to regular MR RDMA/mlx4: Add missed RSS hash inner header flag RDMA/hns: Fix a couple misspellings RDMA/hns: Submit bad wr RDMA/hns: Update assignment method for owner field of send wqe RDMA/hns: Adjust the order of cleanup hem table RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set RDMA/hns: Remove some unnecessary attr_mask judgement RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set ...
| * cifs: smbd: depend on INFINIBAND_ADDR_TRANSGreg Thelen2018-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | CIFS_SMB_DIRECT code depends on INFINIBAND_ADDR_TRANS provided symbols. So declare the kconfig dependency. This is necessary to allow for enabling INFINIBAND without INFINIBAND_ADDR_TRANS. Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Tarick Bedeir <tarick@google.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-blockLinus Torvalds2018-05-051-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A collection of fixes that should to into this release. This contains: - Set of bcache fixes from Coly, fixing regression in patches that went into this series. - Set of NVMe fixes by way of Keith. - Set of bdi related fixes, one from Jan and two from Tetsuo Handa, fixing various issues around device addition/removal. - Two block inflight fixes from Omar, fixing issues around the transition to using tags for blk-mq inflight accounting that we did a few releases ago" * tag 'for-linus-20180504' of git://git.kernel.dk/linux-block: bdi: Fix oops in wb_workfn() nvmet: switch loopback target state to connecting when resetting nvme/multipath: Fix multipath disabled naming collisions nvme/multipath: Disable runtime writable enabling parameter nvme: Set integrity flag for user passthrough commands nvme: fix potential memory leak in option parsing bdi: Fix use after free bug in debugfs_remove() bdi: wake up concurrent wb_shutdown() callers. bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set bcache: set dc->io_disable to true in conditional_stop_bcache_device() bcache: add wait_for_kthread_stop() in bch_allocator_thread() bcache: count backing device I/O error for writeback I/O bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() bcache: store disk name in struct cache and struct cached_dev blk-mq: fix sysfs inflight counter blk-mq: count allocated but not started requests in iostats inflight
| * | bdi: Fix oops in wb_workfn()Jan Kara2018-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzbot has reported that it can hit a NULL pointer dereference in wb_workfn() due to wb->bdi->dev being NULL. This indicates that wb_workfn() was called for an already unregistered bdi which should not happen as wb_shutdown() called from bdi_unregister() should make sure all pending writeback works are completed before bdi is unregistered. Except that wb_workfn() itself can requeue the work with: mod_delayed_work(bdi_wq, &wb->dwork, 0); and if this happens while wb_shutdown() is waiting in: flush_delayed_work(&wb->dwork); the dwork can get executed after wb_shutdown() has finished and bdi_unregister() has cleared wb->bdi->dev. Make wb_workfn() use wakeup_wb() for requeueing the work which takes all the necessary precautions against racing with bdi unregistration. CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> CC: Tejun Heo <tj@kernel.org> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2018-05-051-0/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fixes from Darrick Wong: "I've got one more bug fix for xfs for 4.17-rc4, which caps the amount of data we try to handle in one dedupe request so that userspace can't livelock the kernel. This series has been run through a full xfstests run during the week and through a quick xfstests run against this morning's master, with no ajor failures reported. Summary: - Cap the maximum length of a deduplication request at MAX_RW_COUNT/2 to avoid kernel livelock due to excessively large IO requests" * tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: cap the length of deduplication requests
| * | | xfs: cap the length of deduplication requestsDarrick J. Wong2018-05-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since deduplication potentially has to read in all the pages in both files in order to compare the contents, cap the deduplication request length at MAX_RW_COUNT/2 (roughly 1GB) so that we have /some/ upper bound on the request length and can't just lock up the kernel forever. Found by running generic/304 after commit 1ddae54555b62 ("common/rc: add missing 'local' keywords"). Reported-by: matorola@gmail.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
* | | | Merge tag 'for-4.17-rc3-tag' of ↵Linus Torvalds2018-05-053-1/+12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two regression fixes and one fix for stable" * tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: send, fix missing truncate for inode with prealloc extent past eof btrfs: Take trans lock before access running trans in check_delayed_ref btrfs: Fix wrong first_key parameter in replace_path
| * | | | Btrfs: send, fix missing truncate for inode with prealloc extent past eofFilipe Manana2018-05-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An incremental send operation can miss a truncate operation when an inode has an increased size in the send snapshot and a prealloc extent beyond its size. Consider the following scenario where a necessary truncate operation is missing in the incremental send stream: 1) In the parent snapshot an inode has a size of 1282957 bytes and it has no prealloc extents beyond its size; 2) In the the send snapshot it has a size of 5738496 bytes and has a new extent at offsets 1884160 (length of 106496 bytes) and a prealloc extent beyond eof at offset 6729728 (and a length of 339968 bytes); 3) When processing the prealloc extent, at offset 6729728, we end up at send.c:send_write_or_clone() and set the @len variable to a value of 18446744073708560384 because @offset plus the original @len value is larger then the inode's size (6729728 + 339968 > 5738496). We then call send_extent_data(), with that @offset and @len, which in turn calls send_write(), and then the later calls fill_read_buf(). Because the offset passed to fill_read_buf() is greater then inode's i_size, this function returns 0 immediately, which makes send_write() and send_extent_data() do nothing and return immediately as well. When we get back to send.c:send_write_or_clone() we adjust the value of sctx->cur_inode_next_write_offset to @offset plus @len, which corresponds to 6729728 + 18446744073708560384 = 5738496, which is precisely the the size of the inode in the send snapshot; 4) Later when at send.c:finish_inode_if_needed() we determine that we don't need to issue a truncate operation because the value of sctx->cur_inode_next_write_offset corresponds to the inode's new size, 5738496 bytes. This is wrong because the last write operation that was issued started at offset 1884160 with a length of 106496 bytes, so the correct value for sctx->cur_inode_next_write_offset should be 1990656 (1884160 + 106496), so that a truncate operation with a value of 5738496 bytes would have been sent to insert a trailing hole at the destination. So fix the issue by making send.c:send_write_or_clone() not attempt to send write or clone operations for extents that start beyond the inode's size, since such attempts do nothing but waste time by calling helper functions and allocating path structures, and send currently has no fallocate command in order to create prealloc extents at the destination (either beyond a file's eof or not). The issue was found running the test btrfs/007 from fstests using a seed value of 1524346151 for fsstress. Reported-by: Gu, Jinxiang <gujx@cn.fujitsu.com> Fixes: ffa7c4296e93 ("Btrfs: send, do not issue unnecessary truncate operations") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: Take trans lock before access running trans in check_delayed_refethanwu2018-05-021-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preivous patch: Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist We avoid starting btrfs transaction and get this information from fs_info->running_transaction directly. When accessing running_transaction in check_delayed_ref, there's a chance that current transaction will be freed by commit transaction after the NULL pointer check of running_transaction is passed. After looking all the other places using fs_info->running_transaction, they are either protected by trans_lock or holding the transactions. Fix this by using trans_lock and increasing the use_count. Fixes: e4c3b2dcd144 ("Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: ethanwu <ethanwu@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: Fix wrong first_key parameter in replace_pathQu Wenruo2018-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 581c1760415c ("btrfs: Validate child tree block's level and first key") introduced new @first_key parameter for read_tree_block(), however caller in replace_path() is parasing wrong key to read_tree_block(). It should use parameter @first_key other than @key. Normally it won't expose problem as @key is normally initialzied to the same value of @first_key we expect. However in relocation recovery case, @key can be set to (0, 0, 0), and since no valid key in relocation tree can be (0, 0, 0), it will cause read_tree_block() to return -EUCLEAN and interrupt relocation recovery. Fix it by setting @first_key correctly. Fixes: 581c1760415c ("btrfs: Validate child tree block's level and first key") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | Merge tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2018-05-014-6/+42
|\ \ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fixes from Darrick Wong: "Here are a few more bug fixes for xfs for 4.17-rc4. Most of them are fixes for bad behavior. This series has been run through a full xfstests run during LSF and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Enhance inode fork verifiers to prevent loading of corrupted metadata. - Fix a crash when we try to convert extents format inodes to btree format, we run out of space, but forget to revert the in-core state changes. - Fix file size checks when doing INSERT_RANGE that could cause files to end up negative size if there previously was an extent mapped at s_maxbytes. - Fix a bug when doing a remove-then-add ATTR_REPLACE xattr update where we forget to clear ATTR_REPLACE after the remove, which causes the attr to be lost and the fs to shut down due to (what it thinks is) inconsistent in-core state" * tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE xfs: prevent creating negative-sized file via INSERT_RANGE xfs: set format back to extents if xfs_bmap_extents_to_btree xfs: enhance dinode verifier
| * | | | xfs: don't fail when converting shortform attr to long form during ATTR_REPLACEDarrick J. Wong2018-04-181-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kanda Motohiro reported that expanding a tiny xattr into a large xattr fails on XFS because we remove the tiny xattr from a shortform fork and then try to re-add it after converting the fork to extents format having not removed the ATTR_REPLACE flag. This fails because the attr is no longer present, causing a fs shutdown. This is derived from the patch in his bug report, but we really shouldn't ignore a nonzero retval from the remove call. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119 Reported-by: kanda.motohiro@gmail.com Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | | | xfs: prevent creating negative-sized file via INSERT_RANGEDarrick J. Wong2018-04-181-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the "insert range" fallocate operation, i_size grows by the specified 'len' bytes. XFS verifies that i_size + len < s_maxbytes, as it should. But this comparison is done using the signed 'loff_t', and 'i_size + len' can wrap around to a negative value, causing the check to incorrectly pass, resulting in an inode with "negative" i_size. This is possible on 64-bit platforms, where XFS sets s_maxbytes = LLONG_MAX. ext4 and f2fs don't run into this because they set a smaller s_maxbytes. Fix it by using subtraction instead. Reproducer: xfs_io -f file -c "truncate $(((1<<63)-1))" -c "finsert 0 4096" Fixes: a904b1ca5751 ("xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: <stable@vger.kernel.org> # v4.1+ Originally-From: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix signed integer addition overflow too] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | | | xfs: set format back to extents if xfs_bmap_extents_to_btreeEric Sandeen2018-04-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If xfs_bmap_extents_to_btree fails in a mode where we call xfs_iroot_realloc(-1) to de-allocate the root, set the format back to extents. Otherwise we can assume we can dereference ifp->if_broot based on the XFS_DINODE_FMT_BTREE format, and crash. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | | | xfs: enhance dinode verifierEric Sandeen2018-04-181-0/+21
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add several more validations to xfs_dinode_verify: - For LOCAL data fork formats, di_nextents must be 0. - For LOCAL attr fork formats, di_anextents must be 0. - For inodes with no attr fork offset, - format must be XFS_DINODE_FMT_EXTENTS if set at all - di_anextents must be 0. Thanks to dchinner for pointing out a couple related checks I had forgotten to add. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* | | | Merge tag 'for_linus_stable' of ↵Linus Torvalds2018-04-294-9/+18
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix misc bugs and a regression for ext4" * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs ext4: fix bitmap position validation ext4: set h_journal if there is a failure starting a reserved handle ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
| * | | | ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfsTheodore Ts'o2018-04-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: a45403b51582 ("ext4: always initialize the crc32c checksum driver") Reported-by: François Valenduc <francoisvalenduc@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | | | ext4: fix bitmap position validationLukas Czerner2018-04-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in ext4_valid_block_bitmap() we expect the bitmap to be positioned anywhere between 0 and s_blocksize clusters, but that's wrong because the bitmap can be placed anywhere in the block group. This causes false positives when validating bitmaps on perfectly valid file system layouts. Fix it by checking whether the bitmap is within the group boundary. The problem can be reproduced using the following mkfs -t ext3 -E stride=256 /dev/vdb1 mount /dev/vdb1 /mnt/test cd /mnt/test wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz tar xf linux-4.16.3.tar.xz This will result in the warnings in the logs EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap [ Changed slightly for clarity and to not drop a overflow test -- TYT ] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers") Cc: stable@vger.kernel.org
| * | | | ext4: set h_journal if there is a failure starting a reserved handleTheodore Ts'o2018-04-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f36829b ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>