| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull io_uring fixes from Jens Axboe:
"A few minor fixes:
- Fix an issue with hashed wait removal on exit (Zqiang, Pavel)
- Fix a recent data race introduced in this series (Marco)"
* tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block:
io_uring: fix data race to avoid potential NULL-deref
io-wq: Fix UAF when wakeup wqe in hash waitqueue
io_uring/io-wq: close io-wq full-stop gap
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit ba5ef6dc8a82 ("io_uring: fortify tctx/io_wq cleanup") introduced
setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to
detect a data race between accesses to tctx->io_wq:
write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
io_uring_clean_tctx fs/io_uring.c:9042 [inline]
__io_uring_cancel fs/io_uring.c:9136
io_uring_files_cancel include/linux/io_uring.h:16 [inline]
do_exit kernel/exit.c:781
do_group_exit kernel/exit.c:923
get_signal kernel/signal.c:2835
arch_do_signal_or_restart arch/x86/kernel/signal.c:789
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
...
read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline]
io_uring_try_cancel_requests fs/io_uring.c:8933
io_ring_exit_work fs/io_uring.c:8736
process_one_work kernel/workqueue.c:2276
...
With the config used, KCSAN only reports data races with value changes:
this implies that in the case here we also know that tctx->io_wq was
non-NULL. Therefore, depending on interleaving, we may end up with:
[CPU 0] | [CPU 1]
io_uring_try_cancel_iowq() | io_uring_clean_tctx()
if (!tctx->io_wq) // false | ...
... | tctx->io_wq = NULL
io_wq_cancel_cb(tctx->io_wq, ...) | ...
-> NULL-deref |
Note: It is likely that thus far we've gotten lucky and the compiler
optimizes the double-read into a single read into a register -- but this
is never guaranteed, and can easily change with a different config!
Fix the data race by restoring the previous behaviour, where both
setting io_wq to NULL and put of the wq are _serialized_ after
concurrent io_uring_try_cancel_iowq() via acquisition of the uring_lock
and removal of the node in io_uring_del_task_file().
Fixes: ba5ef6dc8a82 ("io_uring: fortify tctx/io_wq cleanup")
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Reported-by: syzbot+bf2b3d0435b9b728946c@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20210527092547.2656514-1-elver@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650
Read of size 8 at addr ffff8880304250d8 by task iou-wrk-28796/28802
Call Trace:
__dump_stack [inline]
dump_stack+0x141/0x1d7
print_address_description.constprop.0.cold+0x5b/0x2c6
__kasan_report [inline]
kasan_report.cold+0x7c/0xd8
__wake_up_common+0x637/0x650
__wake_up_common_lock+0xd0/0x130
io_worker_handle_work+0x9dd/0x1790
io_wqe_worker+0xb2a/0xd40
ret_from_fork+0x1f/0x30
Allocated by task 28798:
kzalloc_node [inline]
io_wq_create+0x3c4/0xdd0
io_init_wq_offload [inline]
io_uring_alloc_task_context+0x1bf/0x6b0
__io_uring_add_task_file+0x29a/0x3c0
io_uring_add_task_file [inline]
io_uring_install_fd [inline]
io_uring_create [inline]
io_uring_setup+0x209a/0x2bd0
do_syscall_64+0x3a/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 28798:
kfree+0x106/0x2c0
io_wq_destroy+0x182/0x380
io_wq_put [inline]
io_wq_put_and_exit+0x7a/0xa0
io_uring_clean_tctx [inline]
__io_uring_cancel+0x428/0x530
io_uring_files_cancel
do_exit+0x299/0x2a60
do_group_exit+0x125/0x310
get_signal+0x47f/0x2150
arch_do_signal_or_restart+0x2a8/0x1eb0
handle_signal_work[inline]
exit_to_user_mode_loop [inline]
exit_to_user_mode_prepare+0x171/0x280
__syscall_exit_to_user_mode_work [inline]
syscall_exit_to_user_mode+0x19/0x60
do_syscall_64+0x47/0xb0
entry_SYSCALL_64_after_hwframe
There are the following scenarios, hash waitqueue is shared by
io-wq1 and io-wq2. (note: wqe is worker)
io-wq1:worker2 | locks bit1
io-wq2:worker1 | waits bit1
io-wq1:worker3 | waits bit1
io-wq1:worker2 | completes all wqe bit1 work items
io-wq1:worker2 | drop bit1, exit
io-wq2:worker1 | locks bit1
io-wq1:worker3 | can not locks bit1, waits bit1 and exit
io-wq1 | exit and free io-wq1
io-wq2:worker1 | drops bit1
io-wq1:worker3 | be waked up, even though wqe is freed
After all iou-wrk belonging to io-wq1 have exited, remove wqe
form hash waitqueue, it is guaranteed that there will be no more
wqe belonging to io-wq1 in the hash waitqueue.
Reported-by: syzbot+6cb11ade52aa17095297@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Link: https://lore.kernel.org/r/20210526050826.30500-1-qiang.zhang@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is an old problem with io-wq cancellation where requests should be
killed and are in io-wq but are not discoverable, e.g. in @next_hashed
or @linked vars of io_worker_handle_work(). It adds some unreliability
to individual request canellation, but also may potentially get
__io_uring_cancel() stuck. For instance:
1) An __io_uring_cancel()'s cancellation round have not found any
request but there are some as desribed.
2) __io_uring_cancel() goes to sleep
3) Then workers wake up and try to execute those hidden requests
that happen to be unbound.
As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
in advance, so preventing 3) from executing unbound requests. The
workers will initially break looping because of getting a signal as they
are threads of the dying/exec()'ing user task.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull cifs fixes from Steve French:
"Three SMB3 fixes.
Two for stable, and the other fixes a problem pointed out with a
recently added ioctl"
* tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: change format of CIFS_FULL_KEY_DUMP ioctl
cifs: fix string declarations and assignments in tracepoints
cifs: set server->cipher_type to AES-128-CCM for SMB3.0
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make CIFS_FULL_KEY_DUMP ioctl able to return variable-length keys.
* userspace needs to pass the struct size along with optional
session_id and some space at the end to store keys
* if there is enough space kernel returns keys in the extra space and
sets the length of each key via xyz_key_length fields
This also fixes the build error for get_user() on ARM.
Sample program:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/fcntl.h>
#include <sys/ioctl.h>
struct smb3_full_key_debug_info {
uint32_t in_size;
uint64_t session_id;
uint16_t cipher_type;
uint8_t session_key_length;
uint8_t server_in_key_length;
uint8_t server_out_key_length;
uint8_t data[];
/*
* return this struct with the keys appended at the end:
* uint8_t session_key[session_key_length];
* uint8_t server_in_key[server_in_key_length];
* uint8_t server_out_key[server_out_key_length];
*/
} __attribute__((packed));
#define CIFS_IOCTL_MAGIC 0xCF
#define CIFS_DUMP_FULL_KEY _IOWR(CIFS_IOCTL_MAGIC, 10, struct smb3_full_key_debug_info)
void dump(const void *p, size_t len) {
const char *hex = "0123456789ABCDEF";
const uint8_t *b = p;
for (int i = 0; i < len; i++)
printf("%c%c ", hex[(b[i]>>4)&0xf], hex[b[i]&0xf]);
putchar('\n');
}
int main(int argc, char **argv)
{
struct smb3_full_key_debug_info *keys;
uint8_t buf[sizeof(*keys)+1024] = {0};
size_t off = 0;
int fd, rc;
keys = (struct smb3_full_key_debug_info *)&buf;
keys->in_size = sizeof(buf);
fd = open(argv[1], O_RDONLY);
if (fd < 0)
perror("open"), exit(1);
rc = ioctl(fd, CIFS_DUMP_FULL_KEY, keys);
if (rc < 0)
perror("ioctl"), exit(1);
printf("SessionId ");
dump(&keys->session_id, 8);
printf("Cipher %04x\n", keys->cipher_type);
printf("SessionKey ");
dump(keys->data+off, keys->session_key_length);
off += keys->session_key_length;
printf("ServerIn Key ");
dump(keys->data+off, keys->server_in_key_length);
off += keys->server_in_key_length;
printf("ServerOut Key ");
dump(keys->data+off, keys->server_out_key_length);
return 0;
}
Usage:
$ gcc -o dumpkeys dumpkeys.c
Against Windows Server 2020 preview (with AES-256-GCM support):
# mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.0,seal"
# ./dumpkeys /mnt/somefile
SessionId 0D 00 00 00 00 0C 00 00
Cipher 0002
SessionKey AB CD CC 0D E4 15 05 0C 6F 3C 92 90 19 F3 0D 25
ServerIn Key 73 C6 6A C8 6B 08 CF A2 CB 8E A5 7D 10 D1 5B DC
ServerOut Key 6D 7E 2B A1 71 9D D7 2B 94 7B BA C4 F0 A5 A4 F8
# umount /mnt
With 256 bit keys:
# echo 1 > /sys/module/cifs/parameters/require_gcm_256
# mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.11,seal"
# ./dumpkeys /mnt/somefile
SessionId 09 00 00 00 00 0C 00 00
Cipher 0004
SessionKey 93 F5 82 3B 2F B7 2A 50 0B B9 BA 26 FB 8C 8B 03
ServerIn Key 6C 6A 89 B2 CB 7B 78 E8 04 93 37 DA 22 53 47 DF B3 2C 5F 02 26 70 43 DB 8D 33 7B DC 66 D3 75 A9
ServerOut Key 04 11 AA D7 52 C7 A8 0F ED E3 93 3A 65 FE 03 AD 3F 63 03 01 2B C0 1B D7 D7 E5 52 19 7F CC 46 B4
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We missed using the variable length string macros in several
tracepoints. Fixed them in this change.
There's probably more useful macros that we can use to print
others like flags etc. But I'll submit sepawrate patches for
those at a future date.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: <stable@vger.kernel.org> # v5.12
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SMB3.0 doesn't have encryption negotiate context but simply uses
the SMB2_GLOBAL_CAP_ENCRYPTION flag.
When that flag is present in the neg response cifs.ko uses AES-128-CCM
which is the only cipher available in this context.
cipher_type was set to the server cipher only when parsing encryption
negotiate context (SMB3.1.1).
For SMB3.0 it was set to 0. This means cipher_type value can be 0 or 1
for AES-128-CCM.
Fix this by checking for SMB3.0 and encryption capability and setting
cipher_type appropriately.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull NFS client bugfixes from Trond Myklebust:
"Stable fixes:
- Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
- Fix Oops in xs_tcp_send_request() when transport is disconnected
- Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
Bugfixes:
- Fix instances where signal_pending() should be fatal_signal_pending()
- fix an incorrect limit in filelayout_decode_layout()
- Fixes for the SUNRPC backlogged RPC queue
- Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
- Revert commit 586a0787ce35 ("Clean up rpcrdma_prepare_readch()")"
* tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Remove trailing semicolon in macros
xprtrdma: Revert 586a0787ce35
NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
NFS: Clean up reset of the mirror accounting variables
NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
SUNRPC: More fixes for backlog congestion
SUNRPC: Fix Oops in xs_tcp_send_request() when transport is disconnected
NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
SUNRPC in case of backlog, hand free slots directly to waiting task
pNFS/NFSv4: Remove redundant initialization of 'rd_size'
NFS: fix an incorrect limit in filelayout_decode_layout()
fs/nfs: Use fatal_signal_pending instead of signal_pending
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Macros should not use a trailing semicolon.
Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since commit bdcc2cd14e4e ("NFSv4.2: handle NFS-specific llseek errors"),
nfs42_proc_llseek would return -EOPNOTSUPP rather than -ENOTSUPP when
SEEK_DATA on NFSv4.0/v4.1.
This will lead xfstests generic/285 not run on NFSv4.0/v4.1 when set the
CONFIG_NFS_V4_2, rather than run failed.
Fixes: bdcc2cd14e4e ("NFSv4.2: handle NFS-specific llseek errors")
Cc: <stable.vger.kernel.org> # 4.2
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that nfs_pageio_do_add_request() resets the pg_count, we don't need
these other inlined resets.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The value of mirror->pg_bytes_written should only be updated after a
successful attempt to flush out the requests on the list.
Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Ensure that nfs_pageio_error_cleanup() resets the mirror array contents,
so that the structure reflects the fact that it is now empty.
Also change the test in nfs_pageio_do_add_request() to be more robust by
checking whether or not the list is empty rather than relying on the
value of pg_count.
Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit de144ff4234f changes _pnfs_return_layout() to call
pnfs_mark_matching_lsegs_return() passing NULL as the struct
pnfs_layout_range argument. Unfortunately,
pnfs_mark_matching_lsegs_return() doesn't check if we have a value here
before dereferencing it, causing an oops.
I'm able to hit this crash consistently when running connectathon basic
tests on NFS v4.1/v4.2 against Ontap.
Fixes: de144ff4234f ("NFSv4: Don't discard segments marked for return in _pnfs_return_layout()")
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Variable 'rd_size' is being initialized however
this value is never read as 'rd_size' is assigned
a new value in for statement. Remove the redundant
assignment.
Clean up clang warning:
fs/nfs/pnfs.c:2681:6: warning: Value stored to 'rd_size' during its
initialization is never read [clang-analyzer-deadcode.DeadStores]
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The "sizeof(struct nfs_fh)" is two bytes too large and could lead to
memory corruption. It should be NFS_MAXFHSIZE because that's the size
of the ->data[] buffer.
I reversed the size of the arguments to put the variable on the left.
Fixes: 16b374ca439f ("NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We set the state of the current process to TASK_KILLABLE via
prepare_to_wait(). Should we use fatal_signal_pending() to detect
the signal here?
Fixes: b4868b44c562 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE")
Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix rename of one directory over another such that the nlink on the deleted
directory is cleared to 0 rather than being decremented to 1.
This was causing the generic/035 xfstest to fail.
Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/162194384460.3999479.7605572278074191079.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix another "confused deputy" weakness[1]. Writes to /proc/$pid/attr/
files need to check the opener credentials, since these fds do not
transition state across execve(). Without this, it is possible to
trick another process (which may have different credentials) to write
to its own /proc/$pid/attr/ files, leading to unexpected and possibly
exploitable behaviors.
[1] https://www.kernel.org/doc/html/latest/security/credentials.html?highlight=confused#open-file-credentials
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs fixes from David Howells:
"A couple of fixes to the new netfs lib:
- Pass the AOP flags through from netfs_write_begin() into
grab_cache_page_write_begin().
- Automatically enable in Kconfig netfs lib rather than presenting an
option for manual enablement"
* tag 'netfs-lib-fixes-20200525' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
netfs: Make CONFIG_NETFS_SUPPORT auto-selected rather than manual
netfs: Pass flags through to grab_cache_page_write_begin()
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make the netfs helper library selected automatically by the things that use
it rather than being manually configured, even though it's required[1].
Fixes: 3a5829fefd3b ("netfs: Make a netfs helper module")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAMuHMdXJZ7iNQE964CdBOU=vRKVMFzo=YF_eiwsGgqzuvZ+TuA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/162090298141.3166007.2971118149366779916.stgit@warthog.procyon.org.uk # v1
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In netfs_write_begin(), pass the AOP flags through to
grab_cache_page_write_begin() so that a request to use GFP_NOFS is
honoured.
Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/162090295383.3165945.13595101698295243662.stgit@warthog.procyon.org.uk # v1
|
| |_|/ /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple fallthrough pseudo-keywords in
places where the code is intended to fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/51150b54e0b0431a2c401cd54f2c4e7f50e94601.1605896059.git.gustavoars@kernel.org/ # v1
Link: https://lore.kernel.org/r/20210420211615.GA51432@embeddedor/ # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific
page flags") the use of PagePrivate to indicate a reservation count
should be restored at free time was changed to the hugetlb specific flag
HPageRestoreReserve. Changes to a userfaultfd error path as well as a
VM_BUG_ON() in remove_inode_hugepages() were overlooked.
Users could see incorrect hugetlb reserve counts if they experience an
error with a UFFDIO_COPY operation. Specifically, this would be the
result of an unlikely copy_huge_page_from_user error. There is not an
increased chance of hitting the VM_BUG_ON.
Link: https://lkml.kernel.org/r/20210521233952.236434-1-mike.kravetz@oracle.com
Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mina Almasry <almasry.mina@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull block fixes from Jens Axboe:
- Fix BLKRRPART and deletion race (Gulam, Christoph)
- NVMe pull request (Christoph):
- nvme-tcp corruption and timeout fixes (Sagi Grimberg, Keith
Busch)
- nvme-fc teardown fix (James Smart)
- nvmet/nvme-loop memory leak fixes (Wu Bo)"
* tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block:
block: fix a race between del_gendisk and BLKRRPART
block: prevent block device lookups at the beginning of del_gendisk
nvme-fc: clear q_live at beginning of association teardown
nvme-tcp: rerun io_work if req_list is not empty
nvme-tcp: fix possible use-after-completion
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvmet: fix memory leak in nvmet_alloc_ctrl()
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When BLKRRPART is called concurrently with del_gendisk, the partitions
rescan can create a stale partition that will never be be cleaned up.
Fix this by checking the the disk is up before rescanning partitions
while under bd_mutex.
Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210514131842.1600568-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As an artifact of how gendisk lookup used to work in earlier kernels,
GENHD_FL_UP is only cleared very late in del_gendisk, and a global lock
is used to prevent opens from succeeding while del_gendisk is tearing
down the gendisk. Switch to clearing the flag early and under bd_mutex
so that callers can use bd_mutex to stabilize the flag, which removes
the need for the global mutex.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210514131842.1600568-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\ \ \ \ \
| | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull io_uring fixes from Jens Axboe:
"One fix for a regression with poll in this merge window, and another
just hardens the io-wq exit path a bit"
* tag 'io_uring-5.13-2021-05-22' of git://git.kernel.dk/linux-block:
io_uring: fortify tctx/io_wq cleanup
io_uring: don't modify req->poll for rw
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We don't want anyone poking into tctx->io_wq awhile it's being destroyed
by io_wq_put_and_exit(), and even though it shouldn't even happen, if
buggy would be preferable to get a NULL-deref instead of subtle delayed
failure or UAF.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/827b021de17926fd807610b3e53a5a5fa8530856.1621513214.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
__io_queue_proc() is used by both poll and apoll, so we should not
access req->poll directly but selecting right struct io_poll_iocb
depending on use case.
Reported-and-tested-by: syzbot+a84b8783366ecb1c65d0@syzkaller.appspotmail.com
Fixes: ea6a693d862d ("io_uring: disable multishot poll for double poll add cases")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a6a1de31142d8e0250fe2dfd4c8923d82a5bbfc.1621251795.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull xfs fixes from Darrick Wong:
- Fix some math errors in the realtime allocator when extent size hints
are applied.
- Fix unnecessary short writes to realtime files when free space is
fragmented.
- Fix a crash when using scrub tracepoints.
- Restore ioctl uapi definitions that were accidentally removed in
5.13-rc1.
* tag 'xfs-5.13-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: restore old ioctl definitions
xfs: fix deadlock retry tracepoint arguments
xfs: retry allocations when locality-based search fails
xfs: adjust rt allocation minlen when extszhint > rtextsize
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
These ioctl definitions in xfs_fs.h are part of the userspace ABI and
were mistakenly removed during the 5.13 merge window.
Fixes: 9fefd5db08ce ("xfs: convert to fileattr")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
sc->ip is the inode that's being scrubbed, which means that it's not set
for scrub types that don't involve inodes. If one of those scrubbers
(e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer.
Fix that by reporting either the file being examined or the file that
was used to call scrub.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If a realtime allocation fails because we can't find a sufficiently
large free extent satisfying locality rules, relax the locality rules
and try again. This reduces the occurrence of short writes to realtime
files when the write size is large and the free space is fragmented.
This was originally discovered by running generic/186 with the realtime
reflink patchset and a 128k cow extent size hint, but the short write
symptoms can manifest with a 128k extent size hint and no reflink, so
apply the fix now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
| | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_bmap_rtalloc doesn't handle realtime extent files with extent size
hints larger than the rt volume's extent size properly, because
xfs_bmap_extsize_align can adjust the offset/length parameters to try to
fit the extent size hint.
Under these conditions, minlen has to be large enough so that any
allocation returned by xfs_rtallocate_extent will be large enough to
cover at least one of the blocks that the caller asked for. If the
allocation is too short, bmapi_write will return no mapping for the
requested range, which causes ENOSPC errors in other parts of the
filesystem.
Therefore, adjust minlen upwards to fix this. This can be found by
running generic/263 (g/127 or g/522) with a realtime extent size hint
that's larger than the rt volume extent size.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes:
- fix unaligned compressed writes in zoned mode
- fix false positive lockdep warning when cloning inline extent
- remove wrong BUG_ON in tree-log error handling"
* tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix parallel compressed writes
btrfs: zoned: pass start block to btrfs_use_zone_append
btrfs: do not BUG_ON in link_to_fixup_dir
btrfs: release path before starting transaction when cloning inline extent
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When multiple processes write data to the same block group on a
compressed zoned filesystem, the underlying device could report I/O
errors and data corruption is possible.
This happens because on a zoned file system, compressed data writes
where sent to the device via a REQ_OP_WRITE instead of a
REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
submission it cannot be guaranteed that the data is always submitted
aligned to the underlying zone's write pointer.
The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
zoned filesystem is non intrusive on a regular file system or when
submitting to a conventional zone on a zoned filesystem, as it is
guarded by btrfs_use_zone_append.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag")
CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append
CC: stable@vger.kernel.org # 5.12.x
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
btrfs_use_zone_append only needs the passed in extent_map's block_start
member, so there's no need to pass in the full extent map.
This also enables the use of btrfs_use_zone_append in places where we only
have a start byte but no extent_map.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
While doing error injection testing I got the following panic
kernel BUG at fs/btrfs/tree-log.c:1862!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:link_to_fixup_dir+0xd5/0xe0
RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216
RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0
RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000
RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001
R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800
R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065
FS: 00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0
Call Trace:
replay_one_buffer+0x409/0x470
? btree_read_extent_buffer_pages+0xd0/0x110
walk_up_log_tree+0x157/0x1e0
walk_log_tree+0xa6/0x1d0
btrfs_recover_log_trees+0x1da/0x360
? replay_one_extent+0x7b0/0x7b0
open_ctree+0x1486/0x1720
btrfs_mount_root.cold+0x12/0xea
? __kmalloc_track_caller+0x12f/0x240
legacy_get_tree+0x24/0x40
vfs_get_tree+0x22/0xb0
vfs_kern_mount.part.0+0x71/0xb0
btrfs_mount+0x10d/0x380
? vfs_parse_fs_string+0x4d/0x90
legacy_get_tree+0x24/0x40
vfs_get_tree+0x22/0xb0
path_mount+0x433/0xa10
__x64_sys_mount+0xe3/0x120
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
We can get -EIO or any number of legitimate errors from
btrfs_search_slot(), panicing here is not the appropriate response. The
error path for this code handles errors properly, simply return the
error.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When cloning an inline extent there are a few cases, such as when we have
an implicit hole at file offset 0, where we start a transaction while
holding a read lock on a leaf. Starting the transaction results in a call
to sb_start_intwrite(), which results in doing a read lock on a percpu
semaphore. Lockdep doesn't like this and complains about it:
[46.580704] ======================================================
[46.580752] WARNING: possible circular locking dependency detected
[46.580799] 5.13.0-rc1 #28 Not tainted
[46.580832] ------------------------------------------------------
[46.580877] cloner/3835 is trying to acquire lock:
[46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0
[46.581167]
[46.581167] but task is already holding lock:
[46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0
[46.581293]
[46.581293] which lock already depends on the new lock.
[46.581293]
[46.581351]
[46.581351] the existing dependency chain (in reverse order) is:
[46.581410]
[46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}:
[46.581464] down_read_nested+0x68/0x200
[46.581536] __btrfs_tree_read_lock+0x70/0x1d0
[46.581577] btrfs_read_lock_root_node+0x88/0x200
[46.581623] btrfs_search_slot+0x298/0xb70
[46.581665] btrfs_set_inode_index+0xfc/0x260
[46.581708] btrfs_new_inode+0x26c/0x950
[46.581749] btrfs_create+0xf4/0x2b0
[46.581782] lookup_open.isra.57+0x55c/0x6a0
[46.581855] path_openat+0x418/0xd20
[46.581888] do_filp_open+0x9c/0x130
[46.581920] do_sys_openat2+0x2ec/0x430
[46.581961] do_sys_open+0x90/0xc0
[46.581993] system_call_exception+0x3d4/0x410
[46.582037] system_call_common+0xec/0x278
[46.582078]
[46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}:
[46.582135] __lock_acquire+0x1e90/0x2c50
[46.582176] lock_acquire+0x2b4/0x5b0
[46.582263] start_transaction+0x3cc/0x950
[46.582308] clone_copy_inline_extent+0xe4/0x5a0
[46.582353] btrfs_clone+0x5fc/0x880
[46.582388] btrfs_clone_files+0xd8/0x1c0
[46.582434] btrfs_remap_file_range+0x3d8/0x590
[46.582481] do_clone_file_range+0x10c/0x270
[46.582558] vfs_clone_file_range+0x1b0/0x310
[46.582605] ioctl_file_clone+0x90/0x130
[46.582651] do_vfs_ioctl+0x874/0x1ac0
[46.582697] sys_ioctl+0x6c/0x120
[46.582733] system_call_exception+0x3d4/0x410
[46.582777] system_call_common+0xec/0x278
[46.582822]
[46.582822] other info that might help us debug this:
[46.582822]
[46.582888] Possible unsafe locking scenario:
[46.582888]
[46.582942] CPU0 CPU1
[46.582984] ---- ----
[46.583028] lock(btrfs-tree-00);
[46.583062] lock(sb_internal#2);
[46.583119] lock(btrfs-tree-00);
[46.583174] lock(sb_internal#2);
[46.583212]
[46.583212] *** DEADLOCK ***
[46.583212]
[46.583266] 6 locks held by cloner/3835:
[46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130
[46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0
[46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0
[46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590
[46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590
[46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0
[46.583828]
[46.583828] stack backtrace:
[46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28
[46.583931] Call Trace:
[46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable)
[46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400
[46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190
[46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50
[46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0
[46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950
[46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0
[46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880
[46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0
[46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590
[46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270
[46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310
[46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130
[46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0
[46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120
[46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410
[46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278
[46.585114] --- interrupt: c00 at 0x7ffff7e22990
[46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000
[46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1)
[46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000
[46.585374] IRQMASK: 0
[46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004
[46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000
[46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000
[46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000
[46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000
[46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004
[46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f
[46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990
[46.585964] LR [00000001000010ec] 0x1000010ec
[46.586010] --- interrupt: c00
This should be a false positive, as both locks are acquired in read mode.
Nevertheless, we don't need to hold a leaf locked when we start the
transaction, so just release the leaf (path) before starting it.
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull cifs fixes from Steve French:
"Seven smb3 fixes: one for stable, three others fix problems found in
testing handle leases, and a compounded request fix"
* tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
Fix KASAN identified use-after-free issue.
Defer close only when lease is enabled.
Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
cifs: Fix inconsistent indenting
cifs: fix memory leak in smb2_copychunk_range
SMB3: incorrect file id in requests compounded with open
cifs: remove deadstore in cifs_close_all_deferred_files()
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
[ 612.157429] ==================================================================
[ 612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0
[ 612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382
[ 612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G
OE 5.13.0-rc2+ #98
[ 612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-1.fc33 04/01/2014
[ 612.159640] Workqueue: 0x0 (deferredclose)
[ 612.159669] Call Trace:
[ 612.159685] dump_stack+0xbb/0x107
[ 612.159711] print_address_description.constprop.0+0x18/0x140
[ 612.159733] ? process_one_work+0x90/0x9b0
[ 612.159743] ? process_one_work+0x90/0x9b0
[ 612.159754] kasan_report.cold+0x7c/0xd8
[ 612.159778] ? lock_is_held_type+0x80/0x130
[ 612.159789] ? process_one_work+0x90/0x9b0
[ 612.159812] kasan_check_range+0x145/0x1a0
[ 612.159834] process_one_work+0x90/0x9b0
[ 612.159877] ? pwq_dec_nr_in_flight+0x110/0x110
[ 612.159914] ? spin_bug+0x90/0x90
[ 612.159967] worker_thread+0x3b6/0x6c0
[ 612.160023] ? process_one_work+0x9b0/0x9b0
[ 612.160038] kthread+0x1dc/0x200
[ 612.160051] ? kthread_create_worker_on_cpu+0xd0/0xd0
[ 612.160092] ret_from_fork+0x1f/0x30
[ 612.160399] Allocated by task 2358:
[ 612.160757] kasan_save_stack+0x1b/0x40
[ 612.160768] __kasan_kmalloc+0x9b/0xd0
[ 612.160778] cifs_new_fileinfo+0xb0/0x960 [cifs]
[ 612.161170] cifs_open+0xadf/0xf20 [cifs]
[ 612.161421] do_dentry_open+0x2aa/0x6b0
[ 612.161432] path_openat+0xbd9/0xfa0
[ 612.161441] do_filp_open+0x11d/0x230
[ 612.161450] do_sys_openat2+0x115/0x240
[ 612.161460] __x64_sys_openat+0xce/0x140
When mod_delayed_work is called to modify the delay of pending work,
it might return false and queue a new work when pending work is
already scheduled or when try to grab pending work failed.
So, Increase the reference count when new work is scheduled to
avoid use-after-free.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When smb2 lease parameter is disabled on server. Server grants
batch oplock instead of RHW lease by default on open, inode page cache
needs to be zapped immediatley upon close as cache is not valid.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Removed oplock_break_received flag which was added to achieve
synchronization between oplock handler and open handler by earlier commit.
It is not needed because there is an existing lock open_file_lock to achieve
the same. find_readable_file takes open_file_lock and then traverses the
openFileList. Similarly, cifs_oplock_break while closing the deferred
handle (i.e cifsFileInfo_put) takes open_file_lock and then sends close
to the server.
Added comments for better readability.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Eliminate the follow smatch warning:
fs/cifs/fs_context.c:1148 smb3_fs_context_parse_param() warn:
inconsistent indenting.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When using smb2_copychunk_range() for large ranges we will
run through several iterations of a loop calling SMB2_ioctl()
but never actually free the returned buffer except for the final
iteration.
This leads to memory leaks everytime a large copychunk is requested.
Fixes: 9bf0c9cd4314 ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
See MS-SMB2 3.2.4.1.4, file ids in compounded requests should be set to
0xFFFFFFFFFFFFFFFF (we were treating it as u32 not u64 and setting
it incorrectly).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
|
| | |/ / / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Deadstore detected by Lukas Bulwahn's CodeChecker Tool (ELISA group).
line 741 struct cifsInodeInfo *cinode;
line 747 cinode = CIFS_I(d_inode(cfile->dentry));
could be deleted.
cinode on filesystem should not be deleted when files are closed,
they are representations of some data fields on a physical disk,
thus no further action is required.
The virtual inode on vfs will be handled by vfs automatically,
and the denotation is inode, which is different from the cinode.
Signed-off-by: wenhuizhang <wenhui@gwmail.gwu.edu>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo fix from Eric Biederman:
"During the merge window an issue with si_perf and the siginfo ABI came
up. The alpha and sparc siginfo structure layout had changed with the
addition of SIGTRAP TRAP_PERF and the new field si_perf.
The reason only alpha and sparc were affected is that they are the
only architectures that use si_trapno.
Looking deeper it was discovered that si_trapno is used for only a few
select signals on alpha and sparc, and that none of the other
_sigfault fields past si_addr are used at all. Which means technically
no regression on alpha and sparc.
While the alignment concerns might be dismissed the abuse of si_errno
by SIGTRAP TRAP_PERF does have the potential to cause regressions in
existing userspace.
While we still have time before userspace starts using and depending
on the new definition siginfo for SIGTRAP TRAP_PERF this set of
changes cleans up siginfo_t.
- The si_trapno field is demoted from magic alpha and sparc status
and made an ordinary union member of the _sigfault member of
siginfo_t. Without moving it of course.
- si_perf is replaced with si_perf_data and si_perf_type ending the
abuse of si_errno.
- Unnecessary additions to signalfd_siginfo are removed"
* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
signal: Deliver all of the siginfo perf data in _perf
signal: Factor force_sig_perf out of perf_sigtrap
signal: Implement SIL_FAULT_TRAPNO
siginfo: Move si_trapno inside the union inside _si_fault
|