| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS Client Updates from Anna Schumaker:
"New Features:
- Support for eager writes, and the write=eager and write=wait mount
options
- Other Bugfixes and Cleanups:
- Fix typos in some comments
- Fix up fall-through warnings for Clang
- Cleanups to the NFS readpage codepath
- Remove FMR support in rpcrdma_convert_iovs()
- Various other cleanups to xprtrdma
- Fix xprtrdma pad optimization for servers that don't support
RFC 8797
- Improvements to rpcrdma tracepoints
- Fix up nfs4_bitmask_adjust()
- Optimize sparse writes past the end of files"
* tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
NFS: Support the '-owrite=' option in /proc/self/mounts and mountinfo
NFS: Set the stable writes flag when initialising the super block
NFS: Add mount options supporting eager writes
NFS: Add support for eager writes
NFS: 'flags' field should be unsigned in struct nfs_server
NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache
NFS: Always clear an invalid mapping when attempting a buffered write
NFS: Optimise sparse writes past the end of file
NFS: Fix documenting comment for nfs_revalidate_file_size()
NFSv4: Fixes for nfs4_bitmask_adjust()
xprtrdma: Clean up rpcrdma_prepare_readch()
rpcrdma: Capture bytes received in Receive completion tracepoints
xprtrdma: Pad optimization, revisited
rpcrdma: Fix comments about reverse-direction operation
xprtrdma: Refactor invocations of offset_in_page()
xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map()
xprtrdma: Remove FMR support in rpcrdma_convert_iovs()
NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
NFS: Call readpage_async_filler() from nfs_readpage_async()
NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since commit 9ed5af268e88 ("SUNRPC: Clean up the handling of page
padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client
passes payload data to the transport with the padding in xdr->pages
instead of in the send buffer's tail kvec. There's no need for the
extra logic to advance the base of the tail kvec because the upper
layer no longer places XDR padding there.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The NetApp Linux team discovered that with NFS/RDMA servers that do
not support RFC 8797, the Linux client is forming NFSv4.x WRITE
requests incorrectly.
In this case, the Linux NFS client disables implicit chunk round-up
for odd-length Read and Write chunks. The goal was to support old
servers that needed that padding to be sent explicitly by clients.
In that case the Linux NFS included the tail kvec in the Read chunk,
since the tail contains any needed padding. That meant a separate
memory registration is needed for the tail kvec, adding to the cost
of forming such requests. To avoid that cost for a mere 3 bytes of
zeroes that are always ignored by receivers, we try to use implicit
roundup when possible.
For NFSv4.x, the tail kvec also sometimes contains a trailing
GETATTR operation. The Linux NFS client unintentionally includes
that GETATTR operation in the Read chunk as well as inline.
The fix is simply to /never/ include the tail kvec when forming a
data payload Read chunk. The padding is thus now always present.
Note that since commit 9ed5af268e88 ("SUNRPC: Clean up the handling
of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS
client passes payload data to the transport with the padding in
xdr->pages instead of in the send buffer's tail kvec. So now the
Linux NFS client appends XDR padding to all odd-sized Read chunks.
This shouldn't be a problem because:
- RFC 8166-compliant servers are supposed to work with or without
that XDR padding in Read chunks.
- Since the padding is now in the same memory region as the data
payload, a separate memory registration is not needed. In
addition, the link layer extends data in RDMA Read responses to
4-byte boundaries anyway. Thus there is now no savings when the
padding is not included.
Because older kernels include the payload's XDR padding in the
tail kvec, a fix there will be more complicated. Thus backporting
this patch is not recommended.
Reported by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
During the final stages of publication of RFC 8167, reviewers
requested that we use the term "reverse direction" rather than
"backwards direction". Update comments to reflect this preference.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Clean up so that offset_in_page() is invoked less often in the
most common case, which is mapping xdr->pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Clean up.
Remove a conditional branch from the SGL set-up loop in frwr_map():
Instead of using either sg_set_page() or sg_set_buf(), initialize
the mr_page field properly when rpcrdma_convert_kvec() converts the
kvec to an SGL entry. frwr_map() can then invoke sg_set_page()
unconditionally.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support for FMR was removed by commit ba69cd122ece ("xprtrdma:
Remove support for FMR memory registration") [Dec 2018]. That means
the buffer-splitting behavior of rpcrdma_convert_kvec(), added by
commit 821c791a0bde ("xprtrdma: Segment head and tail XDR buffers
on page boundaries") [Mar 2016], is no longer necessary. FRWR
memory registration handles this case with aplomb.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| | |
Few trivial and rudimentary spell corrections.
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This comment was introduced by commit 6ea44adce915
("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()").
I believe EIO was a typo at the time: it should have been EAGAIN.
Subsequently, commit 0445f92c5d53 ("SUNRPC: Fix disconnection races")
changed that to ENOTCONN.
Rather than trying to keep the comment here in sync with the code in
xprt_force_disconnect(), make the point in a non-specific way.
Fixes: 6ea44adce915 ("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()")
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd updates from Chuck Lever:
"Here are a few additional NFSD commits for the merge window:
Optimization:
- Cork the socket while there are queued replies
Fixes:
- DRC shutdown ordering
- svc_rdma_accept() lockdep splat"
* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Further clean up svc_tcp_sendmsg()
SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
SUNRPC: Use TCP_CORK to optimise send performance on the server
svcrdma: Hold private mutex while invoking rdma_accept()
nfsd: register pernet ops last, unregister first
|
| | |
| | |
| | |
| | |
| | |
| | | |
Clean up: The msghdr is no longer needed in the caller.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that the caller controls the TCP_CORK socket option, it is redundant
to set MSG_MORE and MSG_SENDPAGE_NOTLAST in the calls to
kernel_sendpage().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use a counter to keep track of how many requests are queued behind the
xprt->xpt_mutex, and keep TCP_CORK set until the queue is empty.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/20210213202532.23146-1-trondmy@kernel.org/T/#u
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
RDMA core mutex locking was restructured by commit d114c6feedfe
("RDMA/cma: Add missing locking to rdma_accept()") [Aug 2020]. When
lock debugging is enabled, the RPC/RDMA server trips over the new
lockdep assertion in rdma_accept() because it doesn't call
rdma_accept() from its CM event handler.
As a temporary fix, have svc_rdma_accept() take the handler_mutex
explicitly. In the meantime, let's consider how to restructure the
RPC/RDMA transport to invoke rdma_accept() from the proper context.
Calls to svc_rdma_accept() are serialized with calls to
svc_rdma_free() by the generic RPC server layer.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-rdma/20210209154014.GO4247@nvidia.com/
Fixes: d114c6feedfe ("RDMA/cma: Add missing locking to rdma_accept()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull nfsd updates from Chuck Lever:
- Update NFSv2 and NFSv3 XDR decoding functions
- Further improve support for re-exporting NFS mounts
- Convert NFSD stats to per-CPU counters
- Add batch Receive posting to the server's RPC/RDMA transport
* tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (65 commits)
nfsd: skip some unnecessary stats in the v4 case
nfs: use change attribute for NFS re-exports
NFSv4_2: SSC helper should use its own config.
nfsd: cstate->session->se_client -> cstate->clp
nfsd: simplify nfsd4_check_open_reclaim
nfsd: remove unused set_client argument
nfsd: find_cpntf_state cleanup
nfsd: refactor set_client
nfsd: rename lookup_clientid->set_client
nfsd: simplify nfsd_renew
nfsd: simplify process_lock
nfsd4: simplify process_lookup1
SUNRPC: Correct a comment
svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()
svcrdma: Reduce Receive doorbell rate
svcrdma: Deprecate stat variables that are no longer used
svcrdma: Restore read and write stats
svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
svcrdma: Convert rdma_stat_recv to a per-CPU counter
svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up()
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Clean up: The rq_argpages field was removed from struct svc_rqst in
the pre-git era.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The Receive completion handler doesn't look at the contents of the
Receive buffer. The DMA sync isn't terribly expensive but it's one
less thing that needs to be done by the Receive completion handler,
which is single-threaded (per svc_xprt). This helps scalability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is similar to commit e340c2d6ef2a ("xprtrdma: Reduce the
doorbell rate (Receive)") which added Receive batching to the
client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Clean up. We are not permitted to remove old proc files. Instead,
convert these variables to stubs that are only ever allowed to
display a value of zero.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that we have an efficient mechanism to update these two stats,
let's start maintaining them again.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Avoid the overhead of a memory bus lock cycle for counting a value
that is hardly every used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Receives are frequent events. Avoid the overhead of a memory bus
lock cycle for counting a value that is hardly every used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Setting up the proc variables is about to get more complicated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Anj Duvnjak reports that the Kodi.tv NFS client is not able to read
video files from a v5.10.11 Linux NFS server.
The new sendpage-based TCP sendto logic was not attentive to non-
zero page_base values. nfsd_splice_read() sets that field when a
READ payload starts in the middle of a page.
The Linux NFS client rarely emits an NFS READ that is not page-
aligned. All of my testing so far has been with Linux clients, so I
missed this one.
Reported-by: A. Duvnjak <avian@extremenerds.net>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: A. Duvnjak <avian@extremenerds.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When handling an auth_gss downcall, it's possible to get 0-length
opaque object for the acceptor. In the case of a 0-length XDR
object, make sure simple_get_netobj() fills in dest->data = NULL,
and does not continue to kmemdup() which will set
dest->data = ZERO_SIZE_PTR for the acceptor.
The trace event code can handle NULL but not ZERO_SIZE_PTR for a
string, and so without this patch the rpcgss_context trace event
will crash the kernel as follows:
[ 162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 162.898693] #PF: supervisor read access in kernel mode
[ 162.900830] #PF: error_code(0x0000) - not-present page
[ 162.902940] PGD 0 P4D 0
[ 162.904027] Oops: 0000 [#1] SMP PTI
[ 162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133
[ 162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 162.910978] RIP: 0010:strlen+0x0/0x20
[ 162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[ 162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202
[ 162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697
[ 162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010
[ 162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000
[ 162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10
[ 162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028
[ 162.936777] FS: 00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000
[ 162.940067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0
[ 162.945300] Call Trace:
[ 162.946428] trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss]
[ 162.949308] ? __kmalloc_track_caller+0x35/0x5a0
[ 162.951224] ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss]
[ 162.953484] gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss]
[ 162.955953] rpc_pipe_write+0x58/0x70 [sunrpc]
[ 162.957849] vfs_write+0xcb/0x2c0
[ 162.959264] ksys_write+0x68/0xe0
[ 162.960706] do_syscall_64+0x33/0x40
[ 162.962238] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 162.964346] RIP: 0033:0x7f1e1f1e57df
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Avoid exposing parent of root directory in NFSv3 READDIRPLUS results
- Fix a tracepoint change that went in the initial 5.11 merge
* tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Move the svc_xdr_recvfrom tracepoint again
nfsd4: readdirplus shouldn't return parent of export
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 156708adf2d9 ("SUNRPC: Move the svc_xdr_recvfrom()
tracepoint") tried to capture the correct XID in the trace record,
but this line in svc_recv:
rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);
alters the size of rq_arg.head[0].iov_len. The tracepoint records
the correct XID but an incorrect value for the length of the
xdr_buf's head.
To keep the trace callsites simple, I've created two trace classes.
One assumes the xdr_buf contains a full RPC message, and the XID
can be extracted from it. The other assumes the contents of the
xdr_buf are arbitrary, and the xid will be provided by the caller.
Currently there is only one user of each class, but I expect we will
need a few more tracepoints using each class as time goes on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
- Fix parsing of link-local IPv6 addresses
- Fix confusing logging of mount errors that was introduced by the
fsopen() patchset.
- Fix a tracing use after free in _nfs4_do_setlk()
- Layout return-on-close fixes when called from nfs4_evict_inode()
- Layout segments were being leaked in
pnfs_generic_clear_request_commit()
- Don't leak DS commits in pnfs_generic_retry_commit()
- Fix an Oopsable use-after-free when nfs_delegation_find_inode_server()
calls iput() on an inode after the super block has gone away"
* tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: nfs_igrab_and_active must first reference the superblock
NFS: nfs_delegation_find_inode_server must first reference the superblock
NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
pNFS: Stricter ordering of layoutget and layoutreturn
pNFS: Clean up pnfs_layoutreturn_free_lsegs()
pNFS: We want return-on-close to complete when evicting the inode
pNFS: Mark layout for return if return-on-close was not sent
net: sunrpc: interpret the return value of kstrtou32 correctly
NFS: Adjust fs_context error logging
NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A return value of 0 means success. This is documented in lib/kstrtox.c.
This was found by trying to mount an NFS share from a link-local IPv6
address with the interface specified by its index:
mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")
Before this commit this failed with EINVAL and also caused the following
message in dmesg:
[...] NFS: bad IP address specified: addr=fe80::1%1
The syscall using the same address based on the interface name instead
of its index succeeds.
Credits for this patch go to my colleague Christian Speich, who traced
the origin of this bug to this line of code.
Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Fixes: 00cfaa943ec3 ("replace strict_strto calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull nfsd fixes from Chuck Lever:
- Fix major TCP performance regression
- Get NFSv4.2 READ_PLUS regression tests to pass
- Improve NFSv4 COMPOUND memory allocation
- Fix sparse warning
* tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
NFSD: Restore NFSv4 decoding's SAVEMEM functionality
SUNRPC: Handle TCP socket sends with kernel_sendpage() again
NFSD: Fix sparse warning in nfssvc.c
nfsd: Don't set eof on a truncated READ_PLUS
nfsd: Fixes for nfsd4_encode_read_plus_data()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Daire Byrne reports a ~50% aggregrate throughput regression on his
Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
use xprt_sock_sendmsg for socket sends"), which replaced
kernel_send_page() calls in NFSD's socket send path with calls to
sock_sendmsg() using iov_iter.
Investigation showed that tcp_sendmsg() was not using zero-copy to
send the xdr_buf's bvec pages, but instead was relying on memcpy.
This means copying every byte of a large NFS READ payload.
It looks like TLS sockets do indeed support a ->sendpage method,
so it's really not necessary to use xprt_sock_sendmsg() to support
TLS fully on the server. A mechanical reversion of da1661b93bf4 is
not possible at this point, but we can re-implement the server's
TCP socket sendmsg path using kernel_sendpage().
Reported-by: Daire Byrne <daire@dneg.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
support
- A series of patches to improve readdir performance, particularly
with large directories
- Basic support for using NFS/RDMA with the pNFS files and flexfiles
drivers
- Micro-optimisations for RDMA
- RDMA tracing improvements
Bugfixes:
- Fix a long standing bug with xs_read_xdr_buf() when receiving
partial pages (Dan Aloni)
- Various fixes for getxattr and listxattr, when used over non-TCP
transports
- Fixes for containerised NFS from Sargun Dhillon
- switch nfsiod to be an UNBOUND workqueue (Neil Brown)
- READDIR should not ask for security label information if there is
no LSM policy (Olga Kornievskaia)
- Avoid using interval-based rebinding with TCP in lockd (Calum
Mackay)
- A series of RPC and NFS layer fixes to support the NFSv4.2
READ_PLUS code
- A couple of fixes for pnfs/flexfiles read failover
Cleanups:
- Various cleanups for the SUNRPC xdr code in conjunction with the
READ_PLUS fixes"
* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
NFSv4/pnfs: Add tracing for the deviceid cache
fs/lockd: convert comma to semicolon
NFSv4.2: fix error return on memory allocation failure
NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
NFSv4.2: decode_read_plus_hole() needs to check the extent offset
NFSv4.2: decode_read_plus_data() must skip padding after data segment
NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
SUNRPC: When expanding the buffer, we may need grow the sparse pages
SUNRPC: Cleanup - constify a number of xdr_buf helpers
SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
SUNRPC: _copy_to/from_pages() now check for zero length
SUNRPC: Cleanup xdr_shrink_bufhead()
SUNRPC: Fix xdr_expand_hole()
SUNRPC: Fixes for xdr_align_data()
SUNRPC: _shift_data_left/right_pages should check the shift length
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next
NFSoRDmA Client updates for Linux 5.11
Cleanups and improvements:
- Remove use of raw kernel memory addresses in tracepoints
- Replace dprintk() call sites in ERR_CHUNK path
- Trace unmap sync calls
- Optimize MR DMA-unmapping
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that rpcrdma_ep is no longer part of rpcrdma_xprt, there are
four or five serial address dereferences needed to get to the
IB device needed for DMA unmapping.
Instead, let's use the same pattern that regbufs use: cache a
pointer to the device in the MR, and use that as the indication
that unmapping is necessary.
This also guarantees that the exact same device is used for DMA
mapping and unmapping, even if the r_xprt's ep has been replaced. I
don't think this can happen today, but future changes might break
this assumption.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Clean up: This function is now invoked only in frwr_ops.c. The move
enables deduplication of the trace_xprtrdma_mr_unmap() call site.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
->buf_free is called nearly once per RPC. Only rarely does
xprt_rdma_free() have to do anything, thus tracing every one of
these calls seems unnecessary. Instead, just throw a trace event
when that one occasional RPC still has MRs that need to be
released.
xprt_rdma_free() is further micro-optimized to reduce the amount of
work done in the common case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Tie each MR event to the requesting rpc_task to make it easier to
follow MR ownership and control flow.
MR unmapping and recycling can happen in the background, after an
MR's mr_req field is stale, so set up a separate tracepoint class
for those events.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Rename it following the "_err" suffix convention
- Replace display of kernel memory addresses
- Tie MR exhaustion to a peer IP address, similar to the createmrs
tracepoint
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Replace displayed kernel memory addresses
- Tie the XID and event with the peer's IP address
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace unnecessary display of kernel memory addresses.
Also, there are no longer any trace_xprtrdma_defer_cmp() call sites.
And remove the trace_xprtrdma_leaked_rep() tracepoint because there
doesn't seem to be an overwhelming need to have a tracepoint for
catching a software bug that has long since been fixed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Rename the tracepoints with the "_err" suffix to indicate these
are rare error events
- Replace display of kernel memory addresses
- Tie the XID and error to a connection IP address instead
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Replace the display of kernel memory addresses
- Add "_err" to the end of its name to indicate that it's a
tracepoint that fires only when there's an error
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set up a completion ID in each rpcrdma_frwr. The ID is used to match
an incoming completion to a transport (CQ) and other MR-related
activity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set up a completion ID in each rpcrdma_req. The ID is used to match
an incoming Send completion to a transport and to a previous
ib_post_send().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set up a completion ID in each rpcrdma_rep. The ID is used to match
an incoming Receive completion to a transport and to a previous
ib_post_recv().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If we're shifting the page data to the right, and this happens to be a
sparse page array, then we may need to allocate new pages in order to
receive the data.
Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There are a number of xdr helpers for struct xdr_buf that do not change
the structure itself. Mark those as taking const pointers for
documentation purposes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|