summaryrefslogtreecommitdiffstats
path: root/fs/fuse (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'fuse-fixes-6.13-rc7' of ↵Christian Brauner11 days1-12/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi <mszeredi@redhat.com>: - Fix fuse_get_user_pages() allocation failure handling. - Fix direct-io folio offset and length calculation. * tag 'fuse-fixes-6.13-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure fuse: fix direct io folio offset and length calculation Link: https://lore.kernel.org/r/CAJfpegu7o_X%3DSBWk_C47dUVUQ1mJZDEGe1MfD0N3wVJoUBWdmg@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
| * fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failureBernd Schubert2024-12-131-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fuse_get_user_pages(), set *nbytesp to 0 when struct page **pages allocation fails. This prevents the caller (fuse_direct_io) from making incorrect assumptions that could lead to NULL pointer dereferences when processing the request reply. Previously, *nbytesp was left unmodified on allocation failure, which could cause issues if the caller assumed pages had been added to ap->descs[] when they hadn't. Reported-by: syzbot+87b8e6ed25dbc41759f7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=87b8e6ed25dbc41759f7 Fixes: 3b97c3652d91 ("fuse: convert direct io to use folios") Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: Dmitry Antipov <dmantipov@yandex.ru> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * fuse: fix direct io folio offset and length calculationJoanne Koong2024-12-121-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the direct io case, the pages from userspace may be part of a huge folio, even if all folios in the page cache for fuse are small. Fix the logic for calculating the offset and length of the folio for the direct io case, which currently incorrectly assumes that all folios encountered are one page size. Fixes: 3b97c3652d91 ("fuse: convert direct io to use folios") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | fuse: respect FOPEN_KEEP_CACHE on opendirAmir Goldstein2025-01-041-0/+2
|/ | | | | | | | | | | | | The re-factoring of fuse_dir_open() missed the need to invalidate directory inode page cache with open flag FOPEN_KEEP_CACHE. Fixes: 7de64d521bf92 ("fuse: break up fuse_open_common()") Reported-by: Prince Kumar <princer@google.com> Closes: https://lore.kernel.org/linux-fsdevel/CAEW=TRr7CYb4LtsvQPLj-zx5Y+EYBmGfM24SuzwyDoGVNoKm7w@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250101130037.96680-1-amir73il@gmail.com Reviewed-by: Bernd Schubert <bernd.schubert@fastmail.fm> Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2024-11-271-6/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_vdpa: remove redundant check on desc virtio_fs: store actual queue index in mq_map virtio_fs: add informative log for new tag discovery virtio: Make vring_new_virtqueue support packed vring virtio_pmem: Add freeze/restore callbacks vdpa/mlx5: Fix suboptimal range on iotlb iteration
| * virtio_fs: store actual queue index in mq_mapMax Gurtovoy2024-11-131-6/+6
| | | | | | | | | | | | | | | | | | This will eliminate the need for index recalculation during the fast path. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20241006184341.9081-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio_fs: add informative log for new tag discoveryMax Gurtovoy2024-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | Enhance the device probing process by adding a log message when a new virtio-fs tag is successfully discovered. This improvement provides better visibility into the initialization of virtio-fs devices. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20241006184324.8497-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge tag 'fuse-update-6.13' of ↵Linus Torvalds2024-11-2612-363/+550
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ...
| * | virtiofs: dax: remove ->writepages() callbackAsahi Lina2024-11-181-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using FUSE DAX with virtiofs, cache coherency is managed by the host. Disk persistence is handled via fsync() and friends, which are passed directly via the FUSE layer to the host. Therefore, there's no need to do dax_writeback_mapping_range(). All that ends up doing is a cache flush operation, which is not caught by KVM and doesn't do much, since the host and guest are already cache-coherent. Since dax_writeback_mapping_range() checks that the inode block size is equal to PAGE_SIZE, this fixes a spurious WARN when virtiofs is used with a mismatched guest PAGE_SIZE and virtiofs backing FS block size (this happens, for example, when it's a tmpfs and the host and guest have a different PAGE_SIZE). FUSE DAX does not require any particular FS block size, since it always performs DAX mappings in aligned 2MiB blocks. See discussion in [1]. [1] https://lore.kernel.org/lkml/20241101-dax-page-size-v1-1-eedbd0c6b08f@asahilina.net/T/#u [SzM: remove the empty callback] Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Asahi Lina <lina@asahilina.net> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: check attributes staleness on fuse_iget()Zhang Tianci2024-11-184-25/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function fuse_direntplus_link() might call fuse_iget() to initialize a new fuse_inode and change its attributes. If fi->attr_version is always initialized with 0, even if the attributes returned by the FUSE_READDIR request is staled, as the new fi->attr_version is 0, fuse_change_attributes will still set the staled attributes to inode. This wrong behaviour may cause file size inconsistency even when there is no changes from server-side. To reproduce the issue, consider the following 2 programs (A and B) are running concurrently, A B ---------------------------------- -------------------------------- { /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. } readdir(/fusemnt/dir) start //Daemon set size 0 to f direntry fallocate(f, 1024) stat(f) // B see size 1024 echo 2 > /proc/sys/vm/drop_caches readdir(/fusemnt/dir) reply to kernel Kernel set 0 to the I_NEW inode stat(f) // B see size 0 In the above case, only program B is modifying the file size, however, B observes file size changing between the 2 'readonly' stat() calls. To fix this issue, we should make sure readdirplus still follows the rule of attr_version staleness checking even if the fi->attr_version is lost due to inode eviction. To identify this situation, the new fc->evict_ctr is used to record whether the eviction of inodes occurs during the readdirplus request processing. If it does, the result of readdirplus may be inaccurate; otherwise, the result of readdirplus can be trusted. Although this may still lead to incorrect invalidation, considering the relatively low frequency of evict occurrences, it should be acceptable. Link: https://lore.kernel.org/lkml/20230711043405.66256-2-zhangjiachen.jaycee@bytedance.com/ Link: https://lore.kernel.org/lkml/20241114070905.48901-1-zhangtianci.1997@bytedance.com/ Reported-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: remove pages for requests and exclusively use foliosJoanne Koong2024-11-058-154/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All fuse requests use folios instead of pages for transferring data. Remove pages from the requests and exclusively use folios. No functional changes. [SzM: rename back folio_descs -> descs, etc.] Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert direct io to use foliosJoanne Koong2024-11-052-67/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert direct io requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert writebacks to use foliosJoanne Koong2024-11-051-62/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert writeback requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert retrieves to use foliosJoanne Koong2024-11-051-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert retrieve requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert ioctls to use foliosJoanne Koong2024-11-052-16/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert ioctl requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert writes (non-writeback) to use foliosJoanne Koong2024-11-052-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert non-writeback write requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert reads to use foliosJoanne Koong2024-11-052-22/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert read requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert readdir to use foliosJoanne Koong2024-11-051-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert readdir requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert readlink to use foliosJoanne Koong2024-11-051-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert readlink requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert cuse to use foliosJoanne Koong2024-11-051-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert cuse requests to use a folio instead of a page. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: add support in virtio for requests using foliosJoanne Koong2024-11-051-31/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until all requests have been converted to use folios instead of pages, virtio will need to support both types. Once all requests have been converted, then virtio will support just folios. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: support folios in struct fuse_args_pages and fuse_copy_pages()Joanne Koong2024-11-052-11/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support in struct fuse_args_pages and fuse_copy_pages() for using folios instead of pages for transferring data. Both folios and pages must be supported right now in struct fuse_args_pages and fuse_copy_pages() until all request types have been converted to use folios. Once all have been converted, then struct fuse_args_pages and fuse_copy_pages() will only support folios. Right now in fuse, all folios are one page (large folios are not yet supported). As such, copying folio->page is sufficient for copying the entire folio in fuse_copy_pages(). No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_notify_store to use foliosJosef Bacik2024-10-251-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function creates pages in an inode and copies data into them, update the function to use a folio instead of a page, and use the appropriate folio helpers. [SzM: use filemap_grab_folio()] [Hau Tao: The third argument of folio_zero_range() should be the length to be zeroed, not the total length. Fix it by using folio_zero_segment() instead in fuse_notify_store()] Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_retrieve to use foliosJosef Bacik2024-10-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We're just looking for pages in a mapping, use a folio and the folio lookup function directly instead of using the page helper. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: use the folio based vmstat helpersJosef Bacik2024-10-251-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to make it easier to switch to folios in the fuse_args_pages update the places where we update the vmstat counters for writeback to use the folio related helpers. On the inc side this is easy as we already have the folio, on the dec side we have to page_folio() the pages for now. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_writepage_need_send to take a folioJosef Bacik2024-10-251-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fuse_writepage_need_send is called by fuse_writepages_fill() which already has a folio. Change fuse_writepage_need_send() to take a folio instead, add a helper to check if the folio range is under writeback and use this, as well as the appropriate folio helpers in the rest of the function. Update fuse_writepage_need_send() to pass in the folio directly. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_do_readpage to use foliosJosef Bacik2024-10-251-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the buffered write path is using folios, convert fuse_do_readpage() to take a folio instead of a page, update it to use the appropriate folio helpers, and update the callers to pass in the folio directly instead of a page. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: use kiocb_modified in buffered write pathJosef Bacik2024-10-251-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This combines the file_remove_privs() and file_update_time() call into one call. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_page_mkwrite to use foliosJosef Bacik2024-10-251-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Convert this to grab the folio directly, and update all the helpers to use the folio related functions. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_fill_write_pages to use foliosJosef Bacik2024-10-251-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Convert this to grab the folio directly, and update all the helpers to use the folio related functions. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert fuse_send_write_pages to use foliosJosef Bacik2024-10-251-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert this to grab the folio from the fuse_args_pages and use the appropriate folio related functions. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: convert readahead to use foliosJosef Bacik2024-10-251-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we're using the __readahead_batch() helper which populates our fuse_args_pages->pages array with pages. Convert this to use the newer folio based pattern which is to call readahead_folio() to get the next folio in the read ahead batch. I've updated the code to use things like folio_size() and to take into account larger folio sizes, but this is purely to make that eventual work easier to do, we currently will not get large folios so this is more future proofing than actual support. [SzM: remove check for readahead_folio() won't return NULL (at least for now) so remove ugly assign in conditional.] Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: use fuse_range_is_writeback() instead of iterating pagesJosef Bacik2024-10-251-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fuse_send_readpages() waits for writeback on each page. This can be replaced by a single call to fuse_range_is_writeback(). [SzM: split this off from "fuse: convert readahead to use folios"] Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | virtiofs: use GFP_NOFS when enqueuing request through kworkerHou Tao2024-10-251-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When invoking virtio_fs_enqueue_req() through kworker, both the allocation of the sg array and the bounce buffer still use GFP_ATOMIC. Considering the size of the sg array may be greater than PAGE_SIZE, use GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory allocation failure and to avoid unnecessarily depleting the atomic reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly, GFP_KERNEL and memalloc_nofs_{save|restore} helpers are used instead. It may seem OK to pass GFP_NOFS to virtio_fs_enqueue_req() as well when queuing the request for the first time, but this is not the case. The reason is that fuse_request_queue_background() may call ->queue_request_and_unlock() while holding fc->bg_lock, which is a spin-lock. Therefore, still use GFP_ATOMIC for it. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | virtiofs: use pages instead of pointer for kernel direct IOHou Tao2024-10-253-19/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to insert a 10MB kernel module kept in a virtio-fs with cache disabled, the following warning was reported: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ...... Modules linked in: CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... RIP: 0010:__alloc_pages+0x2bf/0x380 ...... Call Trace: <TASK> ? __warn+0x8e/0x150 ? __alloc_pages+0x2bf/0x380 __kmalloc_large_node+0x86/0x160 __kmalloc+0x33c/0x480 virtio_fs_enqueue_req+0x240/0x6d0 virtio_fs_wake_pending_and_unlock+0x7f/0x190 queue_request_and_unlock+0x55/0x60 fuse_simple_request+0x152/0x2b0 fuse_direct_io+0x5d2/0x8c0 fuse_file_read_iter+0x121/0x160 __kernel_read+0x151/0x2d0 kernel_read+0x45/0x50 kernel_read_file+0x1a9/0x2a0 init_module_from_file+0x6a/0xe0 idempotent_init_module+0x175/0x230 __x64_sys_finit_module+0x5d/0xb0 x64_sys_call+0x1c3/0x9e0 do_syscall_64+0x3d/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ...... </TASK> ---[ end trace 0000000000000000 ]--- The warning is triggered as follows: 1) syscall finit_module() handles the module insertion and it invokes kernel_read_file() to read the content of the module first. 2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and passes it to kernel_read(). kernel_read() constructs a kvec iter by using iov_iter_kvec() and passes it to fuse_file_read_iter(). 3) virtio-fs disables the cache, so fuse_file_read_iter() invokes fuse_direct_io(). As for now, the maximal read size for kvec iter is only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so fuse_direct_io() doesn't split the 10MB buffer. It saves the address and the size of the 10MB-sized buffer in out_args[0] of a fuse request and passes the fuse request to virtio_fs_wake_pending_and_unlock(). 4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to queue the request. Because virtiofs need DMA-able address, so virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for all fuse args, copies these args into the bounce buffer and passed the physical address of the bounce buffer to virtiofsd. The total length of these fuse args for the passed fuse request is about 10MB, so copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and it triggers the warning in __alloc_pages(): if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) return NULL; 5) virtio_fs_enqueue_req() will retry the memory allocation in a kworker, but it won't help, because kmalloc() will always return NULL due to the abnormal size and finit_module() will hang forever. A feasible solution is to limit the value of max_read for virtio-fs, so the length passed to kmalloc() will be limited. However it will affect the maximal read size for normal read. And for virtio-fs write initiated from kernel, it has the similar problem but now there is no way to limit fc->max_write in kernel. So instead of limiting both the values of max_read and max_write in kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use pages instead of pointer to pass the KVEC_IO data. After switching to pages for KVEC_IO data, these pages will be used for DMA through virtio-fs. If these pages are backed by vmalloc(), {flush|invalidate}_kernel_vmap_range() are necessary to flush or invalidate the cache before the DMA operation. So add two new fields in fuse_args_pages to record the base address of vmalloc area and the condition indicating whether invalidation is needed. Perform the flush in fuse_get_user_pages() for write operations and the invalidation in fuse_release_user_pages() for read operations. It may seem necessary to introduce another field in fuse_conn to indicate that these KVEC_IO pages are used for DMA, However, considering that virtio-fs is currently the only user of use_pages_for_kvec_io, just reuse use_pages_for_kvec_io to indicate that these pages will be used for DMA. Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") Signed-off-by: Hou Tao <houtao1@huawei.com> Tested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: remove useless IOCB_DIRECT in fuse_direct_read/write_iteryangyun2024-10-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 23c94e1cdcbf ("fuse: Switch to using async direct IO for FOPEN_DIRECT_IO") gave the async direct IO code path in the fuse_direct_read_iter() and fuse_direct_write_iter(). But since these two functions are only called under FOPEN_DIRECT_IO is set, it seems that we can also use the async direct IO even the flag IOCB_DIRECT is not set to enjoy the async direct IO method. Also move the definition of fuse_io_priv to where it is used in fuse_ direct_write_iter. Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | fuse: enable dynamic configuration of fuse max pages limit (FUSE_MAX_MAX_PAGES)Joanne Koong2024-10-255-5/+65
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the capability to dynamically configure the max pages limit (FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators to dynamically set the maximum number of pages that can be used for servicing requests in fuse. Previously, this is gated by FUSE_MAX_MAX_PAGES which is statically set to 256 pages. One result of this is that the buffer size for a write request is limited to 1 MiB on a 4k-page system. The default value for this sysctl is the original limit (256 pages). $ sysctl -a | grep max_pages_limit fs.fuse.max_pages_limit = 256 $ sysctl -n fs.fuse.max_pages_limit 256 $ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit 1024 $ sysctl -n fs.fuse.max_pages_limit 1024 $ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit 65535 $ sysctl -n fs.fuse.max_pages_limit 65535 Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | Merge tag 'ovl-update-6.13' of ↵Linus Torvalds2024-11-231-14/+18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Fix a syzbot reported NULL pointer deref with bfs lower layers - Fix a copy up failure of large file from lower fuse fs - Followup cleanup of backing_file API from Miklos - Introduction and use of revert/override_creds_light() helpers, that were suggested by Christian as a mitigation to cache line bouncing and false sharing of fields in overlayfs creator_cred long lived struct cred copy. - Store up to two backing file references (upper and lower) in an ovl_file container instead of storing a single backing file in file->private_data. This is used to avoid the practice of opening a short lived backing file for the duration of some file operations and to avoid the specialized use of FDPUT_FPUT in such occasions, that was getting in the way of Al's fd_file() conversions. * tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: Filter invalid inodes with missing lookup function ovl: convert ovl_real_fdget() callers to ovl_real_file() ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() ovl: store upper real file in ovl_file struct ovl: allocate a container struct ovl_file for ovl private context ovl: do not open non-data lower file for fsync ovl: Optimize override/revert creds ovl: pass an explicit reference of creators creds to callers ovl: use wrapper ovl_revert_creds() fs/backing-file: Convert to revert/override_creds_light() cred: Add a light version of override/revert_creds() backing-file: clean up the API ovl: properly handle large files in ovl_security_fileattr
| * | backing-file: clean up the APIMiklos Szeredi2024-11-111-14/+18
| |/ | | | | | | | | | | | | | | | | | | | | | | - Pass iocb to ctx->end_write() instead of file + pos - Get rid of ctx->user_file, which is redundant most of the time - Instead pass iocb to backing_file_splice_read and backing_file_splice_write Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* | Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2024-11-181-4/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...
| * fdget(), more trivial conversionsAl Viro2024-11-031-4/+2
| | | | | | | | | | | | | | | | | | | | all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fuse: remove stray debug lineMiklos Szeredi2024-10-251-1/+0
| | | | | | | | | | | | | | | | | | It wasn't there when the patch was posted for review, but somehow made it into the pull. Link: https://lore.kernel.org/all/20240913104703.1673180-1-mszeredi@redhat.com/ Fixes: efad7153bf93 ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | Revert "fuse: move initialization of fuse_file to fuse_writepages() instead ↵Miklos Szeredi2024-10-211-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of in callback" This reverts commit 672c3b7457fcee9656c36a29a4b21ec4a652433e. fuse_writepages() might be called with no dirty pages after all writable opens were closed. In this case __fuse_write_file_get() will return NULL which will trigger the WARNING. The exact conditions under which this is triggered is unclear and syzbot didn't find a reproducer yet. Reported-by: syzbot+217a976dc26ef2fa8711@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/CAJnrk1aQwfvb51wQ5rUSf9N8j1hArTFeSkHqC_3T-mU6_BCD=A@mail.gmail.com/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | fuse: update inode size after extending passthrough writeAmir Goldstein2024-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | yangyun reported that libfuse test test_copy_file_range() copies zero bytes from a newly written file when fuse passthrough is enabled. The reason is that extending passthrough write is not updating the fuse inode size and when vfs_copy_file_range() observes a zero size inode, it returns without calling the filesystem copy_file_range() method. Fix this by adjusting the fuse inode size after an extending passthrough write. This does not provide cache coherency of fuse inode attributes and backing inode attributes, but it should prevent situations where fuse inode size is too small, causing read/copy to be wrongly shortened. Reported-by: yangyun <yangyun50@huawei.com> Closes: https://github.com/libfuse/libfuse/issues/1048 Fixes: 57e1176e6086 ("fuse: implement read/write passthrough") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | fs: pass offset and result to backing_file end_write() callbackAmir Goldstein2024-10-161-3/+3
|/ | | | | | | | | This is needed for extending fuse inode size after fuse passthrough write. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-fsdevel/CAJfpegs=cvZ_NYy6Q_D42XhYS=Sjj5poM1b5TzXzOVvX=R36aA@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* [tree-wide] finally take no_llseek outAl Viro2024-09-272-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2024-09-261-14/+150
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
| * virtio_fs: add sysfs entries for queue informationMax Gurtovoy2024-09-251-8/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce sysfs entries to provide visibility to the multiple queues used by the Virtio FS device. This enhancement allows users to query information about these queues. Specifically, add two sysfs entries: 1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8). 2. CPU list: Shows the list of CPUs that can process requests for each queue. The CPU list feature is inspired by similar functionality in the block MQ layer, which provides analogous sysfs entries for block devices. These new sysfs entries will improve observability and aid in debugging and performance tuning of Virtio FS devices. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio_fs: introduce virtio_fs_put_locked helperMax Gurtovoy2024-09-251-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new helper function virtio_fs_put_locked to encapsulate the common pattern of releasing a virtio_fs reference while holding a lock. The existing virtio_fs_put helper will be used to release a virtio_fs reference while not holding a lock. Also add an assertion in case the lock is not taken when it should. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* | Merge tag 'fuse-update-6.12' of ↵Linus Torvalds2024-09-2510-294/+505
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add support for idmapped fuse mounts (Alexander Mikhalitsyn) - Add optimization when checking for writeback (yangyun) - Add tracepoints (Josef Bacik) - Clean up writeback code (Joanne Koong) - Clean up request queuing (me) - Misc fixes * tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits) fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set fuse: clear FR_PENDING if abort is detected when sending request fs/fuse: convert to use invalid_mnt_idmap fs/mnt_idmapping: introduce an invalid_mnt_idmap fs/fuse: introduce and use fuse_simple_idmap_request() helper fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN virtio_fs: allow idmapped mounts fuse: allow idmapped mounts fuse: warn if fuse_access is called when idmapped mounts are allowed fuse: handle idmappings properly in ->write_iter() fuse: support idmapped ->rename op fuse: support idmapped ->set_acl fuse: drop idmap argument from __fuse_get_acl fuse: support idmapped ->setattr op fuse: support idmapped ->permission inode op fuse: support idmapped getattr inode op fuse: support idmap for mkdir/mknod/symlink/create/tmpfile fuse: support idmapped FUSE_EXT_GROUPS fuse: add an idmap argument to fuse_simple_request ...