summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* kbuild: export_report: read modules.order instead of .tmp_versions/*.modMasahiro Yamada2019-07-171-6/+5
| | | | | | | | Towards the goal of removing MODVERDIR aka .tmp_versions, read out modules.order to get the list of modules to be processed. This is simpler than parsing *.mod files in .tmp_versions. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.modMasahiro Yamada2019-07-172-8/+9
| | | | | | | | | | | | | | Towards the goal of removing MODVERDIR, read out modules.order to get the list of modules to be processed. This is simpler than parsing *.mod files in $(MODVERDIR). For external modules, $(KBUILD_EXTMOD)/modules.order should be read. I removed the single target %.ko from the top Makefile. To make sure modpost works correctly, vmlinux and the other modules must be built. You cannot build a particular .ko file alone. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.modMasahiro Yamada2019-07-171-2/+1
| | | | | | | | | | | Towards the goal of removing MODVERDIR, read out modules.order to get the list of modules to be signed. This is simpler than parsing *.mod files in $(MODVERDIR). The modules_sign target is only supported for in-kernel modules. So, this commit does not take care of external modules. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.modMasahiro Yamada2019-07-171-4/+1
| | | | | | | | | | Towards the goal of removing MODVERDIR, read out modules.order to get the list of modules to be installed. This is simpler than parsing *.mod files in $(MODVERDIR). For external modules, $(KBUILD_EXTMOD)/modules.order should be read. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.verMasahiro Yamada2019-07-171-1/+1
| | | | | | Nothing depends on this, so it is dead code. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: remove duplication from modules.order in sub-directoriesMasahiro Yamada2019-07-171-6/+3
| | | | | | | | | | | | | Currently, only the top-level modules.order drops duplicated entries. The modules.order files in sub-directories potentially contain duplication. To list out the paths of all modules, I want to use modules.order instead of parsing *.mod files in $(MODVERDIR). To achieve this, I want to rip off duplication from modules.order of external modules too. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}Masahiro Yamada2019-07-174-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Removing the 'kernel/' prefix will make our life easier because we can simply do 'cat modules.order' to get all built modules with full paths. Currently, we parse the first line of '*.mod' files in $(MODVERDIR). Since we have duplicated functionality here, I plan to remove MODVERDIR entirely. In fact, modules.order is generated also for external modules in a broken format. It adds the 'kernel/' prefix to the absolute path of the module, like this: kernel//path/to/your/external/module/foo.ko This is fine for now since modules.order is not used for external modules. However, I want to sanitize the format everywhere towards the goal of removing MODVERDIR. We cannot change the format of installed module.{order,builtin}. So, 'make modules_install' will add the 'kernel/' prefix while copying them to $(MODLIB)/. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: do not create empty modules.order in the prepare stageMasahiro Yamada2019-07-172-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, $(objtree)/modules.order is touched in two places. In the 'prepare0' rule, scripts/Makefile.build creates an empty modules.order while processing 'obj=.' In the 'modules' rule, the top-level Makefile overwrites it with the correct list of modules. While this might be a good side-effect that modules.order is made empty every time (probably this is not intended functionality), I personally do not like this behavior. Create modules.order only when it is sensible to do so. This avoids creating the following pointless files: scripts/basic/modules.order scripts/dtc/modules.order scripts/gcc-plugins/modules.order scripts/genksyms/modules.order scripts/mod/modules.order scripts/modules.order scripts/selinux/genheaders/modules.order scripts/selinux/mdp/modules.order scripts/selinux/modules.order Going forward, $(objtree)/modules.order lists the modules that was built in the last successful build. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* coccinelle: api: add devm_platform_ioremap_resource scriptHimanshu Jha2019-07-171-0/+60
| | | | | | | | | | | Use recently introduced devm_platform_ioremap_resource helper which wraps platform_get_resource() and devm_ioremap_resource() together. This helps produce much cleaner code and remove local `struct resource` declaration. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: compile-test headers listed in header-test-m as wellMasahiro Yamada2019-07-172-2/+2
| | | | | | | | | | It will be useful to control the header-test by a tristate option. If CONFIG_FOO is a tristate option, you can write like this: header-test-$(CONFIG_FOO) += foo.h Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: remove unused hostcc-optionMasahiro Yamada2019-07-171-5/+0
| | | | | | We can re-add this whenever it is needed. At this moment, it is unused. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: remove tag files by distclean instead of mrproperMasahiro Yamada2019-07-171-1/+10
| | | | | | | It takes somewhat long time to generate these tag files. Keep such precious files until we run 'make distclean'. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kbuild: add --hash-style= and --build-id unconditionallyMasahiro Yamada2019-07-175-13/+8
| | | | | | | | | | As commit 1e0221374e30 ("mips: vdso: drop unnecessary cc-ldoption") explained, these flags are supported by the minimal required version of binutils. They are supported by ld.lld too. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com>
* kbuild: get rid of misleading $(AS) from documentsMasahiro Yamada2019-07-172-9/+8
| | | | | | | | | | | | | | | | | | The assembler files in the kernel are *.S instead of *.s, so they must be preprocessed. Since 'as' of GNU binutils is not able to preprocess, we always use $(CC) as an assembler driver. $(AS) is almost unused in Kbuild. As of v5.2, there is just one place that directly invokes $(AS). $ git grep -e '$(AS)' -e '${AS}' -e '$AS' -e '$(AS:' -e '${AS:' -- :^Documentation drivers/net/wan/Makefile: AS68K = $(AS) The documentation about *_AFLAGS* sounds like the flags were passed to $(AS). This is somewhat misleading. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
* kconfig: fix missing choice values in auto.confMasahiro Yamada2019-07-172-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing"), Kconfig creates include/config/auto.conf in the defconfig stage when it is missing. Joonas Kylmälä reported incorrect auto.conf generation under some circumstances. To reproduce it, apply the following diff: | --- a/arch/arm/configs/imx_v6_v7_defconfig | +++ b/arch/arm/configs/imx_v6_v7_defconfig | @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y | CONFIG_USB_CONFIGFS_F_HID=y | CONFIG_USB_CONFIGFS_F_UVC=y | CONFIG_USB_CONFIGFS_F_PRINTER=y | -CONFIG_USB_ZERO=m | -CONFIG_USB_AUDIO=m | -CONFIG_USB_ETH=m | -CONFIG_USB_G_NCM=m | -CONFIG_USB_GADGETFS=m | -CONFIG_USB_FUNCTIONFS=m | -CONFIG_USB_MASS_STORAGE=m | -CONFIG_USB_G_SERIAL=m | +CONFIG_USB_FUNCTIONFS=y | CONFIG_MMC=y | CONFIG_MMC_SDHCI=y | CONFIG_MMC_SDHCI_PLTFM=y And then, run: $ make ARCH=arm mrproper imx_v6_v7_defconfig You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the .config, but not in the auto.conf. Please note drivers/usb/gadget/legacy/Kconfig is included from a choice block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value. This is probably a similar situation described in commit beaaddb62540 ("kconfig: tests: test defconfig when two choices interact"). When sym_calc_choice() is called, the choice symbol forgets the SYMBOL_DEF_USER unless all of its choice values are explicitly set by the user. The choice symbol is given just one chance to recall it because set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES is set. When sym_calc_choice() is called again, the choice symbol forgets it forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid. Hence, we cannot call sym_clear_all_valid() again and again. It is crazy to repeat set and unset of internal flags. However, we cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;" Doing so would re-introduce the problem solved by commit 5d09598d488f ("kconfig: fix new choices being skipped upon config update"). To work around the issue, conf_write_autoconf() stopped calling sym_clear_all_valid(). conf_write() must be changed accordingly. Currently, it clears SYMBOL_WRITE after the symbol is written into the .config file. This is needed to prevent it from writing the same symbol multiple times in case the symbol is declared in two or more locations. I added the new flag SYMBOL_WRITTEN, to track the symbols that have been written. Anyway, this is a cheesy workaround in order to suppress the issue as far as defconfig is concerned. Handling of choices is totally broken. sym_clear_all_valid() is called every time a user touches a symbol from the GUI interface. To reproduce it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear from the .config file. I added the Fixes tag since it is more fatal than before. But, this has been broken since long long time before, and still it is. We should take a closer look to fix this correctly somehow. Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing") Cc: linux-stable <stable@vger.kernel.org> # 4.19+ Reported-by: Joonas Kylmälä <joonas.kylmala@iki.fi> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Joonas Kylmälä <joonas.kylmala@iki.fi>
* builddeb: generate multi-arch friendly linux-libc-dev packageCedric Hombourger2019-07-172-0/+6
| | | | | | | | | | | | Debian-based distributions place libc header files in a machine specific directory (/usr/include/<libc-machine>) instead of /usr/include/asm to support installation of the linux-libc-dev package from multiple architectures. Move headers installed by "make headers_install" accordingly using Debian's tuple from dpkg-architecture (stored in debian/arch). Signed-off-by: Cedric Hombourger <Cedric_Hombourger@mentor.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* kconfig: run olddefconfig instead of oldconfig after merging fragmentsMasahiro Yamada2019-07-171-1/+1
| | | | | | | 'make olddefconfig' is non-interactive, so we can drop 'yes'. The behavior is equivalent. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* riscv: drop unneeded -Wall additionMasahiro Yamada2019-07-171-2/+0
| | | | | | | | | | The top level Makefile adds -Wall globally: KBUILD_CFLAGS := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \ For riscv, I see two "-Wall" added for compiling each object. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* memory: ti-emif-sram: move driver-specific asm-offset.h to drivers/memory/Masahiro Yamada2019-07-173-3/+5
| | | | | | | | | <generated/ti-emif-asm-offsets.h> is only generated and included by drivers/memory/, so it does not need to reside in the globally visible include/generated/. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
* Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-blockLinus Torvalds2019-07-1650-210/+660
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
| * MAINTAINERS: add entry for block io cgroupKonstantin Khlebnikov2019-07-121-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This links mailing list cgroups@vger.kernel.org with related files. $ ./scripts/get_maintainer.pl -f block/blk-cgroup.c Jens Axboe <axboe@kernel.dk> (maintainer:BLOCK LAYER) cgroups@vger.kernel.org (open list:CONTROL GROUP - BLOCK IO CONTROLLER (BLKIO)) linux-block@vger.kernel.org (open list:BLOCK LAYER) linux-kernel@vger.kernel.org (open list) Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Added git tree/maintainer entries from Tejun. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONEDJens Axboe2019-07-121-1/+1
| | | | | | | | | | | | | | | | A previous commit changed the prototype, but didn't adjust the function for when zoned device support is disabled. Fix it up. Fixes: bd976e527259 ("block: Kill gfp_t argument of blkdev_report_zones()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Limit zone array allocation sizeDamien Le Moal2019-07-122-16/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Limit the size of the struct blk_zone array used in blk_revalidate_disk_zones() to avoid memory allocation failures leading to disk revalidation failure. Also further reduce the likelyhood of such failures by using kvcalloc() (that is vmalloc()) instead of allocating contiguous pages with alloc_pages(). Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * sd_zbc: Fix report zones buffer allocationDamien Le Moal2019-07-121-29/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During disk scan and revalidation done with sd_revalidate(), the zones of a zoned disk are checked using the helper function blk_revalidate_disk_zones() if a configuration change is detected (change in the number of zones or zone size). The function blk_revalidate_disk_zones() issues report_zones calls that are very large, that is, to obtain zone information for all zones of the disk with a single command. The size of the report zones command buffer necessary for such large request generally is lower than the disk max_hw_sectors and KMALLOC_MAX_SIZE (4MB) and succeeds on boot (no memory fragmentation), but often fail at run time (e.g. hot-plug event). This causes the disk revalidation to fail and the disk capacity to be changed to 0. This problem can be avoided by using vmalloc() instead of kmalloc() for the buffer allocation. To limit the amount of memory to be allocated, this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES maximum number of zones to report with a single report zones command. This limit may be lowered further to satisfy the disk max_hw_sectors limit. Finally, to ensure that the vmalloc-ed buffer can always be mapped in a request, the buffer size is further limited to at most queue_max_segments() pages, allowing successful mapping of the buffer even in the worst case scenario where none of the buffer pages are contiguous. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Kill gfp_t argument of blkdev_report_zones()Damien Le Moal2019-07-1212-44/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Allow mapping of vmalloc-ed buffersDamien Le Moal2019-07-121-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow the SCSI subsystem scsi_execute_req() function to issue requests using large buffers that are better allocated with vmalloc() rather than kmalloc(), modify bio_map_kern() to allow passing a buffer allocated with vmalloc(). To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(), use vmalloc_to_page() instead of virt_to_page() to obtain the pages of the buffer, and invalidate the buffer addresses with invalidate_kernel_vmap_range() on completion of read BIOs. This last point is executed using the function bio_invalidate_vmalloc_pages() which is defined only if the architecture defines ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture actually needs the invalidation done. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block/bio-integrity: fix a memory leak bugWenwen Wang2019-07-121-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In bio_integrity_prep(), a kernel buffer is allocated through kmalloc() to hold integrity metadata. Later on, the buffer will be attached to the bio structure through bio_integrity_add_page(), which returns the number of bytes of integrity metadata attached. Due to unexpected situations, bio_integrity_add_page() may return 0. As a result, bio_integrity_prep() needs to be terminated with 'false' returned to indicate this error. However, the allocated kernel buffer is not freed on this execution path, leading to a memory leak. To fix this issue, free the allocated buffer before returning from bio_integrity_prep(). Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: fix NULL deref for fabrics optionsMinwoo Im2019-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/nvme.git nvme-5.3 branch now causes the following NULL deref oops. Check the ctrl->opts first before the deref. [ 16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056 [ 16.338551] #PF: supervisor read access in kernel mode [ 16.338551] #PF: error_code(0x0000) - not-present page [ 16.338551] PGD 0 P4D 0 [ 16.338551] Oops: 0000 [#1] SMP PTI [ 16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1 [ 16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core] [ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf [ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283 [ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007 [ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720 [ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840 [ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8 [ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001 [ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000 [ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0 [ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.338551] Call Trace: [ 16.338551] nvme_scan_work+0x2c0/0x340 [nvme_core] [ 16.338551] ? __switch_to_asm+0x40/0x70 [ 16.338551] ? _raw_spin_unlock_irqrestore+0x18/0x30 [ 16.338551] ? try_to_wake_up+0x408/0x450 [ 16.338551] process_one_work+0x20b/0x3e0 [ 16.338551] worker_thread+0x1f9/0x3d0 [ 16.338551] ? cancel_delayed_work+0xa0/0xa0 [ 16.338551] kthread+0x117/0x120 [ 16.338551] ? kthread_stop+0xf0/0xf0 [ 16.338551] ret_from_fork+0x3a/0x50 [ 16.338551] Modules linked in: nvme nvme_core [ 16.338551] CR2: 0000000000000056 [ 16.338551] ---[ end trace b9bf761a93e62d84 ]--- [ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core] [ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf [ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283 [ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007 [ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720 [ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840 [ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8 [ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001 [ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000 [ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0 [ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 958f2a0f8121 ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled") Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * Merge branch 'nvme-5.3' of git://git.infradead.org/nvme into for-linusJens Axboe2019-07-1114-51/+237
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe fixes from Christoph: "Lof of fixes all over the place, and two very minor features that were in the nvme tree by the end of the merge window, but hadn't made it out to Jens yet." * 'nvme-5.3' of git://git.infradead.org/nvme: nvme: fix regression upon hot device removal and insertion nvme-fc: fix module unloads while lports still pending nvme-tcp: don't use sendpage for SLAB pages nvme-tcp: set the STABLE_WRITES flag when data digests are enabled nvmet: print a hint while rejecting NSID 0 or 0xffffffff nvme-multipath: do not select namespaces which are about to be removed nvme-multipath: also check for a disabled path if there is a single sibling nvme-multipath: factor out a nvme_path_is_disabled helper nvme: set physical block size and optimal I/O size nvme: add I/O characteristics fields nvmet: export I/O characteristics attributes in Identify nvme-trace: add delete completion and submission queue to admin cmds tracer nvme-trace: fix spelling mistake "spcecific" -> "specific" nvme-pci: limit max_hw_sectors based on the DMA max mapping size nvme-pci: check for NULL return from pci_alloc_p2pmem() nvme-pci: don't create a read hctx mapping without read queues nvme-pci: don't fall back to a 32-bit DMA mask nvme-pci: make nvme_dev_pm_ops static nvme-fcloop: resolve warnings on RCU usage and sleep warnings nvme-fcloop: fix inconsistent lock state warnings
| | * nvme: fix regression upon hot device removal and insertionSagi Grimberg2019-07-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we validate the new controller id, we want to skip controllers that are either deleting or dead. Fix the check to do that and not on the newly added controller. Fixes: 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation") Reported-by: Jon Derrick <jonathan.derrick@intel.com> Tested-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fc: fix module unloads while lports still pendingJames Smart2019-07-091-3/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code allows the module to be unloaded even if there are pending data structures, such as localports and controllers on the localports, that have yet to hit their reference counting to remove them. Fix by having exit entrypoint explicitly delete every controller, which in turn will remove references on the remoteports and localports causing them to be deleted as well. The exit entrypoint, after initiating the deletes, will wait for the last localport to be deleted before continuing. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-tcp: don't use sendpage for SLAB pagesMikhail Skorzhinskii2019-07-091-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") and previous discussion, tcp_sendpage should not be used for pages that is managed by SLAB, as SLAB is not taking page reference counters into consideration. Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-tcp: set the STABLE_WRITES flag when data digests are enabledMikhail Skorzhinskii2019-07-091-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a few false alarms sighted on target side about wrong data digest while performing high throughput load to XFS filesystem shared through NVMoF TCP. This flag tells the rest of the kernel to ensure that the data buffer does not change while the write is in flight. It incurs a performance penalty, so only enable it when it is actually needed, i.e. when we are calculating data digests. Although even with this change in place, ext2 users can steel experience false positives, as ext2 is not respecting this flag. This may be apply to vfat as well. Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Signed-off-by: Mike Playle <mplayle@solarflare.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: print a hint while rejecting NSID 0 or 0xffffffffMikhail Skorzhinskii2019-07-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding this hint for the sake of convenience. It was spotted that a few times people spent some time before understanding what is exactly wrong in configuration process. This should save a few time in such situations, especially for people who is not very confident with NVMe requirements. Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-multipath: do not select namespaces which are about to be removedHannes Reinecke2019-07-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_ns_remove() will first set the NVME_NS_REMOVING flag before removing it from the list at the very last step. So to avoid selecting a namespace in nvme_find_path() which is about to be removed check the NVME_NS_REMOVING flag, too, when selecting a new path. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-multipath: also check for a disabled path if there is a single siblingHannes Reinecke2019-07-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | When we have a singular list in nvme_round_robin_path() we still need to check its validity. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-multipath: factor out a nvme_path_is_disabled helperHannes Reinecke2019-07-091-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Factor our a common helper to check if a path has been disabled by something other than the per-namespace ANA state. Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: split from a bigger patch] Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: set physical block size and optimal I/O sizeBart Van Assche2019-07-092-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | >From the NVMe 1.4 spec: NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA, and NOWS are defined for this namespace and should be used by the host for I/O optimization; [ ... ] Namespace Preferred Write Granularity (NPWG): This field indicates the smallest recommended write granularity in logical blocks for this namespace. This is a 0's based value. The size indicated should be less than or equal to Maximum Data Transfer Size (MDTS) that is specified in units of minimum memory page size. The value of this field may change if the namespace is reformatted. The size should be a multiple of Namespace Preferred Write Alignment (NPWA). Refer to section 8.25 for how this field is utilized to improve performance and endurance. [ ... ] Each Write, Write Uncorrectable, or Write Zeroes commands should address a multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure 245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as expressed in the NLB field), and the SLBA field of the command should be aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245) for best performance. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: add I/O characteristics fieldsBart Van Assche2019-07-091-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several new fields have been introduced in version 1.4 of the NVMe spec at offsets that were defined as reserved in version 1.3d of the NVMe spec. Update the definition of the nvme_id_ns data structure such that it is in sync with version 1.4 of the NVMe spec. This change preserves backwards compatibility. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: export I/O characteristics attributes in IdentifyBart Van Assche2019-07-093-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the NVMe NAWUN, NAWUPF, NACWU, NPWG, NPWA, NPDG and NOWS attributes available to initator systems for the block backend. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-trace: add delete completion and submission queue to admin cmds tracerTom Wu2019-07-091-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace log for 'delete I/O submission queue' and 'delete I/O completion queue' command will look like as below: kworker/u49:1-3438 [003] .... 6693.070865: nvme_setup_cmd: nvme0: qid=0, cmdid=11, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_sq sqid=1) kworker/u49:1-3438 [003] .... 6693.071171: nvme_setup_cmd: nvme0: qid=0, cmdid=8, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_cq cqid=24) Signed-off-by: Tom Wu <tomwu@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-trace: fix spelling mistake "spcecific" -> "specific"Colin Ian King2019-07-092-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | There are two spelling mistakes in trace_seq_printf messages, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-pci: limit max_hw_sectors based on the DMA max mapping sizeChristoph Hellwig2019-07-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running a NVMe device that is attached to a addressing challenged PCIe root port that requires bounce buffering, our request sizes can easily overflow the swiotlb bounce buffer size. Limit the maximum I/O size to the limit exposed by the DMA mapping subsystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Atish Patra <Atish.Patra@wdc.com> Tested-by: Atish Patra <Atish.Patra@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-pci: check for NULL return from pci_alloc_p2pmem()Alan Mikhak2019-07-091-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify nvme_alloc_sq_cmds() to call pci_free_p2pmem() to free the memory it allocated using pci_alloc_p2pmem() in case pci_p2pmem_virt_to_bus() returns null. Makes sure not to call pci_free_p2pmem() if pci_alloc_p2pmem() returned NULL, which can happen if CONFIG_PCI_P2PDMA is not configured. The current implementation is not expected to leak since pci_p2pmem_virt_to_bus() is expected to fail only if pci_alloc_p2pmem() returns null. However, checking the return value of pci_alloc_p2pmem() is more explicit. Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-pci: don't create a read hctx mapping without read queuesAlan Mikhak2019-07-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only request an IRQ mapping for read queues if at least one read queue is being allocted, as nvme_pci_map_queues() will later on ignore the unnecessary mapping request should nvme_dev_add() request such an IRQ mapping even though no read queues are being allocated. However, nvme_dev_add() can avoid making the request by checking the number of read queues without assuming. This would bring it more in line with nvme_setup_irqs() and nvme_calc_irq_sets(). Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-pci: don't fall back to a 32-bit DMA maskChristoph Hellwig2019-07-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since Linux 5.0 drivers can safely set the largest DMA mask supported by the device, and don't need fallbacks to work around the dma mapping implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-pci: make nvme_dev_pm_ops staticYueHaibing2019-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix sparse warning: drivers/nvme/host/pci.c:2926:25: warning: symbol 'nvme_dev_pm_ops' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fcloop: resolve warnings on RCU usage and sleep warningsJames Smart2019-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With additional debugging enabled, seeing warnings for suspicious RCU usage or Sleeping function called from invalid context. These both map to allocation of a work structure which is currently GFP_KERNEL, meaning it can sleep. For the RCU warning, the sequence was sleeping while holding the RCU lock. Convert the allocation to GFP_ATOMIC. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fcloop: fix inconsistent lock state warningsJames Smart2019-07-091-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With extra debug on, inconsistent lock state warnings are being called out as the tfcp_req->reqlock is being taken out without irq, while some calling sequences have the sequence in a softirq state. Change the lock taking/release to raise/drop irq. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nbd: add netlink reconfigure resize supportMike Christie2019-07-111-16/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device is setup with ioctl we can resize the device after the initial setup, but if the device is setup with netlink we cannot use the resize related ioctls and there is no netlink reconfigure size ATTR handling code. This patch adds netlink reconfigure resize support to match the ioctl interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>