summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAEDexuan Cui2019-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of X86_PAE, unsigned long is u32, but the physical address type should be u64. Due to the bug here, the netvsc driver can not load successfully, and sometimes the VM can panic due to memory corruption (the hypervisor writes data to the wrong location). Fixes: 6ba34171bcbd ("Drivers: hv: vmbus: Remove use of slow_virt_to_phys()") Cc: stable@vger.kernel.org Cc: Michael Kelley <mikelley@microsoft.com> Reported-and-tested-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
| * | | | | Tools: hv: kvp: eliminate 'may be used uninitialized' warningVitaly Kuznetsov2019-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building hv_kvp_daemon GCC-8.3 complains: hv_kvp_daemon.c: In function ‘kvp_get_ip_info.constprop’: hv_kvp_daemon.c:812:30: warning: ‘ip_buffer’ may be used uninitialized in this function [-Wmaybe-uninitialized] struct hv_kvp_ipaddr_value *ip_buffer; this seems to be a false positive: we only use ip_buffer when op == KVP_OP_GET_IP_INFO and it is only unset when op == KVP_OP_ENUMERATE. Silence the warning by initializing ip_buffer to NULL. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
| * | | | | Input: hyperv-keyboard: Use in-place iterator API in the channel callbackDexuan Cui2019-08-201-29/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the ring buffer handling with the in-place API. Also avoid the dynamic allocation and the memory leak in the channel callback function. Signed-off-by: Dexuan Cui <decui@microsoft.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
| * | | | | Drivers: hv: vmbus: Remove the unused "tsc_page" from struct hv_contextDexuan Cui2019-08-201-2/+0
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This field is no longer used after the commit 63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code") , because it's replaced by the global variable "struct ms_hyperv_tsc_page *tsc_pg;" (now, the variable is in drivers/clocksource/hyperv_timer.c). Fixes: 63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* | | | | Merge tag 'arm64-fixes' of ↵Linus Torvalds2019-08-242-10/+27
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Two KVM/arm fixes for MMIO emulation and UBSAN. Unusually, we're routing them via the arm64 tree as per Paolo's request on the list: https://lore.kernel.org/kvm/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com/ We don't actually have any other arm64 fixes pending at the moment (touch wood), so I've pulled from Marc, written a merge commit, tagged the result and run it through my build/boot/bisect scripts" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity KVM: arm/arm64: Only skip MMIO insn once
| * \ \ \ \ Merge tag 'kvmarm-fixes-for-5.3-3' of ↵Will Deacon2019-08-242-10/+27
| |\ \ \ \ \ | | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm/fixes Pull KVM/arm fixes from Marc Zyngier as per Paulo's request at: https://lkml.kernel.org/r/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com "One (hopefully last) set of fixes for KVM/arm for 5.3: an embarassing MMIO emulation regression, and a UBSAN splat. Oh well... - Don't overskip instructions on MMIO emulation - Fix UBSAN splat when initializing PPI priorities" * tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm: KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity KVM: arm/arm64: Only skip MMIO insn once
| | * | | | KVM: arm/arm64: VGIC: Properly initialise private IRQ affinityAndre Przywara2019-08-231-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment we initialise the target *mask* of a virtual IRQ to the VCPU it belongs to, even though this mask is only defined for GICv2 and quickly runs out of bits for many GICv3 guests. This behaviour triggers an UBSAN complaint for more than 32 VCPUs: ------ [ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21 [ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int' ------ Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs dump is wrong, due to this very same problem. Because there is no requirement to create the VGIC device before the VCPUs (and QEMU actually does it the other way round), we can't safely initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch every private IRQ for each VCPU anyway later (in vgic_init()), we can just move the initialisation of those fields into there, where we definitely know the VGIC type. On the way make sure we really have either a VGICv2 or a VGICv3 device, since the existing code is just checking for "VGICv3 or not", silently ignoring the uninitialised case. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reported-by: Dave Martin <dave.martin@arm.com> Tested-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | | | KVM: arm/arm64: Only skip MMIO insn onceAndrew Jones2019-08-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If after an MMIO exit to userspace a VCPU is immediately run with an immediate_exit request, such as when a signal is delivered or an MMIO emulation completion is needed, then the VCPU completes the MMIO emulation and immediately returns to userspace. As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases we have to be careful not to complete the MMIO emulation again, when the VCPU is eventually run again, because the emulation does an instruction skip (and doing too many skips would be a waste of guest code :-) We need to use additional VCPU state to track if the emulation is complete. As luck would have it, we already have 'mmio_needed', which even appears to be used in this way by other architectures already. Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | | | | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2019-08-248-7/+49
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four fixes, three for edge conditions which don't occur very often. The lpfc fix mitigates memory exhaustion for some high CPU systems" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm() scsi: target: tcmu: avoid use-after-free after command timeout scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
| * | | | | | scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQJames Smart2019-08-204-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ resources based on shost values set by the driver. In newer cases of the driver, which attempts to set nr_hw_queues to the cpu count, the multipliers become excessive, with a single shost having SCSI-MQ pre-allocation reaching into the multiple GBytes range. NPIV, which creates additional shosts, only multiply this overhead. On lower-memory systems, this can exhaust system memory very quickly, resulting in a system crash or failures in the driver or elsewhere due to low memory conditions. After testing several scenarios, the situation can be mitigated by limiting the value set in shost->nr_hw_queues to 4. Although the shost values were changed, the driver still had per-cpu hardware queues of its own that allowed parallelization per-cpu. Testing revealed that even with the smallish number for nr_hw_queues for SCSI-MQ, performance levels remained near maximum with the within-driver affiinitization. A module parameter was created to allow the value set for the nr_hw_queues to be tunable. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()Adrian Hunter2019-08-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following BUG: [ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c [ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core] [ 187.065938] Call Trace: [ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core] [ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core] [ 187.065993] ? pci_pm_restore+0xb0/0xb0 [ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci] [ 187.066017] pci_pm_thaw+0x4c/0x90 [ 187.066030] dpm_run_callback+0x5b/0x150 [ 187.066043] device_resume+0x11b/0x220 Voltage regulators are optional, so functions must check they exist before dereferencing. Note this issue is hidden if CONFIG_REGULATORS is not set, because the offending code is optimised away. Notes for stable: The issue first appears in commit 57d104c153d3 ("ufs: add UFS power management support") but is inadvertently fixed in commit 60f0187031c0 ("scsi: ufs: disable vccq if it's not needed by UFS device") which in turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and v5.1+ Fixes: 57d104c153d3 ("ufs: add UFS power management support") Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | scsi: target: tcmu: avoid use-after-free after command timeoutDmitry Fomichev2019-08-151-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In tcmu_handle_completion() function, the variable called read_len is always initialized with a value taken from se_cmd structure. If this function is called to complete an expired (timed out) out command, the session command pointed by se_cmd is likely to be already deallocated by the target core at that moment. As the result, this access triggers a use-after-free warning from KASAN. This patch fixes the code not to touch se_cmd when completing timed out TCMU commands. It also resets the pointer to se_cmd at the time when the TCMU_CMD_BIT_EXPIRED flag is set because it is going to become invalid after calling target_complete_cmd() later in the same function, tcmu_check_expired_cmd(). Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | scsi: qla2xxx: Fix gnl.l memory leak on adapter init failureBill Kuzeja2019-08-152-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If HBA initialization fails unexpectedly (exiting via probe_failed:), we may fail to free vha->gnl.l. So that we don't attempt to double free, set this pointer to NULL after a free and check for NULL at probe_failed: so we know whether or not to call dma_free_coherent. Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com> Acked-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | | | Merge tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2019-08-241-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fix from Darrick Wong: "A single patch that fixes a xfs lockup problem when a chown/chgrp operation fails due to running out of quota. It has survived the usual xfstests runs and merges cleanly with this morning's master: - Fix a forgotten inode unlock when chown/chgrp fail due to quota" * tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
| * | | | | | | xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOTDarrick J. Wong2019-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script: # adduser dummy # adduser dummy plugdev # dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt (and then as user dummy) $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo and saw: ================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs] ...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock. Reported-by: benjamin.moody@gmail.com Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org>
* | | | | | | | Merge tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2019-08-242-7/+18
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more drm fixes from Dave Airlie: "Although the tree built for me fine on arm here, it appears either header cleanups in next or some kconfig combo it breaks, so this contains a fix to mediatek to include dma-mapping.h explicitly. There was also one nouveau fix that came in late that I was going to leave until next week, but since I was sending this I thought it may as well be in here: mediatek: - fix build in some cases nouveau: - fix hang with i2c and mst docks" * tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm: drm/mediatek: include dma-mapping header drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
| * | | | | | | | drm/mediatek: include dma-mapping headerDave Airlie2019-08-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although it builds fine here in my arm cross compile, it seems either via some other patches in -next or some Kconfig combination, this fails to build for everyone. Include linux/dma-mapping.h should fix it. Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | | | | | Merge branch 'linux-5.3' of git://github.com/skeggsb/linux into drm-fixesDave Airlie2019-08-231-7/+17
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes i2c on DP with some docks. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Ben Skeggs <skeggsb@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv713t2_BQ44gVV7Lqic6Vwmhq0r4FB5v-t0kD1jzFrbmQ@mail.gmail.com
| | * | | | | | | | drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUXLyude Paul2019-08-231-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While I had thought I had fixed this issue in: commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after ->fini()") It turns out that while I did fix the error messages I was seeing on my P50 when trying to access i2c busses with the GPU in runtime suspend, I accidentally had missed one important detail that was mentioned on the bug report this commit was supposed to fix: that the CPU would only lock up when trying to access i2c busses _on connected devices_ _while the GPU is not in runtime suspend_. Whoops. That definitely explains why I was not able to get my machine to hang with i2c bus interactions until now, as plugging my P50 into it's dock with an HDMI monitor connected allowed me to finally reproduce this locally. Now that I have managed to reproduce this issue properly, it looks like the problem is much simpler then it looks. It turns out that some connected devices, such as MST laptop docks, will actually ACK i2c reads even if no data was actually read: [ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1 [ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000 [ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001 [ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 Because we don't handle the situation of i2c ack without any data, we end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the value of cnt always remains at 0. This finally properly explains how this could result in a CPU hang like the ones observed in the aforementioned commit. So, fix this by retrying transactions if no data is written or received, and give up and fail the transaction if we continue to not write or receive any data after 32 retries. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
* | | | | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2019-08-2326-239/+248
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma fixes from Doug Ledford: "No beating around the bush: this is a monster pull request for an -rc5 kernel. Intel hit me with a series of fixes for TID processing. Mellanox hit me with a series for their UMR memory support. And we had one fix for siw that fixes the 32bit build warnings and because of the number of casts that had to be changed to properly silence the warnings, that one patch alone is a full 40% of the LOC of this entire pull request. Given that this is the initial release kernel for siw, I'm trying to fix anything in it that we can, so that adds to the impetus to take fixes for it like this one. I had to do a rebase early in the week. Jason had thought he put a patch on the rc queue that he needed to be there so he could base some work off of it, and it had actually not been placed there. So he asked me (on Tuesday) to fix that up before pushing my wip branch to the official rc branch. I did, and that's why the early patches look like they were all committed at the same time on Tuesday. That bunch had been in my queue prior. The various patches all pass my test for being legitimate fixes and not attempts to slide new features or development into a late rc. Well, they were all fixes with the exception of a couple clean up patches people wrote for making the fixes they also wrote better (like a cleanup patch to move UMR checking into a function so that the remaining UMR fix patches can reference that function), so I left those in place too. My apologies for the LOC count and the number of patches here, it's just how the cards fell this cycle. Summary: - Fix siw buffer mapping issue - Fix siw 32/64 casting issues - Fix a KASAN access issue in bnxt_re - Fix several memory leaks (hfi1, mlx4) - Fix a NULL deref in cma_cleanup - Fixes for UMR memory support in mlx5 (4 patch series) - Fix namespace check for restrack - Fixes for counter support - Fixes for hfi1 TID processing (5 patch series) - Fix potential NULL deref in siw - Fix memory page calculations in mlx5" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits) RDMA/siw: Fix 64/32bit pointer inconsistency RDMA/siw: Fix SGL mapping issues RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message infiniband: hfi1: fix memory leaks infiniband: hfi1: fix a memory leak bug IB/mlx4: Fix memory leaks RDMA/cma: fix null-ptr-deref Read in cma_cleanup IB/mlx5: Block MR WR if UMR is not possible IB/mlx5: Fix MR re-registration flow to use UMR properly IB/mlx5: Report and handle ODP support properly IB/mlx5: Consolidate use_umr checks into single function RDMA/restrack: Rewrite PID namespace check to be reliable RDMA/counters: Properly implement PID checks IB/core: Fix NULL pointer dereference when bind QP to counter IB/hfi1: Drop stale TID RDMA packets that cause TIDErr IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet IB/hfi1: Drop stale TID RDMA packets RDMA/siw: Fix potential NULL de-ref ...
| * | | | | | | | | | RDMA/siw: Fix 64/32bit pointer inconsistencyBernard Metzler2019-08-239-109/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes improper casting between addresses and unsigned types. Changes siw_pbl_get_buffer() function to return appropriate dma_addr_t, and not u64. Also fixes debug prints. Now any potentially kernel private pointers are printed formatted as '%pK', to allow keeping that information secret. Fixes: d941bfe500be ("RDMA/siw: Change CQ flags from 64->32 bits") Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods") Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Fixes: f29dd55b0236 ("rdma/siw: queue pair methods") Fixes: 2251334dcac9 ("rdma/siw: application buffer management") Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Fixes: a531975279f3 ("rdma/siw: main include file") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190822173738.26817-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/siw: Fix SGL mapping issuesBernard Metzler2019-08-221-22/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All user level and most in-kernel applications submit WQEs where the SG list entries are all of a single type. iSER in particular, however, will send us WQEs with mixed SG types: sge[0] = kernel buffer, sge[1] = PBL region. Check and set is_kva on each SG entry individually instead of assuming the first SGE type carries through to the last. This fixes iSER over siw. Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_messageSelvin Xavier2019-08-222-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver copies FW commands to the HW queue as units of 16 bytes. Some of the command structures are not exact multiple of 16. So while copying the data from those structures, the stack out of bounds messages are reported by KASAN. The following error is reported. [ 1337.530155] ================================================================== [ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785 [ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75 [ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 1337.530542] Call Trace: [ 1337.530548] dump_stack+0x5b/0x90 [ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530560] print_address_description+0x65/0x22e [ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530577] __kasan_report.cold.3+0x37/0x77 [ 1337.530581] ? _raw_write_trylock+0x10/0xe0 [ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530590] kasan_report+0xe/0x20 [ 1337.530592] memcpy+0x1f/0x50 [ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re] [ 1337.530611] ? xas_create+0x3aa/0x5f0 [ 1337.530613] ? xas_start+0x77/0x110 [ 1337.530615] ? xas_clear_mark+0x34/0xd0 [ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re] [ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re] [ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0 [ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re] [ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re] [ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re] [ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core] [ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt] [ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt] [ 1337.530692] ? xa_load+0x87/0xe0 ... [ 1337.530840] do_syscall_64+0x6d/0x1f0 [ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1337.530845] RIP: 0033:0x7ff5b389035b [ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48 [ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b [ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8 [ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000 [ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50 [ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750 [ 1337.530885] The buggy address belongs to the page: [ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 1337.530964] flags: 0x57ffffc0000000() [ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000 [ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1337.530970] page dumped because: kasan: bad access detected [ 1337.530996] Memory state around the buggy address: [ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2 [ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 [ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00 [ 1337.531393] ^ [ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531691] ================================================================== Fix this by passing the exact size of each FW command to bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending the command to HW, modify the req->cmd_size to number of 16 byte units. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | infiniband: hfi1: fix memory leaksWenwen Wang2019-08-201-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fault_opcodes_write(), 'data' is allocated through kcalloc(). However, it is not deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | infiniband: hfi1: fix a memory leak bugWenwen Wang2019-08-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get() fails, leading to a memory leak. To fix this bug, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/mlx4: Fix memory leaksWenwen Wang2019-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through kcalloc(). However, it is not always deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, free 'tun_qp->tx_ring' whenever an error occurs. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Acked-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/cma: fix null-ptr-deref Read in cma_cleanupzhengbin2019-08-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cma_init, if cma_configfs_init fails, need to free the previously memory and return fail, otherwise will trigger null-ptr-deref Read in cma_cleanup. cma_cleanup cma_configfs_exit configfs_unregister_subsystem Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/mlx5: Block MR WR if UMR is not possibleMoni Shoua2019-08-201-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check conditions that are mandatory to post_send UMR WQEs. 1. Modifying page size. 2. Modifying remote atomic permissions if atomic access is required. If either condition is not fulfilled then fail to post_send() flow. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-9-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/mlx5: Fix MR re-registration flow to use UMR properlyMoni Shoua2019-08-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UMR WQE in the MR re-registration flow requires that modify_atomic and modify_entity_size capabilities are enabled. Therefore, check that the these capabilities are present before going to umr flow and go through slow path if not. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/mlx5: Report and handle ODP support properlyMoni Shoua2019-08-202-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ODP depends on the several device capabilities, among them is the ability to send UMR WQEs with that modify atomic and entity size of the MR. Therefore, only if all conditions to send such a UMR WQE are met then driver can report that ODP is supported. Use this check of conditions in all places where driver needs to know about ODP support. Also, implicit ODP support depends on ability of driver to send UMR WQEs for an indirect mkey. Therefore, verify that all conditions to do so are met when reporting support. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/mlx5: Consolidate use_umr checks into single functionMoni Shoua2019-08-202-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce helper function to unify various use_umr checks. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/restrack: Rewrite PID namespace check to be reliableLeon Romanovsky2019-08-203-12/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | task_active_pid_ns() is wrong API to check PID namespace because it posses some restrictions and return PID namespace where the process was allocated. It created mismatches with current namespace, which can be different. Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable results without any relation to allocated PID namespace. Fixes: 8be565e65fa9 ("RDMA/nldev: Factor out the PID namespace check") Fixes: 6a6c306a09b5 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/counters: Properly implement PID checksLeon Romanovsky2019-08-201-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Auto" configuration mode is called for visible in that PID namespace and it ensures that all counters and QPs are coexist in the same namespace and belong to same PID. Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/core: Fix NULL pointer dereference when bind QP to counterIdo Kalir2019-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If QP is not visible to the pid, then we try to decrease its reference count and return from the function before the QP pointer is initialized. This lead to NULL pointer dereference. Fix it by pass directly the res to the rdma_restract_put as arg instead of &qp->res. This fixes below call trace: [ 5845.110329] BUG: kernel NULL pointer dereference, address: 00000000000000dc [ 5845.120482] Oops: 0002 [#1] SMP PTI [ 5845.129119] RIP: 0010:rdma_restrack_put+0x5/0x30 [ib_core] [ 5845.169450] Call Trace: [ 5845.170544] rdma_counter_get_qp+0x5c/0x70 [ib_core] [ 5845.172074] rdma_counter_bind_qpn_alloc+0x6f/0x1a0 [ib_core] [ 5845.173731] nldev_stat_set_doit+0x314/0x330 [ib_core] [ 5845.175279] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [ 5845.176772] ? __kmalloc_node_track_caller+0x20b/0x2b0 [ 5845.178321] rdma_nl_rcv+0xcb/0x120 [ib_core] [ 5845.179753] netlink_unicast+0x179/0x220 [ 5845.181066] netlink_sendmsg+0x2d8/0x3d0 [ 5845.182338] sock_sendmsg+0x30/0x40 [ 5845.183544] __sys_sendto+0xdc/0x160 [ 5845.184832] ? syscall_trace_enter+0x1f8/0x2e0 [ 5845.186209] ? __audit_syscall_exit+0x1d9/0x280 [ 5845.187584] __x64_sys_sendto+0x24/0x30 [ 5845.188867] do_syscall_64+0x48/0x120 [ 5845.190097] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1bd8e0a9d0fd1 ("RDMA/counter: Allow manual mode configuration support") Signed-off-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/hfi1: Drop stale TID RDMA packets that cause TIDErrKaike Wan2019-08-201-44/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order. A stale TID RDMA data packet could lead to TidErr if the TID entries have been released by duplicate data packets generated from retries, and subsequently erroneously force the qp into error state in the current implementation. Since the payload has already been dropped by hardware, the packet can be simply dropped and it is no longer necessary to put the qp into error state. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packetKaike Wan2019-08-201-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA WRITE DATA packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: d72fe7d5008b ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/hfi1: Add additional checks when handling TID RDMA READ RESP packetKaike Wan2019-08-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA READ RESP packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packetKaike Wan2019-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When processing a TID RDMA READ RESP packet that causes KDETH EFLAGS errors, the packet's IB PSN is checked against qp->s_last_psn and qp->s_psn without the protection of qp->s_lock, which is not safe. This patch fixes the issue by acquiring qp->s_lock first. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192039.105923.7852.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | IB/hfi1: Drop stale TID RDMA packetsKaike Wan2019-08-201-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a congested fabric with adaptive routing enabled, traces show that the sender could receive stale TID RDMA NAK packets that contain newer KDETH PSNs and older Verbs PSNs. If not dropped, these packets could cause the incorrect rewinding of the software flows and the incorrect completion of TID RDMA WRITE requests, and eventually leading to memory corruption and kernel crash. The current code drops stale TID RDMA ACK/NAK packets solely based on KDETH PSNs, which may lead to erroneous processing. This patch fixes the issue by also checking the Verbs PSN. Addition checks are added before rewinding the TID RDMA WRITE DATA packets. Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/siw: Fix potential NULL de-refBernard Metzler2019-08-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In siw_connect() we have an error flow where there is no valid qp pointer. Make sure we don't try to de-ref in that situation. Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190819140257.19319-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | | | | | | | RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLBJason Gunthorpe2019-08-202-8/+4
| | |_|_|_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ODP is enabled with IB_ACCESS_HUGETLB then the required pages should be calculated based on the extent of the MR, which is rounded to the nearest huge page alignment. Fixes: d2183c6f1958 ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-5-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
* | | | | | | | | | Merge tag 'for-linus-20190823' of git://git.kernel.dk/linux-blockLinus Torvalds2019-08-236-23/+70
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this release. This contains: - Three minor fixes for NVMe. - Three minor tweaks for the io_uring polling logic. - Officially mark Song as the MD maintainer, after he's been filling that role sucessfully for the last 6 months or so" * tag 'for-linus-20190823' of git://git.kernel.dk/linux-block: io_uring: add need_resched() check in inner poll loop md: update MAINTAINERS info io_uring: don't enter poll loop if we have CQEs pending nvme: Add quirk for LiteON CL1 devices running FW 22301111 nvme: Fix cntlid validation when not using NVMEoF nvme-multipath: fix possible I/O hang when paths are updated io_uring: fix potential hang with polled IO
| * | | | | | | | | | io_uring: add need_resched() check in inner poll loopJens Axboe2019-08-221-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The outer poll loop checks for whether we need to reschedule, and returns to userspace if we do. However, it's possible to get stuck in the inner loop as well, if the CPU we are running on needs to reschedule to finish the IO work. Add the need_resched() check in the inner loop as well. This fixes a potential hang if the kernel is configured with CONFIG_PREEMPT_VOLUNTARY=y. Reported-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | md: update MAINTAINERS infoSong Liu2019-08-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have been reviewing patches for md in the past few months. Mark me as the MD maintainer, as I have effectively been filling that role. Cc: NeilBrown <neilb@suse.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | io_uring: don't enter poll loop if we have CQEs pendingJens Axboe2019-08-201-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to check if we have CQEs pending before starting a poll loop, as those could be the events we will be spinning for (and hence we'll find none). This can happen if a CQE triggers an error, or if it is found by eg an IRQ before we get a chance to find it through polling. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | nvme: Add quirk for LiteON CL1 devices running FW 22301111Mario Limonciello2019-08-203-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the components in LiteON CL1 device has limitations that can be encountered based upon boundary race conditions using the nvme bus specific suspend to idle flow. When this situation occurs the drive doesn't resume properly from suspend-to-idle. LiteON has confirmed this problem and fixed in the next firmware version. As this firmware is already in the field, avoid running nvme specific suspend to idle flow. Fixes: d916b1be94b6 ("nvme-pci: use host managed power state for suspend") Link: http://lists.infradead.org/pipermail/linux-nvme/2019-July/thread.html Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Charles Hyde <charles.hyde@dellteam.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | nvme: Fix cntlid validation when not using NVMEoFGuilherme G. Piccoli2019-08-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation") introduced a validation for controllers with duplicate cntlid that runs on nvme_init_subsystem(). The problem is that the validation relies on ctrl->cntlid, and this value is assigned (from id_ctrl value) after the call for nvme_init_subsystem() in nvme_init_identify() for non-fabrics scenario. That leads to ctrl->cntlid always being 0 in case we have a physical set of controllers in the same subsystem. This patch fixes that by loading the discovered cntlid id_ctrl value into ctrl->cntlid before the subsystem initialization, only for the non-fabrics case. The patch was tested with emulated nvme devices (qemu) having two controllers in a single subsystem. Without the patch, we couldn't make it work failing in the duplicate check; when running with the patch, we could see the subsystem holding both controllers. For the fabrics case we see ctrl->cntlid has a more intricate relation with the admin connect, so we didn't change that. Fixes: 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation") Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | nvme-multipath: fix possible I/O hang when paths are updatedAnton Eidelman2019-08-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_state_set_live() making a path available triggers requeue_work in order to resubmit requests that ended up on requeue_list when no paths were available. This requeue_work may race with concurrent nvme_ns_head_make_request() that do not observe the live path yet. Such concurrent requests may by made by either: - New IO submission. - Requeue_work triggered by nvme_failover_req() or another ana_work. A race may cause requeue_work capture the state of requeue_list before more requests get onto the list. These requests will stay on the list forever unless requeue_work is triggered again. In order to prevent such race, nvme_state_set_live() should synchronize_srcu(&head->srcu) before triggering the requeue_work and prevent nvme_ns_head_make_request referencing an old snapshot of the path list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | io_uring: fix potential hang with polled IOJens Axboe2019-08-201-11/+25
| |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a request issue ends up being punted to async context to avoid blocking, we can get into a situation where the original application enters the poll loop for that very request before it has been issued. This should not be an issue, except that the polling will hold the io_uring uring_ctx mutex for the duration of the poll. When the async worker has actually issued the request, it needs to acquire this mutex to add the request to the poll issued list. Since the application polling is already holding this mutex, the workqueue sleeps on the mutex forever, and the application thus never gets a chance to poll for the very request it was interested in. Fix this by ensuring that the polling drops the uring_ctx occasionally if it's not making any progress. Reported-by: Jeffrey M. Birnbaum <jmbnyc@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | | | Merge tag 'for-5.3/dm-fixes-2' of ↵Linus Torvalds2019-08-2312-60/+209
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Revert a DM bufio change from during the 5.3 merge window now that a proper fix has been made to the block loopback driver. - Fix DM kcopyd to wakeup so failed subjobs get completed. - Various fixes to DM zoned target to address error handling, and other small tweaks (SPDX license identifiers and fix typos). - Fix DM integrity range locking race by tracking whether journal has changed. - Fix DM dust target to detect reads of badblocks beyond the first 512b sector (applicable if blocksize is larger than 512b). - Fix DM persistent-data issue in both the DM btree and DM space-map-metadata interfaces. - Fix out of bounds memory access with certain DM table configurations. * tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: fix invalid memory accesses with too high sector number dm space map metadata: fix missing store of apply_bops() return value dm btree: fix order of block initialization in btree_split_beneath dm raid: add missing cleanup in raid_ctr() dm zoned: fix potential NULL dereference in dmz_do_reclaim() dm dust: use dust block size for badblocklist index dm integrity: fix a crash due to BUG_ON in __journal_read_write() dm zoned: fix a few typos dm zoned: add SPDX license identifiers dm zoned: properly handle backing device failure dm zoned: improve error handling in i/o map code dm zoned: improve error handling in reclaim dm kcopyd: always complete failed jobs Revert "dm bufio: fix deadlock with loop device"