summaryrefslogtreecommitdiffstats
path: root/drivers/scsi/lpfc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch '5.17/scsi-fixes' into 5.18/scsi-stagingMartin K. Petersen2022-02-154-7/+20
|\ | | | | | | | | | | | | Pull 5.17 fixes branch into 5.18 tree to resolve a few pm8001 driver merge conflicts. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Reduce log messages seen after firmware downloadJames Smart2022-02-082-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Messages around firmware download were incorrectly tagged as being related to discovery trace events. Thus, firmware download status ended up dumping the trace log as well as the firmware update message. As there were a couple of log messages in this state, the trace log was dumped multiple times. Resolve this by converting from trace events to SLI events. Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabledJames Smart2022-02-082-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver is initiating NVMe PRLIs to determine device NVMe support. This should not be occurring if CONFIG_NVME_FC support is disabled. Correct this by changing the default value for FC4 support. Currently it defaults to FCP and NVMe. With change, when NVME_FC support is not enabled in the kernel, the default value is just FCP. Link: https://lore.kernel.org/r/20220207180516.73052-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: lpfc: Remove redundant flush_workqueue() callMinghao Chi (CGEL ZTE)2022-01-311-1/+0
|/ | | | | | | | | | | | | destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() call. Link: https://lore.kernel.org/r/20220127014330.1185114-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds2022-01-231-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
| * all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriateYury Norov2022-01-151-5/+5
| | | | | | | | | | | | | | | | | | find_first{,_zero}_bit is a more effective analogue of 'next' version if start == 0. This patch replaces 'next' with 'first' where things look trivial. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
* | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2022-01-1413-160/+267
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, pm80xx, lpfc, mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes. The most impactful change is likely the switch from GFP_DMA to GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too many people" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (121 commits) scsi: mpi3mr: Bump driver version to 8.0.0.61.0 scsi: mpi3mr: Fixes around reply request queues scsi: mpi3mr: Enhanced Task Management Support Reply handling scsi: mpi3mr: Use TM response codes from MPI3 headers scsi: mpi3mr: Add io_uring interface support in I/O-polled mode scsi: mpi3mr: Print cable mngnt and temp threshold events scsi: mpi3mr: Support Prepare for Reset event scsi: mpi3mr: Add Event acknowledgment logic scsi: mpi3mr: Gracefully handle online FW update operation scsi: mpi3mr: Detect async reset that occurred in firmware scsi: mpi3mr: Add IOC reinit function scsi: mpi3mr: Handle offline FW activation in graceful manner scsi: mpi3mr: Code refactor of IOC init - part2 scsi: mpi3mr: Code refactor of IOC init - part1 scsi: mpi3mr: Fault IOC when internal command gets timeout scsi: mpi3mr: Display IOC firmware package version scsi: mpi3mr: Handle unaligned PLL in unmap cmnds scsi: mpi3mr: Increase internal cmnds timeout to 60s scsi: mpi3mr: Do access status validation before adding devices scsi: mpi3mr: Add support for PCIe Managed Switch SES device ...
| * \ Merge branch '5.16/scsi-fixes' into 5.17/scsi-stagingMartin K. Petersen2021-12-171-7/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull in the 5.16 fixes branch to resolve a conflict in the UFS driver core. Conflicts: drivers/scsi/ufs/ufshcd.c Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Use struct_group to isolate cast to larger objectKees Cook2021-12-142-27/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building under -Warray-bounds, a warning is generated when casting a u32 into MAILBOX_t (which is larger). This warning is conservative, but it's not an unreasonable change to make to improve future robustness. Use a tagged struct_group that can refer to either the specific fields or the first u32 separately, silencing this warning: drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_reset_barrier': drivers/scsi/lpfc/lpfc_sli.c:4787:29: error: array subscript 'MAILBOX_t[0]' is partly outside array bounds of 'volatile uint32_t[1]' {aka 'volatile unsigned int[1]'} [-Werror=array-bounds] 4787 | ((MAILBOX_t *)&mbox)->mbxCommand = MBX_KILL_BOARD; | ^~ drivers/scsi/lpfc/lpfc_sli.c:4752:27: note: while referencing 'mbox' 4752 | volatile uint32_t mbox; | ^~~~ There is no change to the resulting executable instruction code. Link: https://lore.kernel.org/r/20211203223351.107323-1-keescook@chromium.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Use struct_group() to initialize struct lpfc_cgn_infoKees Cook2021-12-142-48/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark "stat" region of struct lpfc_cgn_info that should be initialized to zero, and refactor the "data" region memset() to wipe everything up to the cgn_stats region. Link: https://lore.kernel.org/r/20211208195957.1603092-1-keescook@chromium.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Update lpfc version to 14.0.0.4James Smart2021-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update lpfc version to 14.0.0.4. Link: https://lore.kernel.org/r/20211204002644.116455-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Add additional debugfs support for CMFJames Smart2021-12-071-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dump raw CMF parameter information in debugfs cgn_buffer. Link: https://lore.kernel.org/r/20211204002644.116455-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Cap CMF read bytes to MBPIJames Smart2021-12-072-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure read bytes data does not go over MBPI for CMF timer intervals that are purposely shortened. Link: https://lore.kernel.org/r/20211204002644.116455-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Adjust CMF total bytes and rxmonitorJames Smart2021-12-074-15/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculate any extra bytes needed to account for timer accuracy. If we are less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total to reflect a full LPFC_CMF_INTERVAL. Add additional info to rxmonitor, and adjust some log formatting. Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanupJames Smart2021-12-074-30/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Fix NPIV port deletion crashJames Smart2021-12-074-25/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver is calling schedule_timeout after the DA_ID nameserver request and LOGO commands are issued to the fabric by the initiator virtual endport. These fixed delay functions are causing long delays in the driver's worker thread when processing discovery I/Os in a serialized fashion, which is then triggering mailbox timeout errors artificially. To fix this, don't wait on the DA_ID request to complete and call wait_event_timeout to allow the vport delete thread to make progress on an event driven basis rather than fixing the wait time. Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalanceJames Smart2021-12-071-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call trace. A prior patch reworked the get/put handling by ensuring nlp_get was done before WQE submission and a put was done in the completion path. Unfortunately, the issue_els_rscn path had a piece of legacy code that did a nlp_put, causing an imbalance on the ref counts. Fixed by removing the unnecessary legacy code snippet. Link: https://lore.kernel.org/r/20211204002644.116455-4-jsmart2021@gmail.com Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Change return code on I/Os received during link bounceJames Smart2021-12-072-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During heavy I/O testing with issue_lip to bounce the link, occasionally I/O is terminated with status 3 result 9, which means the RPI is suspended. The I/O is completed and this type of error will result in immediate retry by the SCSI layer. The retry count expires and the I/O fails and returns error to the application. To avoid these quick retry/retries exhausted scenarios change the return code given to the midlayer to DID_REQUEUE rather than DID_ERROR. This gets them retried, and eventually succeed when the link recovers. Link: https://lore.kernel.org/r/20211204002644.116455-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIVJames Smart2021-12-073-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries were still busy. This situation was only seen doing rmmod after at least 1 vport (NPIV) instance was created and destroyed. The number of messages scaled with the number of vports created. When a vport is created, it can receive a PLOGI from another initiator Nport. When this happens, the driver prepares to ack the PLOGI and prepares an RPI for registration (via mbx cmd) which includes an mbuf allocation. During the unsolicited PLOGI processing and after the RPI preparation, the driver recognizes it is one of the vport instances and decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI, the mailbox struct allocated for RPI registration is freed, but the mbuf that was also allocated is not released. Fix by freeing the mbuf with the mailbox struct in the LS_RJT path. As part of the code review to figure the issue out a couple of other areas where found that also would not have released the mbuf. Those are cleaned up as well. Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | Merge branch 'linus' into irq/core, to fix conflictIngo Molnar2022-01-082-9/+4
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()Dan Carpenter2021-12-171-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | The "mybuf" string comes from the user, so we need to ensure that it is NUL terminated. Link: https://lore.kernel.org/r/20211214070527.GA27934@kili Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGOJames Smart2021-11-241-7/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A commit introduced formal regstration of all Fabric nodes to the SCSI transport as well as REG/UNREG RPI mailbox requests. The commit introduced the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new code caused the driver to release the RPI value used for the remote port and marked the RPI invalid. When the driver later attempted to re-login, it would use the invalid RPI and the adapter rejected the PLOGI request. As no login occurred, the devloss timer on the rport expired and connectivity was lost. This patch corrects the code by removing the snippet that requests the rpi to be unregistered. This change only occurs on a node that is already marked to be rediscovered. This puts the code back to its original behavior, preserving the already-assigned rpi value (registered or not) which can be used on the re-login attempts. Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com Fixes: fe83e3b9b422 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* / scsi: lpfc: Use irq_set_affinity()Nitesh Narayan Lal2021-12-101-3/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver uses irq_set_affinity_hint to set the affinity for the lpfc interrupts to a mask corresponding to the local NUMA node to avoid performance overhead on AMD architectures. However, irq_set_affinity_hint() setting the affinity is an undocumented side effect that this function also sets the affinity under the hood. To remove this side effect irq_set_affinity_hint() has been marked as deprecated and new interfaces have been introduced. Also, as per the commit dcaa21367938 ("scsi: lpfc: Change default IRQ model on AMD architectures"): "On AMD architecture, revert the irq allocation to the normal style (non-managed) and then use irq_set_affinity_hint() to set the cpu affinity and disable user-space rebalancing." we don't really need to set the affinity_hint as user-space rebalancing for the lpfc interrupts is not desired. Hence, replace the irq_set_affinity_hint() with irq_set_affinity() which only applies the affinity for the interrupts. Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Link: https://lore.kernel.org/r/20210903152430.244937-12-nitesh@redhat.com
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2021-11-0514-332/+792
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, smartpqi, lpfc, target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug fixes. Notable core changes are the removal of scsi->tag which caused some churn in obsolete drivers and a sweep through all drivers to call scsi_done() directly instead of scsi->done() which removes a pointer indirection from the hot path and a move to register core sysfs files earlier, which means they're available to KOBJ_ADD processing, which necessitates switching all drivers to using attribute groups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: lpfc: Update lpfc version to 14.0.0.3 scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss scsi: lpfc: Fix link down processing to address NULL pointer dereference scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine scsi: lpfc: Correct sysfs reporting of loop support after SFP status change scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup() scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer scsi: ufs: mediatek: Avoid sched_clock() misuse scsi: mpt3sas: Make mpt3sas_dev_attrs static scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions scsi: target: core: Stop using bdevname() scsi: aha1542: Use memcpy_{from,to}_bvec() scsi: sr: Add error handling support for add_disk() scsi: sd: Add error handling support for add_disk() scsi: target: Perform ALUA group changes in one step scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path scsi: target: Fix alua_tg_pt_gps_count tracking scsi: target: Fix ordered tag handling ...
| * scsi: lpfc: Update lpfc version to 14.0.0.3James Smart2021-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | Update lpfc version to 14.0.0.3. Link: https://lore.kernel.org/r/20211020211417.88754-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Allow fabric node recovery if recovery is in progress before devlossJames Smart2021-10-216-15/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A link bounce to a slow fabric may observe FDISC response delays lasting longer than devloss tmo. Current logic decrements the final fabric node kref during a devloss tmo event. This results in a NULL ptr dereference crash if the FDISC completes for that fabric node after devloss tmo. Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when devloss tmo triggers and we've noticed that fabric node recovery has already started or finished in between the time lpfc_dev_loss_tmo_callbk queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then the driver reverses the devloss tmo marked kref put with a kref get. If fabric node recovery fails, then the final kref put relies on the ELS timing out or the REG_LOGIN cmpl routine. Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Fix link down processing to address NULL pointer dereferenceJames Smart2021-10-211-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an FC link down transition while PLOGIs are outstanding to fabric well known addresses, outstanding ABTS requests may result in a NULL pointer dereference. Driver unload requests may hang with repeated "2878" log messages. The Link down processing results in ABTS requests for outstanding ELS requests. The Abort WQEs are sent for the ELSs before the driver had set the link state to down. Thus the driver is sending the Abort with the expectation that an ABTS will be sent on the wire. The Abort request is stalled waiting for the link to come up. In some conditions the driver may auto-complete the ELSs thus if the link does come up, the Abort completions may reference an invalid structure. Fix by ensuring that Abort set the flag to avoid link traffic if issued due to conditions where the link failed. Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Allow PLOGI retry if previous PLOGI was abortedJames Smart2021-10-211-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A remote nport can stop responding to PLOGI beyond the ELS I/O timeout under some fault conditions. When this happens, the non-response triggers a dev_loss_tmo event from the transport which causes the driver to abort the PLOGI and stop any retries. This was due to a policy in the ELS completion handler whenever an ELS was terminated due to driver request. Revise the ELS completion path to detect PLOGIs that were aborted and allow retries. Link: https://lore.kernel.org/r/20211020211417.88754-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routineJames Smart2021-10-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An error is detected with the following report when unloading the driver: "KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b" The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the flag is not cleared upon completion of the login. This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used as an rpi_ids array index. Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in lpfc_mbx_cmpl_fc_reg_login(). Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Correct sysfs reporting of loop support after SFP status changeJames Smart2021-10-213-21/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Applications determine loop support in part by querying the 'pls' sysfs node. Reporting of 'pls' (Private Loop Support) is derived from the descriptor returned by the COMMON_GET_SLI4_PARAMETERS mailbox command, which is issued during initialization or after a reset. The value of this field may change if there is a dynamic SFP change. The driver currently will not pick up the change as there was no reset scenario. Rework to commonize the sending of the COMMON_GET_SLI4_PARAMETERS command. Add the calling of the routine after receipt of an async event indicating an SFP change. Link: https://lore.kernel.org/r/20211020211417.88754-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_resetJames Smart2021-10-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A prior patch introduced HBA_NEEDS_CFG_PORT flag logic, but in lpfc_sli_brdrestart_s3() code path, right after HBA_NEEDS_CFG_PORT is set, the phba->hba_flag is cleared in lpfc_sli_brdreset(). Fix by calling lpfc_sli_chipset_init() to wait for successful restart of the HBA in lpfc_host_reset_handler() after lpfc_sli_brdrestart(). lpfc_sli_chipset_init() sets the HBA_NEEDS_CFG_PORT flag so that the lpfc_sli_hba_setup() routine from lpfc_online() will execute lpfc_sli_config_port() initialization step when the brdrestart is successful. Link: https://lore.kernel.org/r/20211020211417.88754-3-jsmart2021@gmail.com Fixes: d2f2547efd39 ("scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to ↵James Smart2021-10-213-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | driver_resource_setup() In cases when lpfc_enable_pci_dev() fails, lpfc_printf_log() with LOG_TRACE_EVENT set will call lpfc_dmp_dbg() which uses the phba->port_list_lock. However, phba->port_list_lock does not get initialized until lpfc_setup_driver_resource_phase1(). Thus, any initialization routine with LOG_TRACE_EVENT log message prior to lpfc_setup_driver_resource_phase1() will crash. Revert LOG_TRACE_EVENT back to LOG_INIT for all log messages in routines prior to lpfc_setup_driver_resource_phase1(). Link: https://lore.kernel.org/r/20211020211417.88754-2-jsmart2021@gmail.com CC: Zheyu Ma <zheyuma97@gmail.com> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Switch to attribute groupsBart Van Assche2021-10-174-153/+171
| | | | | | | | | | | | | | | | | | struct device supports attribute groups directly but does not support struct device_attribute directly. Hence switch to attribute groups. Link: https://lore.kernel.org/r/20211012233558.4066756-28-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: lpfc: Call scsi_done() directlyBart Van Assche2021-10-171-4/+4
| | | | | | | | | | | | | | | | | | Conditional statements are faster than indirect calls. Hence call scsi_done() directly. Link: https://lore.kernel.org/r/20211007202923.2174984-47-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * Merge branch '5.15/scsi-fixes' into 5.16/scsi-stagingMartin K. Petersen2021-10-127-37/+32
| |\ | | | | | | | | | | | | | | | | | | Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Add support for optional PLDV handlingJames Smart2021-09-292-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At adapter attachment or SLI port initialization, read the SLIPORT_STATUS register to check for pldv_enable. If found, the driver will perform a PCIe configuration space write when attaching to an SLI port instance that is an LPe32000 series adapter. Link: https://lore.kernel.org/r/20210927183518.22130-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Return NULL rather than a plain 0 integerColin Ian King2021-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function lpfc_sli4_perform_vport_cvl() returns a pointer to struct lpfc_nodelist so returning a plain 0 integer isn't good practice. Fix this by returning a NULL instead. Link: https://lore.kernel.org/r/20210925224113.183040-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix a function name in commentsCai Huoqing2021-09-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Use dma_map_sg() instead of pci_map_sg() in comments. Link: https://lore.kernel.org/r/20210925125324.1760-3-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix mailbox command failure during driver initializationJames Smart2021-09-221-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Contention for the mailbox interface may occur during driver initialization (immediately after a function reset), between mailbox commands initiated via ioctl (bsg) and those driver requested by the driver. After setting SLI_ACTIVE flag for a port, there is a window in which the driver will allow an ioctl to be initiated while the adapter is initializing and issuing mailbox commands via polling. The polling logic then gets confused. Correct by having thread setting SLI_ACTIVE spot an active mailbox command and allow it complete before proceeding. Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Update lpfc version to 14.0.0.2James Smart2021-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update lpfc version to 14.0.0.2. Link: https://lore.kernel.org/r/20210910233159.115896-15-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Improve PBDE checks during SGL processingJames Smart2021-09-153-42/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PBDE feature, setting payload buffer address explicitly in the WQE so it doesn't have to be fetched from the SGL, only makes sense when there is a single buffer for the I/O. When there are multiple buffers it actually hurts performance as the SGL subsequently has to be fetched. Rework the SGL logic to only use PBDE when a single buffer. Link: https://lore.kernel.org/r/20210910233159.115896-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Zero CGN stats only during initial driver load and stat resetJames Smart2021-09-152-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently congestion management framework results are cleared whenever the framework settings changed (such as it being turned off then back on). This unfortunately means prior stats, rolled up to higher time windows lose meaning. Change such that stats are not cleared. Thus they pause and resume with prior values still being considered. Link: https://lore.kernel.org/r/20210910233159.115896-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix I/O block after enabling managed congestion modeJames Smart2021-09-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the congestion management framework dynamically enables, it may do so while I/O is in flight. The updates of cmf info due to inflight I/O completing may happen before values have been initialized. Fix by ensure cmf_max_bytes_per_interval is initialized when checking bandwidth utilization for SCSI layer blocking. Link: https://lore.kernel.org/r/20210910233159.115896-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Adjust bytes received vales during cmf timer intervalJames Smart2021-09-151-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly added congestion mgmt framework is seeing unexpected congestion FPINs and signals. In analysis, time values given to the adapter are not at hard time intervals. Thus the drift vs the transfer count seen is affecting how the framework manages things. Adjust counters to cover the drift. Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix EEH support for NVMe I/OJames Smart2021-09-157-28/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix FCP I/O flush functionality for TMF routinesJames Smart2021-09-151-23/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting of outstanding aborted I/Os and ABORT IOCBs. Thus, lpfc_reset_flush_io_context() called from any TMF routine does not properly wait to flush all outstanding FCP IOCBs leading to a block layer crash on an invalid scsi_cmnd->request pointer. kernel BUG at ../block/blk-core.c:1489! RIP: 0010:blk_requeue_request+0xaf/0xc0 ... Call Trace: <IRQ> __scsi_queue_insert+0x90/0xe0 [scsi_mod] blk_done_softirq+0x7e/0x90 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 do_IRQ+0x4c/0xd0 common_interrupt+0x87/0x87 </IRQ> Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ, LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to build an ABORT iocb. Restore lpfc_reset_flush_io_context() functionality by including counting of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb(). Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com Fixes: e1364711359f ("scsi: lpfc: Fix illegal memory access on Abort IOCBs") Cc: <stable@vger.kernel.org> # v5.12+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix NVMe I/O failover to non-optimized pathJames Smart2021-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we hold off unregistering with NVMe transport layer until GID_FT or ADISC completes upon receipt of RSCN. In the ADISC discovery routine, for nodes not found in the GID_FT response, the nodes are unregistered from the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue to be outstanding and were not failed back to the OS. If an NVMe device, this mean there wasn't initial termination of the I/Os so they could be issued on a different NVMe path. Fix by unregistering the RPI so that I/O is cancelled. Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com Fixes: 0614568361b0 ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Don't remove ndlp on PRLI errors in P2P modeJames Smart2021-09-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In pt-2-pt mode, the initiator does not log into the target after a PRLI error. In pt-2-pt mode, the target responded to the PRLI by sending a LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI to fail. The PRLI completion path caused the discovery node to be dropped to avoid being stick in an UNUSED (not logged in) state. As the node was dropped there is no retry of the login and as it is pt-2-pt, there is no RSCN to retrigger discovery. Thus the other end is not seen by the OS. Fix by ensuring the discovery node is not dropped if connecting pt-2-pt. This will cause PLOGI to be retried. Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix rediscovery of tape device after LIPJames Smart2021-09-151-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On link up and node discovery, a remote port is registered with the SCSI transport and the driver sets fc4_xpt_flags to track transport registration. A link down event causes the driver to deregister with the SCSI transport, starting the devloss timer, and calls a local unreg routine to clear the login state. Part of the login state is the fc4_xpt_flags. However, with tape devices that support sequence level error recovery, which wants to preserve the login, the local unreg routine is skipped, thus the flags aren't cleared. A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node() routine is called. As the fc4_xpt_flags is not clear, it's believed the node is already registered with the transport. Unfortunately, the registration was already terminated. Eventually the devloss tmo timer expires and tears down the device. Fix by ensuring the tape device, known by the ADISC flag, is always unregistered if the link drops. Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: lpfc: Fix hang on unload due to stuck fport nodeJames Smart2021-09-152-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A test scenario encountered an unload hang while an FLOGI ELS was in flight when a link down condition occurred. The driver fails unload as it never releases the fport node. For most nodes, when the link drops, devloss tmo is started and the timeout will cause the final node release. For the Fport, as it has not yet registered with the SCSI transport, there is no devloss timer to be started, so there is no final release. Additionally, the link down sequence causes ABORTS to be issued for pending ELS's. The completions from the ABORTS perform the release of node references. However, as the adapter is being reset to be unloaded, those completions will never occur. Fix by the following: - In the ELS cleanup, recognize when unloading and place the ELS's on a different list that immediately cleans up/completes the ELS's. It's recognized that this condition primarily affects only the fport, with other ports having normal clean up logic that handles things. - Resolve the devloss issue by, when cleaning up nodes on after link down, recognizing when the fabric node does not have a completed state (its state is UNUSED) and removing a reference so the node can delete after the ELS reference is released. Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>