summaryrefslogtreecommitdiffstats
path: root/drivers (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mtip32xx: fix up the checking for internal command failureJens Axboe2017-06-231-17/+4
| | | | | | | | | | | | | | | This fixes up two commits that have touched this driver. The command status field is now a blk_status_t, so we can't check for < 0 and we definitely can't assume it's holding -Exxxx error values. All we care about here is whether ->status is zero or not. Check for that, and remove the various attempts at smart error reporting. Just log to dmesg what command failed, and the blk_status_t value. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 2a842acab109 ("block: introduce new block status code type") Fixes: 3f5e6a35774c ("mtip32xx: convert internal command issue to block IO path") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge commit '8e8320c9315c' into for-4.13/blockJens Axboe2017-06-239-65/+74
|\ | | | | | | | | | | | | | | | | Pull in the fix for shared tags, as it conflicts with the pending changes in for-4.13/block. We already pulled in v4.12-rc5 to solve other conflicts or get fixes that went into 4.12, so not a lot of changes in this merge. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * Merge branch 'stable/for-jens-4.12' of ↵Jens Axboe2017-06-203-41/+26
| |\ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Pull xen-blkback fixes from Konrad: "Security and memory leak fixes in xen block driver."
| | * xen-blkback: don't leak stack data via response ringJan Beulich2017-06-132-31/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than constructing a local structure instance on the stack, fill the fields directly on the shared ring, just like other backends do. Build on the fact that all response structure flavors are actually identical (the old code did make this assumption too). This is XSA-216. Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: don't use xen_blkif_get() in xen-blkback kthreadJuergen Gross2017-06-132-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread of xen-blkback. Thread stopping is synchronous and using the blkif reference counting in the kthread will avoid to ever let the reference count drop to zero at the end of an I/O running concurrent to disconnecting and multiple rings. Setting ring->xenblkd to NULL after stopping the kthread isn't needed as the kthread does this already. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Steven Haigh <netwiz@crc.id.au> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: don't free be structure too earlyJuergen Gross2017-06-131-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The be structure must not be freed when freeing the blkif structure isn't done. Otherwise a use-after-free of be when unmapping the ring used for communicating with the frontend will occur in case of a late call of xenblk_disconnect() (e.g. due to an I/O still active when trying to disconnect). Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Steven Haigh <netwiz@crc.id.au> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: fix disconnect while I/Os in flightJuergen Gross2017-06-132-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today disconnecting xen-blkback is broken in case there are still I/Os in flight: xen_blkif_disconnect() will bail out early without releasing all resources in the hope it will be called again when the last request has terminated. This, however, won't happen as xen_blkif_free() won't be called on termination of the last running request: xen_blkif_put() won't decrement the blkif refcnt to 0 as xen_blkif_disconnect() didn't finish before thus some xen_blkif_put() calls in xen_blkif_disconnect() didn't happen. To solve this deadlock xen_blkif_disconnect() and xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and xen_blkif_get() but use some other way to do their accounting of resources. This at once fixes another error in xen_blkif_disconnect(): when it returned early with -EBUSY for another ring than 0 it would call xen_blkif_put() again for already handled rings on a subsequent call. This will lead to inconsistencies in the refcnt handling. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Steven Haigh <netwiz@crc.id.au> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds2017-06-132-2/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Xtensa fixes from Max Filippov: - don't use linux IRQ #0 in legacy irq domains: fixes timer interrupt assignment when it's hardware IRQ # is 0 and the kernel is built w/o device tree support - reduce reservation size for double exception vector literals from 48 to 20 bytes: fixes build on cores with small user exception vector - cleanups: use kmalloc_array instead of kmalloc in simdisk_init and seq_puts instead of seq_printf in c_show. * tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: don't use linux IRQ #0 xtensa: reduce double exception literal reservation xtensa: ISS: Use kmalloc_array() in simdisk_init() xtensa: Use seq_puts() in c_show()
| | * | xtensa: don't use linux IRQ #0Max Filippov2017-06-062-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux IRQ #0 is reserved for error reporting and may not be used. Increase NR_IRQS for one additional slot and increase irq_domain_add_legacy parameter first_irq value to 1, so that linux IRQ #0 is not associated with hardware IRQ #0 in legacy IRQ domains. Introduce macro XTENSA_PIC_LINUX_IRQ for static translation of xtensa PIC hardware IRQ # to linux IRQ #. Use this macro in XTFPGA platform data definitions. This fixes inability to use hardware IRQ #0 in configurations that don't use device tree and allows for non-identity mapping between linux IRQ # and hardware IRQ #. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
| * | | Merge branch 'for-linus' of ↵Linus Torvalds2017-06-134-22/+46
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - A fix for KVM to avoid kernel oopses in case of host protection faults due to runtime instrumentation - A fix for the AP bus to avoid dead devices after unbind / bind - A fix for a compile warning merged from the vfio_ccw tree - Updated default configurations * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: update defconfig s390/zcrypt: Fix blocking queue device after unbind/bind. s390/vfio_ccw: make some symbols static s390/kvm: do not rely on the ILC on kvm host protection fauls
| | * | | s390/zcrypt: Fix blocking queue device after unbind/bind.Harald Freudenberger2017-06-023-16/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the association between a queue device and the driver is released via unbind and later re-associated the queue device was not operational any more. Reason was a wrong administration of the card/queue lists within the ap device driver. This patch introduces revised card/queue list handling within the ap device driver: when an ap device is detected it is initial not added to the card/queue list any more. With driver probe the card device is added to the card list/the queue device is added to the queue list within a card. With driver remove the device is removed from the card/queue list. Additionally there are some situations within the ap device live where the lists need update upon card/queue device release (for example device hot unplug or suspend/resume). Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | * | | Merge tag 'vfio-ccw-20170522' of ↵Martin Schwidefsky2017-05-231-6/+6
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into fixes Pull vfio-ccw fix from Conelia Huck: "vfio-ccw: one patch" * Make some symbols in vfio-ccw static, as detected by sparse.
| | | * | | s390/vfio_ccw: make some symbols staticSebastian Ott2017-05-221-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make some symbols static to fix sparse warnings like: drivers/s390/cio/vfio_ccw_ops.c:73:1: warning: symbol 'mdev_type_attr_name' was not declared. Should it be static? Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
* | | | | | block: Change argument type of scsi_req_init()Bart Van Assche2017-06-213-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since scsi_req_init() works on a struct scsi_request, change the argument type into struct scsi_request *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | block: Make most scsi_req_init() calls implicitBart Van Assche2017-06-2121-28/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of explicitly calling scsi_req_init() after blk_get_request(), call that function from inside blk_get_request(). Add an .initialize_rq_fn() callback function to the block drivers that need it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn() because it is too small to keep it as a separate function. Keep the scsi_req_init() call in ide_prep_sense() because it follows a blk_rq_init() call. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | null_blk: add support for shared tagsJens Axboe2017-06-201-42/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some storage drivers need to share tag sets between devices. It's useful to be able to model that with null_blk, to find hangs or performance issues. Add a 'shared_tags' bool module parameter that. If that is set to true and nr_devices is bigger than 1, all devices allocated will share the same tag set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | nvme: host: unquiesce queue in nvme_kill_queues()Ming Lei2017-06-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When nvme_kill_queues() is run, queues may be in quiesced state, so we forcibly unquiesce queues to avoid blocking dispatch, and I/O hang can be avoided in remove path. Peviously we use blk_mq_start_stopped_hw_queues() as counterpart of blk_mq_quiesce_queue(), now we have introduced blk_mq_unquiesce_queue(), so use it explicitly. Cc: linux-nvme@lists.infradead.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | blk-mq: use the introduced blk_mq_unquiesce_queue()Ming Lei2017-06-183-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so replace blk_mq_start_stopped_hw_queues() with it. For the scsi part, this patch takes Bart's suggestion to switch to block quiesce/unquiesce API completely. Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: dm-devel@redhat.com Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | block: remove bio_clone() and all references.NeilBrown2017-06-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bio_clone() is no longer used. Only bio_clone_bioset() or bio_clone_fast(). This is for the best, as bio_clone() used fs_bio_set, and filesystems are unlikely to want to use bio_clone(). So remove bio_clone() and all references. This includes a fix to some incorrect documentation. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | bcache: use kmalloc to allocate bio in bch_data_verify()NeilBrown2017-06-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function allocates a bio, then a collection of pages. It copes with failure. It currently uses a mempool() to allocate the bio, but alloc_page() to allocate the pages. These fail in different ways, so the usage is inconsistent. Change the bio_clone() to bio_clone_kmalloc() so that no pool is used either for the bio or the pages. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by : Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | xen-blkfront: remove bio splitting.NeilBrown2017-06-181-51/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bios that are re-submitted will pass through blk_queue_split() when blk_queue_bio() is called, and this will split the bio if necessary. There is no longer any need to do this splitting in xen-blkfront. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | lightnvm/pblk-read: use bio_clone_fast()NeilBrown2017-06-183-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pblk_submit_read() uses bio_clone_bioset() but doesn't change the io_vec, so bio_clone_fast() is a better choice. It also uses fs_bio_set which is intended for filesystems. Using it in a device driver can deadlock. So allocate a new bioset, and and use bio_clone_fast(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier@cnexlabs.com> Tested-by: Javier González <javier@cnexlabs.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | pktcdvd: use bio_clone_fast() instead of bio_clone()NeilBrown2017-06-181-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pktcdvd doesn't change the bi_io_vec of the clone bio, so it is more efficient to use bio_clone_fast(), and not clone the bi_io_vec. This requires providing a bio_set, and it is safest to provide a dedicated bio_set rather than sharing fs_bio_set, which filesytems use. This new bio_set, pkt_bio_set, can also be use for the bio_split() call as the two allocations (bio_clone_fast, and bio_split) are independent, neither can block a bio allocated by the other. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | drbd: use bio_clone_fast() instead of bio_clone()NeilBrown2017-06-183-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drbd does not modify the bi_io_vec of the cloned bio, so there is no need to clone that part. So bio_clone_fast() is the better choice. For bio_clone_fast() we need to specify a bio_set. We could use fs_bio_set, which bio_clone() uses, or drbd_md_io_bio_set, which drbd uses for metadata, but it is generally best to avoid sharing bio_sets unless you can be certain that there are no interdependencies. So create a new bio_set, drbd_io_bio_set, and use bio_clone_fast(). Also remove a "XXX cannot fail ???" comment because it definitely cannot fail - bio_clone_fast() doesn't fail if the GFP flags allow for sleeping. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | rbd: use bio_clone_fast() instead of bio_clone()NeilBrown2017-06-181-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bio_clone() makes a copy of the bi_io_vec, but rbd never changes that, so there is no need for a copy. bio_clone_fast() can be used instead, which avoids making the copy. This requires that we provide a bio_set. bio_clone() uses fs_bio_set, but it isn't, in general, safe to use the same bio_set at different levels of the stack, as that can lead to deadlocks. As filesystems use fs_bio_set, block devices shouldn't. As rbd never stacks, it is safe to have a single global bio_set for all rbd devices to use. So allocate that when the module is initialised, and use it with bio_clone_fast(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | blk: make the bioset rescue_workqueue optional.NeilBrown2017-06-185-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts bioset_create() to not create a workqueue by default, so alloctions will never trigger punt_bios_to_rescuer(). It also introduces a new flag BIOSET_NEED_RESCUER which tells bioset_create() to preserve the old behavior. All callers of bioset_create() that are inside block device drivers, are given the BIOSET_NEED_RESCUER flag. biosets used by filesystems or other top-level users do not need rescuing as the bio can never be queued behind other bios. This includes fs_bio_set, blkdev_dio_pool, btrfs_bioset, xfs_ioend_bioset, and one allocated by target_core_iblock.c. biosets used by md/raid do not need rescuing as their usage was recently audited and revised to never risk deadlock. It is hoped that most, if not all, of the remaining biosets can end up being the non-rescued version. Reviewed-by: Christoph Hellwig <hch@lst.de> Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes) Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | blk: replace bioset_create_nobvec() with a flags arg to bioset_create()NeilBrown2017-06-1812-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | blk: remove bio_set arg from blk_queue_split()NeilBrown2017-06-1810-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_queue_split() is always called with the last arg being q->bio_split, where 'q' is the first arg. Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses q->bio_split. This is inconsistent and unnecessary. Remove the last arg and always use q->bio_split inside blk_queue_split() Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed) Reviewed-by: Javier González <javier@cnexlabs.com> Tested-by: Javier González <javier@cnexlabs.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | loop: Add PF_LESS_THROTTLE to block/loop device thread.NeilBrown2017-06-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a filesystem is mounted from a loop device, writes are throttled by balance_dirty_pages() twice: once when writing to the filesystem and once when the loop_handle_cmd() writes to the backing file. This double-throttling can trigger positive feedback loops that create significant delays. The throttling at the lower level is seen by the upper level as a slow device, so it throttles extra hard. The PF_LESS_THROTTLE flag was created to handle exactly this circumstance, though with an NFS filesystem mounted from a local NFS server. It reduces the throttling on the lower layer so that it can proceed largely unthrottled. To demonstrate this, create a filesystem on a loop device and write (e.g. with dd) several large files which combine to consume significantly more than the limit set by /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total time taken. When I do this directly on a device (no loop device) the total time for several runs (mkfs, mount, write 200 files, umount) is fairly stable: 28-35 seconds. When I do this over a loop device the times are much worse and less stable. 52-460 seconds. Half below 100seconds, half above. When I apply this patch, the times become stable again, though not as fast as the no-loop-back case: 53-72 seconds. There may be room for further improvement as the total overhead still seems too high, but this is a big improvement. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | | Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-4.13/blockJens Axboe2017-06-1641-632/+869
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes for 4.13 from Christoph: Highlights: - UUID identifier support from Johannes - Lots of cleanups from Sagi - Host Memory Buffer support from me And lots of cleanups and smaller fixes of course. Note that the UUID identifier changes are based on top of the uuid tree. I am the maintainer of that tree and will send it to Linus as soon as 4.12 is released as various other trees depend on it as well (and the diffstat includes those changes unfortunately)
| * | | | | | nvme: implement NS Optimal IO Boundary from 1.3 SpecScott Bauer2017-06-162-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NVMe 1.3 spec introduces Namespace Optimal IO Boundaries (NOIOB), which standardizes the stripe mechanism we currently have quirks for. This patch implements the necessary logic to handle this new feature. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: don't hard code size of struct t10_pi_tupleSagi Grimberg2017-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: no need to wait for the reset when keepalive failsChristoph Hellwig2017-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to wait for the reset from the delayed work item that is kicked off when we don't get a keepalive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * | | | | | nvme: move reset workqueue handling to common codeChristoph Hellwig2017-06-156-111/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the nvme_reset function from the PCIe driver to common code, renaming it to nvme_reset_ctrl in the process. Additionally a new helper nvme_reset_ctrl_sync is added for the case where we want to wait for the reset. To facilitate that the reset_work work structure is move to the common nvme_ctrl structure and the ->reset_ctrl method is removed. For now the drivers initialize the reset_work with their own callback, but longer term we should move to callouts for specific parts of the reset process and move even more code to the core. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| * | | | | | nvme-pci: merge init_request methodsChristoph Hellwig2017-06-151-15/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we get the tagset passed we can have a single implementation for the I/O and admin queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme-loop: merge init_request methodsChristoph Hellwig2017-06-151-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we get the tagset passed we can have a single implementation for the I/O and admin queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme-fc: merge init_request methodsChristoph Hellwig2017-06-151-13/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we get the tagset passed we can have a single implementation for the I/O and admin queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme-rdma: merge init_request and exit_request methodsChristoph Hellwig2017-06-151-32/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we get the tagset passed we can have a single implementation for the I/O and admin queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: move protection information check into nvme_setup_rwChristoph Hellwig2017-06-152-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It only applies to read/write commands, and this way non-PCIe drivers get the check as well instead of having to duplicate it when adding metadata support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: mark shutdown_timeout staticChristoph Hellwig2017-06-152-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And open code the SHUTDOWN_TIMEOUT macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme-rdma: fix error code in nvme_rdma_create_ctrl()Dan Carpenter2017-06-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We accidentally return ERR_PTR(0) which is NULL. The caller isn't explicitly checking for that but I couldn't immediately spot whether this would lead to a NULL dereference. Anyway, we can fix add an error code easily enough. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvmf: keep track of nvmet connect error statusGuan Junxiong2017-06-151-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To let the host know what happends to the connection establishment, adjust the behavior of nvmf_log_connect_error to make more connect specifig error codes human-readble. Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvmet-fc: Remove a set-but-not-used variableBart Van Assche2017-06-151-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was detected by building the nvmet-fc driver with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: James Smart <james.smart@broadcom.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: use ctrl->device consistently for loggingJohannes Thumshirn2017-06-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the few left over users of ctrl->dev over to using ctrl->device for logging purposes, so we consistently use the same device. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvmet: allow overriding the NVMe VS via configfsJohannes Thumshirn2017-06-151-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow overriding the announced NVMe Version of a via configfs. This is particularly helpful when debugging new features for the host or target side without bumping the hard coded version (as the target might not be fully compliant to the announced version yet). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvmet: add uuid field to nvme_ns and populate via configfsJohannes Thumshirn2017-06-151-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the UUID field from the NVMe Namespace Identification Descriptor to the nvmet_ns structure and allow it's population via configfs. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvmet: implement namespace identify descriptor listJohannes Thumshirn2017-06-153-1/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A NVMe Identify NS command with a CNS value of '3' is expecting a list of Namespace Identification Descriptor structures to be returned to the host for the namespace requested in the namespace identify command. This Namespace Identification Descriptor structure consists of the type of the namespace identifier, the length of the identifier and the actual identifier. Valid types are NGUID and UUID which we have saved in our nvme_ns structure if they have been configured via configfs. If no value has been assigened to one of these we return an "invalid opcode" back to the host to maintain backward compatibiliy with older implementations without Namespace Identify Descriptor list support. Also as the Namespace Identify Descriptor list is the only mandatory feature change between 1.2.1 and 1.3 we can bump the advertised version as well. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: provide UUID value to userspaceJohannes Thumshirn2017-06-151-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a way for getting the UUID from a target, provide it to userspace as well. Unfortunately there is already a sysfs attribute called UUID which is a misnomer as it holds the NGUID value. So instead of creating yet another wrong name, create a new 'nguid' sysfs attribute for the NGUID. For the UUID attribute add a check wheter the namespace has a UUID assigned to it and return this or return the NGUID to maintain backwards compatibility. This should give userspace a chance to catch up. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: get list of namespace descriptorsJohannes Thumshirn2017-06-152-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a target identifies itself as NVMe 1.3 compliant, try to get the list of Namespace Identification Descriptors and populate the UUID, NGUID and EUI64 fileds in the NVMe namespace structure with these values. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | nvme: rename uuid to nguid in nvme_nsJohannes Thumshirn2017-06-152-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uuid field in the nvme_ns structure represents the nguid field from the identify namespace command. And as NVMe 1.3 introduced an UUID in the NVMe Namespace Identification Descriptor this will collide. So rename the uuid to nguid to prevent any further confusion. Unfortunately we export the nguid to sysfs in the uuid sysfs attribute, but this can't be changed anymore without possibly breaking existing userspace. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>