summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* dm cache: fix writethrough mode quiescing in cache_mapMike Snitzer2014-05-011-0/+1
| | | | | | | | | | | | Commit 2ee57d58735 ("dm cache: add passthrough mode") inadvertently removed the deferred set reference that was taken in cache_map()'s writethrough mode support. Restore taking this reference. This issue was found with code inspection. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.13+
* dm thin: use INIT_WORK_ONSTACK in noflush_work to avoid ODEBUG warningMike Snitzer2014-04-291-1/+1
| | | | | | | | | Use INIT_WORK_ONSTACK to silence "ODEBUG: object is on stack, but not annotated". Reported-by: Zdeněk Kabeláč <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* dm verity: fix biovecs hash calculation regressionMilan Broz2014-04-151-6/+9
| | | | | | | | | | | | | | | | | | Commit 003b5c5719f159f4f4bf97511c4702a0638313dd ("block: Convert drivers to immutable biovecs") incorrectly converted biovec iteration in dm-verity to always calculate the hash from a full biovec, but the function only needs to calculate the hash from part of the biovec (up to the calculated "todo" value). Fix this issue by limiting hash input to only the requested data size. This problem was identified using the cryptsetup regression test for veritysetup (verity-compat-test). Signed-off-by: Milan Broz <gmazyland@gmail.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+
* dm thin: fix rcu_read_lock being held in code that can sleepJoe Thornber2014-04-081-3/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c140e1c4e23 ("dm thin: use per thin device deferred bio lists") introduced the use of an rculist for all active thin devices. The use of rcu_read_lock() in process_deferred_bios() can result in a BUG if a dm_bio_prison_cell must be allocated as a side-effect of bio_detain(): BUG: sleeping function called from invalid context at mm/mempool.c:203 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0 3 locks held by kworker/u8:0/6: #0: ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550 #1: ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550 #2: (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0 We can't process deferred bios with the rcu lock held, since dm_bio_prison_cell allocation may block if the bio-prison's cell mempool is exhausted. To fix: - Introduce a refcount and completion field to each thin_c - Add thin_get/put methods for adjusting the refcount. If the refcount hits zero then the completion is triggered. - Initialise refcount to 1 when creating thin_c - When iterating the active_thins list we thin_get() whilst the rcu lock is held. - After the rcu lock is dropped we process the deferred bios for that thin. - When destroying a thin_c we thin_put() and then wait for the completion -- to avoid a race between the worker thread iterating from that thin_c and destroying the thin_c. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm thin: irqsave must always be used with the pool->lock spinlockJoe Thornber2014-04-081-2/+3
| | | | | | | | | | | | Commit c140e1c4e23 ("dm thin: use per thin device deferred bio lists") incorrectly stopped disabling irqs when taking the pool's spinlock. Irqs must be disabled when taking the pool's spinlock otherwise a thread could spin_lock(), then get interrupted to service thin_endio() in interrupt context, which would then deadlock in spin_lock_irqsave(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache: fix a lock-inversionJoe Thornber2014-04-043-52/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When suspending a cache the policy is walked and the individual policy hints written to the metadata via sync_metadata(). This led to this lock order: policy->lock cache_metadata->root_lock When loading the cache target the policy is populated while the metadata lock is held: cache_metadata->root_lock policy->lock Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by ensuring the cache_metadata root_lock is held whilst all the hints are written, rather than being repeatedly locked while policy->lock is held (as was the case with each callout that policy_walk_mappings() made to the old save_hint() method). Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove locking correctness") build option. However, it is not clear how the LOCKDEP reported paths can lead to a deadlock since the two paths, suspending a target and loading a target, never occur at the same time. But that doesn't mean the same lock-inversion couldn't have occurred elsewhere. Reported-by: Marian Csontos <mcsontos@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm thin: sort the per thin deferred bios using an rb_treeMike Snitzer2014-04-041-2/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A thin-pool will allocate blocks using FIFO order for all thin devices which share the thin-pool. Because of this simplistic allocation the thin-pool's space can become fragmented quite easily; especially when multiple threads are requesting blocks in parallel. Sort each thin device's deferred_bio_list based on logical sector to help reduce fragmentation of the thin-pool's ondisk layout. The following tables illustrate the realized gains/potential offered by sorting each thin device's deferred_bio_list. An "io size"-sized random read of the device would result in "seeks/io" fragments being read, with an average "distance/seek" between each fragment. Data was written to a single thin device using multiple threads via iozone (8 threads, 64K for both the block_size and io_size). unsorted: io size seeks/io distance/seek -------------------------------------- 4k 0.000 0b 16k 0.013 11m 64k 0.065 11m 256k 0.274 10m 1m 1.109 10m 4m 4.411 10m 16m 17.097 11m 64m 60.055 13m 256m 148.798 25m 1g 809.929 21m sorted: io size seeks/io distance/seek -------------------------------------- 4k 0.000 0b 16k 0.000 1g 64k 0.001 1g 256k 0.003 1g 1m 0.011 1g 4m 0.045 1g 16m 0.181 1g 64m 0.747 1011m 256m 3.299 1g 1g 14.373 1g Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* dm thin: use per thin device deferred bio listsMike Snitzer2014-03-311-61/+104
| | | | | | | | | | | | | The thin-pool previously only had a single deferred_bios list that would collect bios for all thin devices in the pool. Split this per-pool deferred_bios list out to per-thin deferred_bios_list -- doing so enables increased parallelism when processing deferred bios. And now that each thin device has it's own deferred_bios_list we can sort all bios in the list using logical sector. The requeue code in error handling path is also cleaner as a side-effect. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* dm thin: simplify pool_is_congestedMike Snitzer2014-03-311-11/+5
| | | | | | | | | The pool is congested if the pool is in PM_OUT_OF_DATA_SPACE mode. This is more explicit/clear/efficient than inferring whether or not the pool is congested by checking if retry_on_resume_list is empty. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* dm thin: fix dangling bio in process_deferred_bios error pathMike Snitzer2014-03-281-1/+1
| | | | | | | | | | If unable to ensure_next_mapping() we must add the current bio, which was removed from the @bios list via bio_list_pop, back to the deferred_bios list before all the remaining @bios. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
* dm mpath: print more useful warnings in multipath_message()Jose Castillo2014-03-271-2/+2
| | | | | | | | | | | | | The warning message "Unrecognised multipath message received" is displayed in two different situations in multipath_message(): when the number of arguments passed is invalid and when the string passed in argv[0] is not recognized. Make it easier to identify where the problem is by making these warnings more specific with additional context for each case. Signed-off-by: Jose Castillo <jcastillo@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm-mpath: do not activate failed pathsHannes Reinecke2014-03-271-2/+5
| | | | | | | | | | | | activate_path() is run without a lock, so the path might be set to failed before activate_path() had a chance to run. This patch add a check for ->active in activate_path() to avoid unnecessary overhead by calling functions which are known to be failing. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: remove extra nesting in map functionMike Snitzer2014-03-271-22/+24
| | | | | | | | | Return early for case when no path exists, and when the pathgroup isn't ready. This eliminates the need for extra nesting for the the common case. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de>
* dm mpath: remove map_io()Hannes Reinecke2014-03-271-13/+6
| | | | | | | | | multipath_map() is now just a wrapper around map_io(), so we can rename map_io() to multipath_map(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm mpath: reduce memory pressure when requeuingHannes Reinecke2014-03-271-23/+15
| | | | | | | | | | | When multipath needs to requeue I/O in the block layer the per-request context shouldn't be allocated, as it will be freed immediately afterwards anyway. Avoiding this memory allocation will reduce memory pressure during requeuing. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm mpath: remove process_queued_ios()Hannes Reinecke2014-03-271-42/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | process_queued_ios() has served 3 functions: 1) select pg and pgpath if none is selected 2) start pg_init if requested 3) dispatch queued IOs when pg is ready Basically, a call to queue_work(process_queued_ios) can be replaced by dm_table_run_md_queue_async(), which runs request queue and ends up calling map_io(), which does 1), 2) and 3). Exception is when !pg_ready() (which means either pg_init is running or requested), then multipath_busy() prevents map_io() being called from request_fn. If pg_init is running, it should be ok as long as pg_init_done() does the right thing when pg_init is completed, I.e.: restart pg_init if !pg_ready() or call dm_table_run_md_queue_async() to kick map_io(). If pg_init is requested, we have to make sure the request is detected and pg_init will be started. pg_init is requested in 3 places: a) __choose_pgpath() in map_io() b) __choose_pgpath() in multipath_ioctl() c) pg_init retry in pg_init_done() a) is ok because map_io() calls __pg_init_all_paths(), which does 2). b) needs a call to __pg_init_all_paths(), which does 2). c) needs a call to __pg_init_all_paths(), which does 2). So this patch removes process_queued_ios() and ensures that __pg_init_all_paths() is called at the appropriate locations. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm mpath: push back requests instead of queueingHannes Reinecke2014-03-271-78/+36
| | | | | | | | | | | | | | | | | There is no reason why multipath needs to queue requests internally for queue_if_no_path or pg_init; we should rather push them back onto the request queue. And while we're at it we can simplify the conditional statement in map_io() to make it easier to read. Since mpath no longer does internal queuing of I/O the table info no longer emits the internal queue_size. Instead it displays 1 if queuing is being used or 0 if it is not. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm table: add dm_table_run_md_queue_asyncMike Snitzer2014-03-274-0/+30
| | | | | | | | | | | | Introduce dm_table_run_md_queue_async() to run the request_queue of the mapped_device associated with a request-based DM table. Also add dm_md_get_queue() wrapper to extract the request_queue from a mapped_device. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm mpath: do not call pg_init when it is already runningHannes Reinecke2014-03-271-2/+4
| | | | | | | | | | | | This patch moves condition checks as a preparation of following patches and has no effect on behaviour. process_queued_ios() is the only caller of __pg_init_all_paths() and 2 condition checks are moved from outside to inside without side effects. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbindMonam Agarwal2014-03-271-1/+1
| | | | | | | | | | | | | Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL). The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL). Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: stop using bi_privateMikulas Patocka2014-03-271-4/+3
| | | | | | | | | | | | | Device mapper uses the bio structure's bi_private field as a pointer to dm_target_io or dm_rq_clone_bio_info. But a bio structure is embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the pointer to the structure that contains the bio can be found with the container_of() macro. Remove the use of bi_private and use container_of() instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: remove dm_get_mapinfoMikulas Patocka2014-03-272-13/+0
| | | | | | | | | | | | Remove dm_get_mapinfo() because no target uses it. Targets can allocate per-bio data using ti->per_bio_data_size, this is much more flexible than union map_info. Leave union map_info only for the request-based multipath target's use. Also delete the unused "unsigned long long ll" field of union map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: make dm_table_alloc_md_mempools staticMikulas Patocka2014-03-272-2/+1
| | | | | | | | Make the function dm_table_alloc_md_mempools static because it is not called from another file. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: take care to copy the space map roots before locking the superblockJoe Thornber2014-03-273-81/+127
| | | | | | | | | | | | In theory copying the space map root can fail, but in practice it never does because we're careful to check what size buffer is needed. But make certain we're able to copy the space map roots before locking the superblock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed
* dm transaction manager: fix corruption due to non-atomic transaction commitJoe Thornber2014-03-275-27/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm cache: remove remainder of distinct discard block sizeHeinz Mauelshagen2014-03-274-77/+46
| | | | | | | | | | | | | Discard block size not being equal to cache block size causes data corruption by erroneously avoiding migrations in issue_copy() because the discard state is being cleared for a group of cache blocks when it should not. Completely remove all code that enabled a distinction between the cache block size and discard block size. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache: prevent corruption caused by discard_block_size > cache_block_sizeMike Snitzer2014-03-271-34/+3
| | | | | | | | | | | | | | | | | | | | | If the discard block size is larger than the cache block size we will not properly quiesce IO to a region that is about to be discarded. This results in a race between a cache migration where no copy is needed, and a write to an adjacent cache block that's within the same large discard block. Workaround this by limiting the discard_block_size to cache_block_size. Also limit the max_discard_sectors to cache_block_size. A more comprehensive fix that introduces range locking support in the bio_prison and proper quiescing of a discard range that spans multiple cache blocks is already in development. Reported-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org
* dm bitset: only flush the current word if it has been dirtiedJoe Thornber2014-03-272-1/+10
| | | | | | | This change offers a big performance boost for dm-era. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add era targetJoe Thornber2014-03-274-0/+1851
| | | | | | | | | | | | | | | | | dm-era is a target that behaves similar to the linear target. In addition it keeps track of which blocks were written within a user defined period of time called an 'era'. Each era target instance maintains the current era as a monotonically increasing 32-bit counter. Use cases include tracking changed blocks for backup software, and partially invalidating the contents of a cache to restore cache coherency after rolling back a vendor snapshot. dm-era is primarily expected to be paired with the dm-cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Linux 3.14-rc8v3.14-rc8Linus Torvalds2014-03-251-1/+1
|
* Merge branch 'parisc-3.14' of ↵Linus Torvalds2014-03-256-84/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - revert parts of the latest patch regarding font selection with STICON console - wire up the utimes() syscall for parisc - remove the unused parisc tmpalias code and unnecessary arch*relax defines * 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: locks: remove redundant arch_*_relax operations parisc: wire up sys_utimes parisc: Remove unused CONFIG_PARISC_TMPALIAS code partly revert commit 8a10bc9: parisc/sti_console: prefer Linux fonts over built-in ROM fonts
| * parisc: locks: remove redundant arch_*_relax operationsWill Deacon2014-03-231-4/+0
| | | | | | | | | | | | | | | | | | Now that the arch_{spin,read,write}_relax macros default to cpu_relax(), remove the redundant definitions for parisc. Cc: Helge Deller <deller@gmx.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Helge Deller <deller@gmx.de>
| * parisc: wire up sys_utimesHelge Deller2014-03-232-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | We seem to be nearly the only platform which does not provide the sys_utimes syscall. Adding it now makes our life much easier with userspace applications (like dietlibc and e2fsprogs) since we then behave like all other platforms too and don't need extra patches which are hard to get upstream anyway because we are not a mainstream architecture. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v3.13
| * parisc: Remove unused CONFIG_PARISC_TMPALIAS codeJohn David Anglin2014-03-232-75/+0
| | | | | | | | | | | | | | | | | | The attached change removes the unused and experimental CONFIG_PARISC_TMPALIAS code. It doesn't work and I don't believe it will ever be used. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
| * partly revert commit 8a10bc9: parisc/sti_console: prefer Linux fonts over ↵Helge Deller2014-03-231-3/+3
| | | | | | | | | | | | | | | | | | | | built-in ROM fonts STI console is used on parisc and m68k HP machines. This patch partly reverts my previous commit and as such restores the fonts for the m68k machines. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v3.13
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2014-03-257-43/+31
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull sparc fixes from David Miller: 1) Do serial locking in a way that makes things clear that these are IRQ spinlocks. 2) Conversion to generic idle loop broke first generation Niagara machines, need to have %pil interrupts enabled during cpu yield hypervisor call. 3) Do not use magic constants for iterations over tsb tables, from Doug Wilson. 4) Fix erroneous truncation of 64-bit system call return values to 32-bit. From Dave Kleikamp. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Make sure %pil interrupts are enabled during hypervisor yield. sparc64:tsb.c:use array size macro rather than number sparc64: don't treat 64-bit syscall return codes as 32-bit sparc: serial: Clean up the locking for -rt
| * | sparc64: Make sure %pil interrupts are enabled during hypervisor yield.David S. Miller2014-03-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In arch_cpu_idle() we must enable %pil based interrupts before potentially invoking the hypervisor cpu yield call. As per the Hypervisor API documentation for cpu_yield: Interrupts which are blocked by some mechanism other that pstate.ie (for example %pil) are not guaranteed to cause a return from this service. It seems that only first generation Niagara chips are hit by this bug. My best guess is that later chips implement this in hardware and wake up anyways from %pil events, whereas in first generation chips the yield is implemented completely in hypervisor code and requires %pil to be enabled in order to wake properly from this call. Fixes: 87fa05aeb3a5 ("sparc: Use generic idle loop") Reported-by: Fabio M. Di Nitto <fabbione@fabbione.net> Reported-by: Jan Engelhardt <jengelh@inai.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sparc64:tsb.c:use array size macro rather than numberDoug Wilson2014-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This is a small patch which uses ARRAY_SIZE macro rather than a number to make code readability better. Signed-off-by: Doug Wilson <doug.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sparc64: don't treat 64-bit syscall return codes as 32-bitDave Kleikamp2014-03-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When checking a system call return code for an error, linux_sparc_syscall was sign-extending the lower 32-bit value and comparing it to -ERESTART_RESTARTBLOCK. lseek can return valid return codes whose lower 32-bits alone would indicate a failure (such as 4G-1). Use the whole 64-bit value to check for errors. Only the 32-bit path should sign extend the lower 32-bit value. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Bob Picco <bob.picco@oracle.com> Acked-by: Allen Pais <allen.pais@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sparc: serial: Clean up the locking for -rtDavid Miller2014-03-064-39/+25
| | | | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Allen Pais <allen.pais@oracle.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-03-2547-306/+508
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) OpenVswitch's lookup_datapath() returns error pointers, so don't check against NULL. From Jiri Pirko. 2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation under RCU locks, fix by using GFP_ATOMIC when necessary. From Nikolay Aleksandrov. 3) phy_suspend() indirectly passes uninitialized data into the ethtool get wake-on-land implementations. Fix from Sebastian Hesselbarth. 4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger. 5) If SKB allocation of reply packet fails, vxlan's arp_reduce() defers a NULL pointer. Fix from David Stevens. 6) IPV6 neigh handling in vxlan doesn't validate the destination address properly, and it builds a packet with the src and dst reversed. Fix also from David Stevens. 7) Fix spinlock recursion during subscription failures in TIPC stack, from Erik Hugne. 8) Revert buggy conversion of davinci_emac to devm_request_irq, from Chrstian Riesch. 9) Wrong flags passed into forwarding database netlink notifications, from Nicolas Dichtel. 10) The netpoll neighbour soliciation handler checks wrong ethertype, needs to be ETH_P_IPV6 rather than ETH_P_ARP. Fix from Li RongQing. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits) tipc: fix spinlock recursion bug for failed subscriptions vxlan: fix nonfunctional neigh_reduce() net: davinci_emac: Fix rollback of emac_dev_open() net: davinci_emac: Replace devm_request_irq with request_irq netpoll: fix the skb check in pkt_is_ns net: micrel : ks8851-ml: add vdd-supply support ip6mr: fix mfc notification flags ipmr: fix mfc notification flags rtnetlink: fix fdb notification flags tcp: syncookies: do not use getnstimeofday() netlink: fix setsockopt in mmap examples in documentation openvswitch: Correctly report flow used times for first 5 minutes after boot. via-rhine: Disable device in error path ATHEROS-ATL1E: Convert iounmap to pci_iounmap vxlan: fix potential NULL dereference in arp_reduce() cnic: Update version to 2.5.20 and copyright year. cnic,bnx2i,bnx2fc: Fix inconsistent use of page size cnic: Use proper ulp_ops for per device operations. net: cdc_ncm: fix control message ordering ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly ...
| * | tipc: fix spinlock recursion bug for failed subscriptionsErik Hugne2014-03-241-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a topology event subscription fails for any reason, such as out of memory, max number reached or because we received an invalid request the correct behavior is to terminate the subscribers connection to the topology server. This is currently broken and produces the following oops: [27.953662] tipc: Subscription rejected, illegal request [27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6 [27.957066] lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1 [27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5 [27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc] [27.961430] ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050 [27.962292] ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a [27.963152] ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520 [27.964023] Call Trace: [27.964292] [<ffffffff815c0207>] dump_stack+0x45/0x56 [27.964874] [<ffffffff815beec5>] spin_dump+0x8c/0x91 [27.965420] [<ffffffff815beeeb>] spin_bug+0x21/0x26 [27.965995] [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140 [27.966631] [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20 [27.967256] [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc] [27.968051] [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc] [27.968722] [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc] [27.969436] [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc] [27.970209] [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc] [27.970972] [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc] [27.971633] [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0 [27.972267] [<ffffffff8105e869>] worker_thread+0x119/0x3a0 [27.972896] [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0 [27.973622] [<ffffffff810648af>] kthread+0xdf/0x100 [27.974168] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 [27.974893] [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0 [27.975466] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 The recursion occurs when subscr_terminate tries to grab the subscriber lock, which is already taken by subscr_conn_msg_event. We fix this by checking if the request to establish a new subscription was successful, and if not we initiate termination of the subscriber after we have released the subscriber lock. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vxlan: fix nonfunctional neigh_reduce()David Stevens2014-03-241-14/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VXLAN neigh_reduce() code is completely non-functional since check-in. Specific errors: 1) The original code drops all packets with a multicast destination address, even though neighbor solicitations are sent to the solicited-node address, a multicast address. The code after this check was never run. 2) The neighbor table lookup used the IPv6 header destination, which is the solicited node address, rather than the target address from the neighbor solicitation. So neighbor lookups would always fail if it got this far. Also for L3MISSes. 3) The code calls ndisc_send_na(), which does a send on the tunnel device. The context for neigh_reduce() is the transmit path, vxlan_xmit(), where the host or a bridge-attached neighbor is trying to transmit a neighbor solicitation. To respond to it, the tunnel endpoint needs to do a *receive* of the appropriate neighbor advertisement. Doing a send, would only try to send the advertisement, encapsulated, to the remote destinations in the fdb -- hosts that definitely did not do the corresponding solicitation. 4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the isrouter flag in the advertisement. This has nothing to do with whether or not the target is a router, and generally won't be set since the tunnel endpoint is bridging, not routing, traffic. The patch below creates a proxy neighbor advertisement to respond to neighbor solicitions as intended, providing proper IPv6 support for neighbor reduction. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'davinci_emac'David S. Miller2014-03-242-13/+44
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christian Riesch says: ==================== net: davinci_emac: Fix interrupt requests and error handling since commit 6892b41d9701283085b655c6086fb57a5d63fa47 (Linux 3.11) the davinci_emac driver is broken. After doing ifconfig down, ifconfig up, requesting the interrupts for the driver fails. The interface remains dead until the board is rebooted. The first patch in this patchset reverts commit 6892b41d9701283085b655c6086fb57a5d63fa47 partially and makes the driver useable again. During the work on the first patch, a number of bugs in the error handling of the driver's ndo_open code were found. The second patch fixes these bugs. I believe the first patch meets the rules for stable kernels, I therefore added the stable tag to this patch. The second patch is just cleanup, the code that is fixed by this patch is only executed in case of an error. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: davinci_emac: Fix rollback of emac_dev_open()Christian Riesch2014-03-242-17/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an error occurs during the initialization in emac_dev_open() (the driver's ndo_open function), interrupts, DMA descriptors etc. must be freed. The current rollback code is buggy in several ways. 1) Freeing the interrupts. The current code will not free all interrupts that were requested by the driver. Furthermore, the code tries to do a platform_get_resource(priv->pdev, IORESOURCE_IRQ, -1) in its last iteration. This patch fixes these bugs. 2) Wrong order of err: and rollback: labels. If the setup of the PHY in the code fails, the interrupts that have been requested before are not freed: request irq if requesting irqs fails, goto rollback setup phy if phy setup fails, goto err return 0 rollback: free irqs err: This patch brings the code into the correct order. 3) The code calls napi_enable() and emac_int_enable(), but does not undo both in case of an error. This patch adds calls of emac_int_disable() and napi_disable() to the rollback code. 4) RX DMA descriptors are not freed in case of an error: Right before requesting the irqs, the function creates DMA descriptors for the RX channel. These RX descriptors are never freed when we jump to either rollback or err. This patch adds code for freeing the DMA descriptors in the case of an initialization error. This required a modification of cpdma_ctrl_stop() in davinci_cpdma.c: We must be able to call this function to free the DMA descriptors while the DMA channels are in IDLE state (before cpdma_ctlr_start() was called). Tested on a custom board with the Texas Instruments AM1808. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: davinci_emac: Replace devm_request_irq with request_irqChristian Riesch2014-03-241-4/+21
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 6892b41d9701283085b655c6086fb57a5d63fa47 Author: Lad, Prabhakar <prabhakar.csengg@gmail.com> Date: Tue Jun 25 21:24:51 2013 +0530 net: davinci: emac: Convert to devm_* api the call of request_irq is replaced by devm_request_irq and the call of free_irq is removed. But since interrupts are requested in emac_dev_open, doing ifconfig up/down on the board requests the interrupts again each time, causing devm_request_irq to fail. The interface is dead until the device is rebooted. This patch reverts said commit partially: It changes the driver back to use request_irq instead of devm_request_irq, puts free_irq back in place, but keeps the remaining changes of the original patch. Reported-by: Jon Ringle <jon@ringle.org> Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Cc: Lad, Prabhakar <prabhakar.csengg@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netpoll: fix the skb check in pkt_is_nsLi RongQing2014-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Neighbor Solicitation is ipv6 protocol, so we should check skb->protocol with ETH_P_IPV6 Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Cc: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: micrel : ks8851-ml: add vdd-supply supportNishanth Menon2014-03-242-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Few platforms use external regulator to keep the ethernet MAC supplied. So, request and enable the regulator for driver functionality. Fixes: 66fda75f47dc (regulator: core: Replace direct ops->disable usage) Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Suggested-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'fixes' of ↵David S. Miller2014-03-202-5/+7
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Open vSwitch Four small fixes for net/3.14. I realize that these are late in the cycle - just got back from vacation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | openvswitch: Correctly report flow used times for first 5 minutes after boot.Ben Pfaff2014-03-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel starts out its "jiffies" timer as 5 minutes below zero, as shown in include/linux/jiffies.h: /* * Have the 32 bit jiffies value wrap 5 minutes after boot * so jiffies wrap bugs show up earlier. */ #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ)) The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then takes any "later" time. This means that for the first five minutes after boot, flows will always be reported as never used, since 0 is greater than any time already seen. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>