summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2016-06-105-44/+99
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - a file-based futex fix - one more spin_unlock_wait() fix - a ww-mutex deadlock detection improvement/fix - and a raw_read_seqcount_latch() barrier fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Calculate the futex key based on a tail page for file-based futexes locking/qspinlock: Fix spin_unlock_wait() some more locking/ww_mutex: Report recursive ww_mutex locking early locking/seqcount: Re-fix raw_read_seqcount_latch()
| * futex: Calculate the futex key based on a tail page for file-based futexesMel Gorman2016-06-081-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Galbraith reported that the LTP test case futex_wake04 was broken by commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()"). This test case uses futexes backed by hugetlbfs pages and so there is an associated inode with a futex stored on such pages. The problem is that the key is being calculated based on the head page index of the hugetlbfs page and not the tail page. Prior to the optimisation, the page lock was used to stabilise mappings and pin the inode is file-backed which is overkill. If the page was a compound page, the head page was automatically looked up as part of the page lock operation but the tail page index was used to calculate the futex key. After the optimisation, the compound head is looked up early and the page lock is only relied upon to identify truncated pages, special pages or a shmem page moving to swapcache. The head page is looked up because without the page lock, special care has to be taken to pin the inode correctly. However, the tail page is still required to calculate the futex key so this patch records the tail page. On vanilla 4.6, the output of the test case is; futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs. With the patch applied futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TPASS : Hi hydra, thread2 awake! Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()" Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * locking/qspinlock: Fix spin_unlock_wait() some morePeter Zijlstra2016-06-082-36/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While this prior commit: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") ... fixes spin_is_locked() and spin_unlock_wait() for the usage in ipc/sem and netfilter, it does not in fact work right for the usage in task_work and futex. So while the 2 locks crossed problem: spin_lock(A) spin_lock(B) if (!spin_is_locked(B)) spin_unlock_wait(A) foo() foo(); ... works with the smp_mb() injected by both spin_is_locked() and spin_unlock_wait(), this is not sufficient for: flag = 1; smp_mb(); spin_lock() spin_unlock_wait() if (!flag) // add to lockless list // iterate lockless list ... because in this scenario, the store from spin_lock() can be delayed past the load of flag, uncrossing the variables and loosing the guarantee. This patch reworks spin_is_locked() and spin_unlock_wait() to work in both cases by exploiting the observation that while the lock byte store can be delayed, the contender must have registered itself visibly in other state contained in the word. It also allows for architectures to override both functions, as PPC and ARM64 have an additional issue for which we currently have no generic solution. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org # v4.2 and later Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * locking/ww_mutex: Report recursive ww_mutex locking earlyChris Wilson2016-06-031-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recursive locking for ww_mutexes was originally conceived as an exception. However, it is heavily used by the DRM atomic modesetting code. Currently, the recursive deadlock is checked after we have queued up for a busy-spin and as we never release the lock, we spin until kicked, whereupon the deadlock is discovered and reported. A simple solution for the now common problem is to move the recursive deadlock discovery to the first action when taking the ww_mutex. Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * locking/seqcount: Re-fix raw_read_seqcount_latch()Peter Zijlstra2016-06-031-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 50755bc1c305 ("seqlock: fix raw_read_seqcount_latch()") broke raw_read_seqcount_latch(). If you look at the comment that was modified; the thing that changes is the seq count, not the latch pointer. * void latch_modify(struct latch_struct *latch, ...) * { * smp_wmb(); <- Ensure that the last data[1] update is visible * latch->seq++; * smp_wmb(); <- Ensure that the seqcount update is visible * * modify(latch->data[0], ...); * * smp_wmb(); <- Ensure that the data[0] update is visible * latch->seq++; * smp_wmb(); <- Ensure that the seqcount update is visible * * modify(latch->data[1], ...); * } * * The query will have a form like: * * struct entry *latch_query(struct latch_struct *latch, ...) * { * struct entry *entry; * unsigned seq, idx; * * do { * seq = lockless_dereference(latch->seq); So here we have: seq = READ_ONCE(latch->seq); smp_read_barrier_depends(); Which is exactly what we want; the new code: seq = ({ p = READ_ONCE(latch); smp_read_barrier_depends(); p })->seq; is just wrong; because it looses the volatile read on seq, which can now be torn or worse 'optimized'. And the read_depend barrier is also placed wrong, we want it after the load of seq, to match the above data[] up-to-date wmb()s. Such that when we dereference latch->data[] below, we're guaranteed to observe the right data. * * idx = seq & 0x01; * entry = data_query(latch->data[idx], ...); * * smp_rmb(); * } while (seq != latch->seq); * * return entry; * } So yes, not passing a pointer is not pretty, but the code was correct, and isn't anymore now. Change to explicit READ_ONCE()+smp_read_barrier_depends() to avoid confusion and allow strict lockless_dereference() checking. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 50755bc1c305 ("seqlock: fix raw_read_seqcount_latch()") Link: http://lkml.kernel.org/r/20160527111117.GL3192@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds2016-06-102-9/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Two fixes: a regression/crash fix, and a message output fix" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm: Fix the format of EFI debug messages efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps
| * | efi/arm: Fix the format of EFI debug messagesDennis Chen2016-06-031-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When both EFI and memblock debugging is enabled on the kernel command line: 'efi=debug memblock=debug' .. the debug messages for early_con look the following way: [ 0.000000] efi: 0x0000e1050000-0x0000e105ffff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x0000e1300000-0x0000e1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x0000e8200000-0x0000e827ffff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x008000000000-0x008001e7ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] [ 0.000000] memblock_add: [0x00008000000000-0x00008001e7ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c [ 0.000000] * ... Note the misplaced '*' line, which happened because the memblock debug message was printed while the EFI debug message was still being constructed.. This patch fixes the output to be the expected: [ 0.000000] efi: 0x0000e1050000-0x0000e105ffff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x0000e1300000-0x0000e1300fff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x0000e8200000-0x0000e827ffff [Memory Mapped I/O |RUN| | | | | | | | | | |UC] [ 0.000000] efi: 0x008000000000-0x008001e7ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]* [ 0.000000] memblock_add: [0x00008000000000-0x00008001e7ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c ... Note how the '*' is now in the proper EFI debug message line. Signed-off-by: Dennis Chen <dennis.chen@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salter <msalter@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Steve McIntyre <steve@einval.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1464690224-4503-3-git-send-email-matt@codeblueprint.co.uk [ Made the changelog more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | efi: Fix for_each_efi_memory_desc_in_map() for empty memmapsVitaly Kuznetsov2016-06-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") introduced a regression for systems booted with the 'noefi' kernel option. In particular, I observed an early kernel hang in efi_find_mirror()'s for_each_efi_memory_desc() call. As we don't have efi memmap on this system we enter this iterator with the following parameters: efi.memmap.map = 0, efi.memmap.map_end = 0, efi.memmap.desc_size = 28 ... then for_each_efi_memory_desc_in_map() does the following comparison: (md) <= (efi_memory_desc_t *)((m)->map_end - (m)->desc_size); ... where md = 0, (m)->map_end = 0 and (m)->desc_size = 28 but when we subtract something from a NULL pointer wrap around happens and we end up returning invalid pointer and crash. Fix it by using the correct pointer arithmetics. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salter <msalter@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 78ce248faa3c ("efi: Iterate over efi.memmap in for_each_efi_memory_desc()") Link: http://lkml.kernel.org/r/1464690224-4503-2-git-send-email-matt@codeblueprint.co.uk [ Made the changelog more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2016-06-101-1/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Ingo Molnar: "Addresses a false positive warning in the GPU/DRM code" [ Technically it's not a "false positive", but it's the virtual GPU interface that needs the frame pointer for its own internal purposes ] * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool, drm/vmwgfx: Fix "duplicate frame pointer save" warning
| * | | objtool, drm/vmwgfx: Fix "duplicate frame pointer save" warningJosh Poimboeuf2016-06-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | objtool reports the following warnings: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_send_msg()+0x107: duplicate frame pointer save drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_host_get_guestinfo()+0x252: duplicate frame pointer save To quote Linus: "The reason is that VMW_PORT_HB_OUT() uses a magic instruction sequence (a "rep outsb") to communicate with the hypervisor (it's a virtual GPU driver for vmware), and %rbp is part of the communication. So the inline asm does a save-and-restore of the frame pointer around the instruction sequence. I actually find the objtool warning to be quite reasonable, so it's not exactly a false positive, since in this case it actually does point out that the frame pointer won't be reliable over that instruction sequence. But in this particular case it just ends up being the wrong thing - the code is what it is, and %rbp just can't have the frame information due to annoying magic calling conventions." Silence the warnings by telling objtool to ignore the two functions which use the VMW_PORT_HB_{IN,OUT} macros. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: DRI <dri-devel@lists.freedesktop.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160526184343.fdtjjjg67smmeekt@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2016-06-1093-460/+828
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal. 2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger. 3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull. 4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin. 5) sk_filter() does things other than run a BPF filter, so we should not elide it's call just because sk->sk_filter is NULL. Fix from Eric Dumazet. 6) Fix missing backlog updates in several packet schedulers, from Cong Wang. 7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt. 8) Several RDS/TCP race fixes from Sowmini Varadhan. 9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet. 10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng. 11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault. 12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal. 13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) vmxnet3: segCnt can be 1 for LRO packets packet: compat support for sock_fprog stmmac: fix parameter to dwmac4_set_umac_addr() net/mlx5e: Fix blue flame quota logic net/mlx5e: Use ndo_stop explicitly at shutdown flow net/mlx5: E-Switch, always set mc_promisc for allmulti vports net/mlx5: E-Switch, Modify node guid on vf set MAC net/mlx5: E-Switch, Fix vport enable flow net/mlx5: E-Switch, Use the correct error check on returned pointers net/mlx5: E-Switch, Use the correct free() function net/mlx5: Fix E-Switch flow steering capabilities check net/mlx5: Fix flow steering NIC capabilities check net/mlx5: Fix root flow table update net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly net/mlx5: Fix masking of reserved bits in XRCD number net/mlx5: Fix the size of modify QP mailbox mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name() mlxsw: spectrum: Make split flow match firmware requirements wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel cfg80211: remove get/set antenna and tx power warnings ...
| * | | | vmxnet3: segCnt can be 1 for LRO packetsShrikrishna Khare2016-06-102-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device emulation may send segCnt of 1 for LRO packets. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: Jin Heo <heoj@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | packet: compat support for sock_fprogWillem de Bruijn2016-06-103-2/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains a pointer into user memory. If userland is 32-bit and kernel is 64-bit the two disagree about the layout of struct sock_fprog. Add compat setsockopt support to convert a 32-bit compat_sock_fprog to a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF"). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | stmmac: fix parameter to dwmac4_set_umac_addr()Ben Dooks2016-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dwmac4_set_umac_addr() takes a struct mac_device_info as the first parameter, but is being passed a ioaddr instead from dwmac4_set_filter(). Fix the warning/bug by changing the first parameter. drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: warning: incorrect type in argument 1 (different address spaces) drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: expected struct mac_device_info *hw drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: got void [noderef] <asn:2>*ioaddr Note, only compile tested this as do not have any hardware with it in. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge branch 'mlx5-fixes'David S. Miller2016-06-1010-52/+128
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Saeed Mahameed says: ==================== Mellanox 100G mlx5 fixes for 4.7-rc The following series provides some small fixes for mlx5 driver. Two small fixes for the mlx5e netdev, the 1st is for the blue flame quota accounting and the 2nd is a small refactoring in shutdown flow. Five trivial fixes for mlx5 E-Switch. - Allmulti mc_promisc flag was not set in a specific flow. - Modify VF node guid when admin mac is changed. - Race in vport enable flow. - Misc code fixes (kvfree when needed and error pointers checking). Three in mlx5 steering area. Correct capabilities checking and root flow table update. Three misc fixes in mlx5 commands enum and layouts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5e: Fix blue flame quota logicEli Cohen2016-06-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blue flame is a latency enhancement feature that allows the driver to write the packet data directly to the NIC's registers thus making the read of the packet data from host memory redundant. We maintain a quota for the blue flame which is reloaded whenever we identify that the hardware is processing send requests and processes them fast enough so by the time we post the next send request it was able to process all the pending ones. This indicates that the hardware is capable of processing more blue flame requests efficiently. The blue flame quota is decremented whenever we send using blue flame. The current code erroneously clears the budget if we did not use blue flame for the current post send operation and we fix it here. Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5e: Use ndo_stop explicitly at shutdown flowEran Ben Elisha2016-06-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation copies the flow of ndo_stop instead of calling it explicitly, Fixed it. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: E-Switch, always set mc_promisc for allmulti vportsMohamad Haj Yahia2016-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the mc_promisc flag also in the case of adding new mc address to existing allmulti vport. Fixes: a35f71f27a61 ('net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: E-Switch, Modify node guid on vf set MACNoa Osherovich2016-06-104-4/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In RoCE, the RDMA-CM needs the node guid to establish connection between nodes. Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore RDMA-CM on the VF is broken. Whenever the administrator sets a MAC for a VF, derive the node guid from it and set it as well in the following way: MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01 Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: E-Switch, Fix vport enable flowMohamad Haj Yahia2016-06-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reorder vport enable flow to mark the vport as enabled before calling the vport change handler which was modified to handle the case for when vport is not enabled. This fixes the case for when the PF netdev is open before sriov is enabled, once sriov is enabled at esw_enable_vport, esw_vport_change_handle_locked didn't read the PF context since it thought the PF vport was not enabled. When we enable the vport, arming for events is not required anymore, since it's done on the vport change handle Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport cleanup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: E-Switch, Use the correct error check on returned pointersOr Gerlitz2016-06-101-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never returns null pointer on error. Even if it was doing that, checking for IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have cause bugs, since PTR_ERR(NULL) --> success, crash. To make things more robust and protect against related future bugs, convert all IS_ERR_OR_NULL checks on returned values to IS_ERR. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: E-Switch, Use the correct free() functionOr Gerlitz2016-06-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must use kvfree() for something that could have been allocated with vzalloc(), do that. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix E-Switch flow steering capabilities checkMaor Gottlieb2016-06-101-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing capabilities check for E-Switch FDB and ACLs flow tables before creating their namespace in flow steering. Fixes: efdc810ba39d ('net/mlx5: Flow steering, Add vport ACL support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix flow steering NIC capabilities checkMaor Gottlieb2016-06-102-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flow steering infrastructure is currently used only on link layer ethernet, therefore the driver should initialize the flow steering when the device link layer is ethernet. In addition, add missing capability check before initializing the namespace of NIC RX flow tables. Fixes: 2530236303d9 ('net/mlx5_core: Flow steering tree initialization') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix root flow table updateMaor Gottlieb2016-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we destroy the last flow table we need to update the root_ft to NULL. It fixes an issue for when the last flow table is destroyed and recreated again, root_ft pointer will not be updated, as a result traffic will be dropped. Fixes: 2cc43b494a6c ('net/mlx5_core: Managing root flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctlyShahar Klein2016-06-102-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having MLX5_CMD_OP_MAX on another file causes us to repeatedly miss accounting new commands added to the driver and hence there're no entries for them in debugfs. To solve that, we integrate it into the commands enum as the last entry. Fixes: 34a40e689393 ('net/mlx5_core: Introduce modify flow table command') Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix masking of reserved bits in XRCD numberMajd Dibbiny2016-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mask the reserved bits when reading the number of newly created XRCD. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net/mlx5: Fix the size of modify QP mailboxMajd Dibbiny2016-06-101-0/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add 16 reserved bytes at the end of mlx5_modify_qp_mbox_in to match the hardware spec definition. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge tag 'mac80211-for-davem-2016-06-09' of ↵David S. Miller2016-06-092-4/+23
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more fixes for now: * a fix for a long-standing iwpriv 32/64 compat issue * two fairly recently introduced (4.6) warning asking for symmetric operations are erroneous and I remove them ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | wext: Fix 32 bit iwpriv compatibility issue with 64 bit KernelPrasun Maiti2016-06-091-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iwpriv app uses iw_point structure to send data to Kernel. The iw_point structure holds a pointer. For compatibility Kernel converts the pointer as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers may use iw_handler_def.private_args to populate iwpriv commands instead of iw_handler_def.private. For those case, the IOCTLs from SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl(). Accordingly when the filled up iw_point structure comes from 32 bit iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends it to driver. So, the driver may get the invalid data. The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory. This patch adds pointer conversion from 32 bit to 64 bit and vice versa, if the ioctl comes from 32 bit iwpriv to 64 bit Kernel. Cc: stable@vger.kernel.org Signed-off-by: Prasun Maiti <prasunmaiti87@gmail.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Tested-by: Dibyajyoti Ghosh <dibyajyotig@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | | | cfg80211: remove get/set antenna and tx power warningsJohannes Berg2016-06-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since set_tx_power and set_antenna are frequently implemented without the matching get_tx_power/get_antenna, we shouldn't have added warnings for those. Remove them. The remaining ones are correct and need to be implemented symmetrically for correct operation. Cc: stable@vger.kernel.org Fixes: de3bb771f471 ("cfg80211: add more warnings for inconsistent ops") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | | | Merge branch 'mlxsw-fixes'David S. Miller2016-06-092-91/+117
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Pirko says: ==================== mlxsw: couple of fixes Couple of fixes from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()Ido Schimmel2016-06-092-43/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When rtnl_fill_ifinfo() is called for a certain netdevice it queries its various parameters such as switch id and physical port name. The function might get called in an atomic context, which means the underlying driver must not sleep during the query operation. Don't query the device and sleep during ndo_get_phys_port_name(), but instead store the needed parameters in port creation time. Fixes: 2bf9a58675c5 ("mlxsw: spectrum: Add support for physical port names") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | mlxsw: spectrum: Make split flow match firmware requirementsIdo Schimmel2016-06-091-52/+95
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a port is created following a split / unsplit we need to map it to the correct module and lane, enable it and then continue to initialize its various parameters such as MTU and VLAN filters. Under certain conditions, such as trying to split ports at the bottom row of the front panel by four, we get firmware errors. After evaluating this with the firmware team it was decided to alter the split / unsplit flow, so that first all the affected ports are mapped, then enabled and finally each is initialized separately. Fix the split / unsplit flow by first mapping and enabling all the affected ports. Newer firmware versions will support both flows. Fixes: 18f1e70c4137 ("mlxsw: spectrum: Introduce port splitting") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | qfq: don't leak skb if kzalloc failsFlorian Westphal2016-06-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we need to create a new aggregate to enqueue the skb we call kzalloc. If that fails we returned ENOBUFS without freeing the skb. Spotted during code review. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ip6gre: Allow live link address changeShweta Choudaha2016-06-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ip6 GRE tap device should not be forced to down state to change the mac address and should allow live address change for tap device similar to ipv4 gre. Signed-off-by: Shweta Choudaha <schoudah@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge branch 'cls_u32-hwoffload-fixes'David S. Miller2016-06-091-19/+26
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jakub Kicinski says: ==================== incremental cls_u32 hardware offload fixes These are incremental changes from v1 of cls_u32 fixes. First patch is reposted in its entirety, patch 2 is an incremental change from patch 2 of the original series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | net: cls_u32: be more strict about skip-sw flag for knodesJakub Kicinski2016-06-091-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return an error if user requested skip-sw and the underlaying hardware cannot handle tc offloads (or offloads are disabled). This patch fixes the knode handling. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | net: cls_u32: catch all hardware offload errorsJakub Kicinski2016-06-091-1/+7
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Errors reported by u32_replace_hw_hnode() were not propagated. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sfc: report supported link speeds on SFP connectionsBert Kenward2016-06-081-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 7000-series SFC NICs connected with an SFP+ module currently fail to report any supported link speeds. Reported-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Tested-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net_sched: add missing paddattr descriptionEric Dumazet2016-06-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "make htmldocs" complains otherwise: .//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr' .//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr' Fixes: 9854518ea04d ("sched: align nlattr properly when needed") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv6: Skip XFRM lookup if dst_entry in socket cache is validJakub Sitnicki2016-06-081-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At present we perform an xfrm_lookup() for each UDPv6 message we send. The lookup involves querying the flow cache (flow_cache_lookup) and, in case of a cache miss, creating an XFRM bundle. If we miss the flow cache, we can end up creating a new bundle and deriving the path MTU (xfrm_init_pmtu) from on an already transformed dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down to xfrm_lookup(). This can happen only if we're caching the dst_entry in the socket, that is when we're using a connected UDP socket. To put it another way, the path MTU shrinks each time we miss the flow cache, which later on leads to incorrectly fragmented payload. It can be observed with ESPv6 in transport mode: 1) Set up a transformation and lower the MTU to trigger fragmentation # ip xfrm policy add dir out src ::1 dst ::1 \ tmpl src ::1 dst ::1 proto esp spi 1 # ip xfrm state add src ::1 dst ::1 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b # ip link set dev lo mtu 1500 2) Monitor the packet flow and set up an UDP sink # tcpdump -ni lo -ttt & # socat udp6-listen:12345,fork /dev/null & 3) Send a datagram that needs fragmentation with a connected socket # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345 2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error 00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448 00:00:00.000014 IP6 ::1 > ::1: frag (1448|32) 00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272 (^ ICMPv6 Parameter Problem) 00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136 4) Compare it to a non-connected socket # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345 00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448 00:00:00.000010 IP6 ::1 > ::1: frag (1448|64) What happens in step (3) is: 1) when connecting the socket in __ip6_datagram_connect(), we perform an XFRM lookup, miss the flow cache, create an XFRM bundle, and cache the destination, 2) afterwards, when sending the datagram, we perform an XFRM lookup, again, miss the flow cache (due to mismatch of flowi6_iif and flowi6_oif, which is an issue of its own), and recreate an XFRM bundle based on the cached (and already transformed) destination. To prevent the recreation of an XFRM bundle, avoid an XFRM lookup altogether whenever we already have a destination entry cached in the socket. This prevents the path MTU shrinkage and brings us on par with UDPv4. The fix also benefits connected PINGv6 sockets, another user of ip6_sk_dst_lookup_flow(), who also suffer messages being transformed twice. Joint work with Hannes Frederic Sowa. Reported-by: Jan Tluka <jtluka@redhat.com> Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | l2tp: fix configuration passed to setup_udp_tunnel_sock()Guillaume Nault2016-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unused fields of udp_cfg must be all zeros. Otherwise setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete callbacks with garbage, eventually resulting in panic when used by udp_gro_receive(). [ 72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78 [ 72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78 [ 72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163 [ 72.696530] Oops: 0011 [#1] SMP KASAN [ 72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\ essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio [ 72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1 [ 72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000 [ 72.696530] RIP: 0010:[<ffff880033f87d78>] [<ffff880033f87d78>] 0xffff880033f87d78 [ 72.696530] RSP: 0018:ffff880035f87bc0 EFLAGS: 00010246 [ 72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823 [ 72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780 [ 72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000 [ 72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715 [ 72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000 [ 72.696530] FS: 0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000 [ 72.696530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0 [ 72.696530] Stack: [ 72.696530] ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874 [ 72.696530] ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462 [ 72.696530] ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70 [ 72.696530] Call Trace: [ 72.696530] <IRQ> [ 72.696530] [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9 [ 72.696530] [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310 [ 72.696530] [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd [ 72.696530] [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3 [ 72.696530] [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64 [ 72.696530] [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d [ 72.696530] [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net] [ 72.696530] [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net] [ 72.696530] [<ffffffff81431350>] net_rx_action+0x15b/0x3b9 [ 72.696530] [<ffffffff815893d6>] __do_softirq+0x216/0x546 [ 72.696530] [<ffffffff81062392>] irq_exit+0x49/0xb6 [ 72.696530] [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa [ 72.696530] [<ffffffff81587a49>] common_interrupt+0x89/0x89 [ 72.696530] <EOI> [ 72.696530] [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270 [ 72.696530] [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d [ 72.696530] [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d [ 72.696530] [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc [ 72.696530] [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c [ 72.696530] [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f [ 72.696530] [<ffffffff81039a81>] start_secondary+0x12c/0x133 [ 72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 72.696530] RIP [<ffff880033f87d78>] 0xffff880033f87d78 [ 72.696530] RSP <ffff880035f87bc0> [ 72.696530] CR2: ffff880033f87d78 [ 72.696530] ---[ end trace ad7758b9a1dccf99 ]--- [ 72.696530] Kernel panic - not syncing: Fatal exception in interrupt [ 72.696530] Kernel Offset: disabled [ 72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt v2: use empty initialiser instead of "{ NULL }" to avoid relying on first field's type. Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | cxgb4: Add device id of T540-BT adapterHariprasad Shenai2016-06-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net-sysfs: fix missing <linux/of_net.h>Ben Dooks2016-06-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The of_find_net_device_by_node() function is defined in <linux/of_net.h> but not included in the .c file that implements it. Fix the following warning by including the header: net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | bridge: Don't insert unnecessary local fdb entry on changing mac addressToshiaki Makita2016-06-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The missing br_vlan_should_use() test caused creation of an unneeded local fdb entry on changing mac address of a bridge device when there is a vlan which is configured on a bridge port but not on the bridge device. Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2016-06-084-7/+9
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains two Netfilter/IPVS fixes for your net tree, they are: 1) Fix missing alignment in next offset calculation for standard targets, introduced in the previous merge window, patch from Florian Westphal. 2) Fix to correct the handling of outgoing connections which use the SIP-pe such that the binding of a real-server is updated when needed. This was an omission from changes introduced by Marco Angaroni in the previous merge window too, to allow handling of outgoing connections by the SIP-pe. Patch and report came via Simon Horman. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | ipvs: update real-server binding of outgoing connections in SIP-peMarco Angaroni2016-06-063-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous patch that introduced handling of outgoing packets in SIP persistent-engine did not call ip_vs_check_template() in case packet was matching a connection template. Assumption was that real-server was healthy, since it was sending a packet just in that moment. There are however real-server fault conditions requiring that association between call-id and real-server (represented by connection template) gets updated. Here is an example of the sequence of events: 1) RS1 is a back2back user agent that handled call-id1 and call-id2 2) RS1 is down and was marked as unavailable 3) new message from outside comes to IPVS with call-id1 4) IPVS reschedules the message to RS2, which becomes new call handler 5) RS2 forwards the message outside, translating call-id1 to call-id2 6) inside pe->conn_out() IPVS matches call-id2 with existing template 7) IPVS does not change association call-id2 <-> RS1 8) new message comes from client with call-id2 9) IPVS reschedules the message to a real-server potentially different from RS2, which is now the correct destination This patch introduces ip_vs_check_template() call in the handling of outgoing packets for SIP-pe. And also introduces a second optional argument for ip_vs_check_template() that allows to check if dest associated to a connection template is the same dest that was identified as the source of the packet. This is to change the real-server bound to a particular call-id independently from its availability status: the idea is that it's more reliable, for in->out direction (where internal network can be considered trusted), to always associate a call-id with the last real-server that used it in one of its messages. Think about above sequence of events where, just after step 5, RS1 returns instead to be available. Comparison of dests is done by simply comparing pointers to struct ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its memory address, but represent a different real-server in terms of ip-address / port. Fixes: 39b972231536 ("ipvs: handle connections started by real-servers") Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | * | | | | netfilter: x_tables: don't reject valid target size on some architecturesFlorian Westphal2016-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting John Stultz: In updating a 32bit arm device from 4.6 to Linus' current HEAD, I noticed I was having some trouble with networking, and realized that /proc/net/ip_tables_names was suddenly empty. Digging through the registration process, it seems we're catching on the: if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 && target_offset + sizeof(struct xt_standard_target) != next_offset) return -EINVAL; Where next_offset seems to be 4 bytes larger then the offset + standard_target struct size. next_offset needs to be aligned via XT_ALIGN (so we can access all members of ip(6)t_entry struct). This problem didn't show up on i686 as it only needs 4-byte alignment for u64, but iptables userspace on other 32bit arches does insert extra padding. Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | tcp: record TLP and ER timer stats in v6 statsYuchung Cheng2016-06-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The v6 tcp stats scan do not provide TLP and ER timer information correctly like the v4 version . This patch fixes that. Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>