summaryrefslogtreecommitdiffstats
path: root/net (unfollow)
Commit message (Collapse)AuthorFilesLines
2020-03-25net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} buildPablo Neira Ayuso5-10/+10
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’: net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’ pkt->skb->tc_redirected = 1; ^~ net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’ pkt->skb->tc_from_ingress = 1; ^~ To avoid a direct dependency with tc actions from netfilter, wrap the redirect bits around CONFIG_NET_REDIRECT and move helpers to include/linux/skbuff.h. Turn on this toggle from the ifb driver, the only existing client of these bits in the tree. This patch adds skb_set_redirected() that sets on the redirected bit on the skbuff, it specifies if the packet was redirect from ingress and resets the timestamp (timestamp reset was originally missing in the netfilter bugfix). Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress") Reported-by: noreply@ellerman.id.au Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25cxgb4: Add support to catch bits set in INT_CAUSE5Rahul Kundu2-8/+30
This commit adds support to catch any bits set in SGE_INT_CAUSE5 for Parity Errors. F_ERR_T_RXCRC flag is used to ignore that particular bit as it is not considered as fatal. So, we clear out the bit before looking for error. This patch now read and report separately all three registers(Cause1, Cause2, Cause5). Also, checks for errors if any. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25octeontx2-pf: Fix ndo_set_rx_modeSunil Goutham2-2/+29
Since set_rx_mode takes a mutex lock for sending mailbox message to admin function to set the mode, moved logic to a workqueue. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25octeontx2-pf: Fix rx buffer page refcountSunil Goutham1-0/+1
Fixed an issue wherein while refilling receive buffers for the last page allocated, recount is not being updated. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: modernize two list helpersJulian Wiedmann1-5/+3
Replace list_for_each() with list_for_each_entry(), and list_entry(head.next) with list_first_entry(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: keep track of fixed prio-queue configurationJulian Wiedmann3-4/+7
When a device is configured in prio-queue mode to pin all traffic onto a specific HW queue, treat this as a distinct variant of prio-queueing instead of QETH_NO_PRIO_QUEUEING. This corrects an error message from qeth_osa_set_output_queues() for devices configured in such a mode. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: fine-tune MAC Address-related errnosJulian Wiedmann1-5/+5
Return the correct errnos when .ndo_set_mac_address fails to set a new MAC address. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: add TX IRQ coalescing support for IQD devicesJulian Wiedmann3-8/+102
Since IQD devices complete (most of) their transmissions synchronously, they don't offer TX completion IRQs and have no HW coalescing controls. But we can fake the easy parts in SW, and give the user some control wrt to how often the TX NAPI code should be triggered to process the TX completions. Having per-queue controls can in particular help the dedicated mcast queue, as it likely benefits from different fine-tuning than what the ucast queues need. CC: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: collect more TX statisticsJulian Wiedmann3-6/+17
Count the number of TX doorbells we issue to the qdio layer. Also count the number of actual frames in a TX buffer, and then use this data along with the byte count during TX completion. We'll make additional use of the frame count in a subsequent patch. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: clean up the mac_bitsJulian Wiedmann2-8/+8
We're down to a single bit flag for MAC-address related status, reflect that in the info struct. Also set up the flag during initialization instead of clearing it during shutdown - one more little step towards unifying the shutdown code. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: simplify L3 dev_id logicJulian Wiedmann2-22/+13
The logic that deals with errors from qeth_l3_get_unique_id() is quite complex: it sets card->unique_id to 0xfffe, additionally flags it as UNIQUE_ID_NOT_BY_CARD and later takes this flag as cue to not propagate card->unique_id to dev->dev_id. With dev->dev_id thus holding 0, addrconf_ifid_eui48() applies its default behaviour. Get rid of all the special bit masks, and just return the old uid in case of an error. For the vast majority of cases this will be 0 (and so we still get the desired default behaviour) - with the rare exception where qeth_l3_get_unique_id() might have been called earlier but the initialization then failed at a later point. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qdio: extend polling support to multiple queuesJulian Wiedmann7-109/+87
When the support for polling drivers was initially added, it only considered Input Queue 0. But as QDIO interrupts are actually for the full device and not a single queue, this doesn't really fit for configurations where multiple Input Queues are used. Rework the qdio code so that interrupts for a polling driver are not split up into actions for each queue. Instead deliver the interrupt as a single event, and let the driver decide which queue needs what action. When re-enabling the QDIO interrupt via qdio_start_irq(), this means that the qdio code needs to (1) put _all_ eligible queues back into a state where they raise IRQs, (2) and afterwards check _all_ eligible queues for new work to bridge the race window. On the qeth side of things (as the only qdio polling driver), we can now add CQ polling support to the main NAPI poll routine. It doesn't consume NAPI budget, and to avoid hogging the CPU we yield control after completing one full queue worth of buffers. The subsequent qdio_start_irq() will check for any additional work, and have us re-schedule the NAPI instance accordingly. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: remove redundant if-clause in RX poll codeJulian Wiedmann1-30/+26
Whenever all completed RX buffers have been processed (ie. rx->b_count == 0), we call down to the HW layer to scan for additional buffers. If no further buffers are available, the code breaks out of the while-loop. So we never reach the 'process an RX buffer' step with rx->b_count == 0, eliminate that check and one level of indentation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: split out RX poll codeJulian Wiedmann1-21/+33
The main NAPI poll routine should eventually handle more types of work, beyond just the RX ring. Split off the RX poll logic into a separate function, and simplify the nested while-loop. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25s390/qeth: simplify RX buffer trackingJulian Wiedmann2-19/+13
Since RX buffers may contain multiple packets, qeth's NAPI poll code can exhaust its budget in the middle of an RX buffer. Thus we keep track of our current position within the active RX buffer, so we can resume processing here in the next NAPI poll period. Clean up that code by tracking the index of the active buffer element, instead of a pointer to it. Also simplify the code that advances to the next RX buffer when the current buffer has been fully processed. v2: - remove QDIO_ELEMENT_NO() macro (davem) Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: ena: Add PCI shutdown handler to allow safe kexecGuilherme G. Piccoli1-10/+41
Currently ENA only provides the PCI remove() handler, used during rmmod for example. This is not called on shutdown/kexec path; we are potentially creating a failure scenario on kexec: (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA; instead pci_device_shutdown() clears the master bit of the PCI device, stopping all DMA transactions; (b) Kexec reboot happens and the device gets enabled again, likely having its FW with that DMA transaction buffered; then it may trigger the (now invalid) memory operation in the new kernel, corrupting kernel memory area. This patch aims to prevent this, by implementing a shutdown() handler quite similar to the remove() one - the difference being the handling of the netdev, which is unregistered on remove(), but following the convention observed in other drivers, it's only detached on shutdown(). This prevents an odd issue in AWS Nitro instances, in which after the 2nd kexec the next one will fail with an initrd corruption, caused by a wild DMA write to invalid kernel memory. The lspci output for the adapter present in my instance is: 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20] Suggested-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25selftests/net/forwarding: define libs as TEST_PROGS_EXTENDEDHangbin Liu2-15/+16
The lib files should not be defined as TEST_PROGS, or we will run them in run_kselftest.sh. Also remove ethtool_lib.sh exec permission. Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25selftests/net: add missing tests to MakefileHangbin Liu1-1/+3
Find some tests are missed in Makefile by running: for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: use indirect call wrappers for skb_copy_datagram_iter()Eric Dumazet1-3/+11
TCP recvmsg() calls skb_copy_datagram_iter(), which calls an indirect function (cb pointing to simple_copy_to_iter()) for every MSS (fragment) present in the skb. CONFIG_RETPOLINE=y forces a very expensive operation that we can avoid thanks to indirect call wrappers. This patch gives a 13% increase of performance on a single flow, if the bottleneck is the thread reading the TCP socket. Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25cxgb4: remove set but not used variable 'tab'YueHaibing1-3/+1
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c: In function cxgb4_get_free_ftid: drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c:547:23: warning: variable tab set but not used [-Wunused-but-set-variable] commit 8d174351f285 ("cxgb4: rework TC filter rule insertion across regions") involved this, remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25zonfs: Fix handling of read-only zonesDamien Le Moal2-12/+34
The write pointer of zones in the read-only consition is defined as invalid by the SCSI ZBC and ATA ZAC specifications. It is thus not possible to determine the correct size of a read-only zone file on mount. Fix this by handling read-only zones in the same manner as offline zones by disabling all accesses to the zone (read and write) and initializing the inode size of the read-only zone to 0). For zones found to be in the read-only condition at runtime, only disable write access to the zone and keep the size of the zone file to its last updated value to allow the user to recover previously written data. Also fix zonefs documentation file to reflect this change. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2020-03-25r8169: re-enable MSI on RTL8168cHeiner Kallweit1-1/+1
The original change fixed an issue on RTL8168b by mimicking the vendor driver behavior to disable MSI on chip versions before RTL8168d. This however now caused an issue on a system with RTL8168c, see [0]. Therefore leave MSI disabled on RTL8168b, but re-enable it on RTL8168c. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1792839 Fixes: 003bd5b4a7b4 ("r8169: don't use MSI before RTL8168d") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25devlink: expand the devlink-info documentationJakub Kicinski4-16/+213
We are having multiple review cycles with all vendors trying to implement devlink-info. Let's expand the documentation with more information about what's implemented and motivation behind this interface in an attempt to make the implementations easier. Describe what each info section is supposed to contain, and make some references to other HW interfaces (PCI caps). Document how firmware management is expected to look, to make it clear how devlink-info and devlink-flash work in concert. Name some future work. v2: - improve wording v3: - improve wording Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: phy: mdio-bcm-unimac: Fix clock handlingAndre Przywara1-4/+2
The DT binding for this PHY describes an *optional* clock property. Due to a bug in the error handling logic, we are actually ignoring this clock *all* of the time so far. Fix this by using devm_clk_get_optional() to handle this clock properly. Fixes: b78ac6ecd1b6b ("net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: phy: mscc: consolidate a common RGMII delay implementationVladimir Oltean2-57/+49
It looks like the VSC8584 PHY driver is rolling its own RGMII delay configuration code, despite the fact that the logic is mostly the same. In fact only the register layout and position for the RGMII controls has changed. So we need to adapt and parameterize the PHY-dependent bit fields when calling the new generic function. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Allow DMA to beyond 4GBAndre Przywara1-0/+8
With all DMA address accesses wrapped, we can actually support 64-bit DMA if this option was chosen at IP integration time. If the IP has been configured for an address width greater than 32 bits, we assume the full 64 bit DMA width is working. In practise this will be limited by the actual system address bus width, which will ideally be the same as the DMA IP address width. If this is not the case, the actual width can still be configured using a dma-ranges property in the parent of the MAC node. This increases the DMA mask on those systems to let the kernel choose buffers from memory at higher addresses. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Autodetect 64-bit DMA capabilityAndre Przywara2-0/+27
When newer revisions of the Axienet IP are configured for a 64-bit bus, we *need* to write to the MSB part of the an address registers, otherwise the IP won't recognise this as a DMA start condition. This is even true when the actual DMA address comes from the lower 4 GB. To autodetect this configuration, at probe time we write all 1's to such an MSB register, and see if any bits stick. If this is configured for a 32-bit bus, those MSB registers are RES0, so reading back 0 indicates that no MSB writes are necessary. On the other hands reading anything other than 0 indicated the need to write the MSB registers, so we set the respective flag. The actual DMA mask stays at 32-bit for now. To help bisecting, a separate patch will enable allocations from higher addresses. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Upgrade descriptors to hold 64-bit addressesAndre Przywara2-39/+83
Newer revisions of the AXI DMA IP (>= v7.1) support 64-bit addresses, both for the descriptors itself, as well as for the buffers they are pointing to. This is realised by adding "MSB" words for the next and phys pointer right behind the existing address word, now named "LSB". These MSB words live in formerly reserved areas of the descriptor. If the hardware supports it, write both words when setting an address. The buffer address is handled by two wrapper functions, the two occasions where we set the next pointers are open coded. For now this is guarded by a flag which we don't set yet. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Wrap DMA pointer writes to prepare for 64 bitAndre Przywara1-10/+16
Newer versions of the Xilink DMA IP support busses with more than 32 address bits, by introducing an MSB word for the registers holding DMA pointers (tail/current, RX/TX descriptor addresses). On IP configured for more than 32 bits, it is also *required* to write both words, to let the IP recognise this as a start condition for an MM2S request, for instance. Wrap the DMA pointer writes with a separate function, to add this functionality later. For now we stick to the lower 32 bits. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Add mii-tool supportAndre Przywara1-0/+11
mii-tool is useful for debugging, and all it requires to work is to wire up the ioctl ops function pointer. Add this to the axienet driver to enable mii-tool. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Drop MDIO interrupt registers from ethtools dumpAndre Przywara2-11/+0
Newer revisions of the IP don't have these registers. Since we don't really use them, just drop them from the ethtools dump. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Mark eth_irq as optionalAndre Przywara1-2/+2
According to the DT binding, the Ethernet core interrupt is optional. Use platform_get_irq_optional() to avoid the error message when the IRQ is not specified. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Check for DMA mapping errorsAndre Przywara1-1/+30
Especially with the default 32-bit DMA mask, DMA buffers are a limited resource, so their allocation can fail. So as the DMA API documentation requires, add error checking code after dma_map_single() calls to catch the case where we run out of "low" memory. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Factor out TX descriptor chain cleanupAndre Przywara1-22/+57
Factor out the code that cleans up a number of connected TX descriptors, as we will need it to properly roll back a failed _xmit() call. There are subtle differences between cleaning up a successfully sent chain (unknown number of involved descriptors, total data size needed) and a chain that was about to set up (number of descriptors known), so cater for those variations with some extra parameters. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Improve DMA error handlingAndre Przywara1-2/+2
Since 0 is a valid DMA address, we cannot use the physical address to check whether a TX descriptor is valid and is holding a DMA mapping. Use the "cntrl" member of the descriptor to make this decision, as it contains at least the length of the buffer, so 0 points to an uninitialised buffer. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Fix DMA descriptor cleanup pathAndre Przywara1-15/+28
When axienet_dma_bd_init() bails out during the initialisation process, it might do so with parts of the structure already allocated and initialised, while other parts have not been touched yet. Before returning in this case, we call axienet_dma_bd_release(), which does not take care of this corner case. This is most obvious by the first loop happily dereferencing lp->rx_bd_v, which we actually check to be non NULL *afterwards*. Make sure we only unmap or free already allocated structures, by: - directly returning with -ENOMEM if nothing has been allocated at all - checking for lp->rx_bd_v to be non-NULL *before* using it - only unmapping allocated DMA RX regions This avoids NULL pointer dereferences when initialisation fails. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Propagate failure of DMA descriptor setupAndre Przywara1-7/+19
When we fail allocating the DMA buffers in axienet_dma_bd_init(), we report this error, but carry on with initialisation nevertheless. This leads to a kernel panic when the driver later wants to send a packet, as it uses uninitialised data structures. Make the axienet_device_reset() routine return an error value, as it contains the DMA buffer initialisation. Make sure we propagate the error up the chain and eventually fail the driver initialisation, to avoid relying on non-initialised buffers. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: axienet: Convert DMA error handler to a work queueAndre Przywara2-13/+13
The DMA error handler routine is currently a tasklet, scheduled to run after the DMA error IRQ was handled. However it needs to take the MDIO mutex, which is not allowed to do in a tasklet. A kernel (with debug options) complains consequently: [ 614.050361] net eth0: DMA Tx error 0x174019 [ 614.064002] net eth0: Current BD is at: 0x8f84aa0ce [ 614.080195] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 [ 614.109484] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 40, name: kworker/u4:4 [ 614.135428] 3 locks held by kworker/u4:4/40: [ 614.149075] #0: ffff000879863328 ((wq_completion)rpciod){....}, at: process_one_work+0x1f0/0x6a8 [ 614.177528] #1: ffff80001251bdf8 ((work_completion)(&task->u.tk_work)){....}, at: process_one_work+0x1f0/0x6a8 [ 614.209033] #2: ffff0008784e0110 (sk_lock-AF_INET-RPC){....}, at: tcp_sendmsg+0x24/0x58 [ 614.235429] CPU: 0 PID: 40 Comm: kworker/u4:4 Not tainted 5.6.0-rc3-00926-g4a165a9d5921 #26 [ 614.260854] Hardware name: ARM Test FPGA (DT) [ 614.274734] Workqueue: rpciod rpc_async_schedule [ 614.289022] Call trace: [ 614.296871] dump_backtrace+0x0/0x1a0 [ 614.308311] show_stack+0x14/0x20 [ 614.318751] dump_stack+0xbc/0x100 [ 614.329403] ___might_sleep+0xf0/0x140 [ 614.341018] __might_sleep+0x4c/0x80 [ 614.352201] __mutex_lock+0x5c/0x8a8 [ 614.363348] mutex_lock_nested+0x1c/0x28 [ 614.375654] axienet_dma_err_handler+0x38/0x388 [ 614.389999] tasklet_action_common.isra.15+0x160/0x1a8 [ 614.405894] tasklet_action+0x24/0x30 [ 614.417297] efi_header_end+0xe0/0x494 [ 614.429020] irq_exit+0xd0/0xd8 [ 614.439047] __handle_domain_irq+0x60/0xb0 [ 614.451877] gic_handle_irq+0xdc/0x2d0 [ 614.463486] el1_irq+0xcc/0x180 [ 614.473451] __tcp_transmit_skb+0x41c/0xb58 [ 614.486513] tcp_write_xmit+0x224/0x10a0 [ 614.498792] __tcp_push_pending_frames+0x38/0xc8 [ 614.513126] tcp_rcv_established+0x41c/0x820 [ 614.526301] tcp_v4_do_rcv+0x8c/0x218 [ 614.537784] __release_sock+0x5c/0x108 [ 614.549466] release_sock+0x34/0xa0 [ 614.560318] tcp_sendmsg+0x40/0x58 [ 614.571053] inet_sendmsg+0x40/0x68 [ 614.582061] sock_sendmsg+0x18/0x30 [ 614.593074] xs_sendpages+0x218/0x328 [ 614.604506] xs_tcp_send_request+0xa0/0x1b8 [ 614.617461] xprt_transmit+0xc8/0x4f0 [ 614.628943] call_transmit+0x8c/0xa0 [ 614.640028] __rpc_execute+0xbc/0x6f8 [ 614.651380] rpc_async_schedule+0x28/0x48 [ 614.663846] process_one_work+0x298/0x6a8 [ 614.676299] worker_thread+0x40/0x490 [ 614.687687] kthread+0x134/0x138 [ 614.697804] ret_from_fork+0x10/0x18 [ 614.717319] xilinx_axienet 7fe00000.ethernet eth0: Link is Down [ 615.748343] xilinx_axienet 7fe00000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off Since tasklets are not really popular anymore anyway, lets convert this over to a work queue, which can sleep and thus can take the MDIO mutex. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: xilinx: temac: Relax Kconfig dependenciesAndre Przywara1-1/+0
Similar to axienet, the temac driver is now architecture agnostic, and can be at least compiled for several architectures. Especially the fact that this is a soft IP for implementing in FPGAs makes the current restriction rather pointless, as it could literally appear on any architecture, as long as an FPGA is connected to the bus. The driver hasn't been actually tried on any hardware, it is just a drive-by patch when doing the same for axienet (a similar patch for axienet is already merged). This (temac and axienet) have been compile-tested for: alpha hppa64 microblaze mips64 powerpc powerpc64 riscv64 s390 sparc64 (using kernel.org cross compilers). Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25ethtool: fix incorrect tx-checksumming settings reportingVladyslav Tarasiuk1-1/+1
Currently, ethtool feature mask for checksum command is ORed with NETIF_F_FCOE_CRC_BIT, which is bit's position number, instead of the actual feature bit - NETIF_F_FCOE_CRC. The invalid bitmask here might affect unrelated features when toggling TX checksumming. For example, TX checksumming is always mistakenly reported as enabled on the netdevs tested (mlx5, virtio_net). Fixes: f70bb06563ed ("ethtool: update mapping of features to legacy ioctl requests") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25cxgb4/ptp: pass the sign of offset delta in FW CMDRaju Rangoju1-0/+3
cxgb4_ptp_fineadjtime() doesn't pass the signedness of offset delta in FW_PTP_CMD. Fix it by passing correct sign. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: phy: mdio-mux-bcm-iproc: use readl_poll_timeout() to simplify codeDejin Zheng1-10/+4
use readl_poll_timeout() to replace the poll codes for simplify iproc_mdio_wait_for_idle() function Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_popVladimir Oltean3-60/+9
Not only did this wheel did not need reinventing, but there is also an issue with it: It doesn't remove the VLAN header in a way that preserves the L2 payload checksum when that is being provided by the DSA master hw. It should recalculate checksum both for the push, before removing the header, and for the pull afterwards. But the current implementation is quite dizzying, with pulls followed immediately afterwards by pushes, the memmove is done before the push, etc. This makes a DSA master with RX checksumming offload to print stack traces with the infamous 'hw csum failure' message. So remove the dsa_8021q_remove_header function and replace it with something that actually works with inet checksumming. Fixes: d461933638ae ("net: dsa: tag_8021q: Create helper function for removing VLAN header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: cbs: Fix software cbs to consider packet sending timeZh-yuan Ye1-1/+11
Currently the software CBS does not consider the packet sending time when depleting the credits. It caused the throughput to be Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|) where Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope is expected. In order to fix the issue above, this patch takes the time when the packet sending completes into account by moving the anchor time variable "last" ahead to the send completion time upon transmission and adding wait when the next dequeue request comes before the send completion time of the previous packet. changelog: V2->V3: - remove unnecessary whitespace cleanup - add the checks if port_rate is 0 before division V1->V2: - combine variable "send_completed" into "last" - add the comment for estimate of the packet sending Fixes: 585d763af09c ("net/sched: Introduce Credit Based Shaper (CBS) qdisc") Signed-off-by: Zh-yuan Ye <ye.zh-yuan@socionext.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24net/mlx5e: Do not recover from a non-fatal syndromeAya Levin1-2/+1
For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be triggered. In these scenarios, the RQ is not actually in ERR state. This misleads the recovery flow which assumes that the RQ is really in error state and no more completions arrive, causing crashes on bad page state. Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24net/mlx5e: Fix ICOSQ recovery flow with Striding RQAya Levin3-8/+26
In striding RQ mode, the buffers of an RX WQE are first prepared and posted to the HW using a UMR WQEs via the ICOSQ. We maintain the state of these in-progress WQEs in the RQ SW struct. In the flow of ICOSQ recovery, the corresponding RQ is not in error state, hence: - The buffers of the in-progress WQEs must be released and the RQ metadata should reflect it. - Existing RX WQEs in the RQ should not be affected. For this, wrap the dealloc of the in-progress WQEs in a function, and use it in the ICOSQ recovery flow instead of mlx5e_free_rx_descs(). Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24net/mlx5e: Fix missing reset of SW metadata in Striding RQ resetAya Levin1-2/+4
When resetting the RQ (moving RQ state from RST to RDY), the driver resets the WQ's SW metadata. In striding RQ mode, we maintain a field that reflects the actual expected WQ head (including in progress WQEs posted to the ICOSQ). It was mistakenly not reset together with the WQ. Fix this here. Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24net/mlx5e: Enhance ICOSQ WQE info fieldsAya Levin3-6/+7
Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the number of WQEBBs on WQE post, and increment the consumer counter (cc) on completion. In case of error completions, the cc was mistakenly not incremented, keeping a gap between cc and pc (producer counter). This failed the recovery flow on the ICOSQ from a CQE error which timed-out waiting for the cc and pc to meet. Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failureLeon Romanovsky1-0/+3
The cap_mask1 isn't protected by field_select and not listed among RW fields, but it is required to be written to properly initialize ports in IB virtualization mode. Link: https://lore.kernel.org/linux-rdma/88bab94d2fd72f3145835b4518bc63dda587add6.camel@redhat.com Fixes: ab118da4c10a ("net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT command") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24selftests: netfilter: add nfqueue test caseFlorian Westphal4-1/+695
Add a test case to check nf queue infrastructure. Could be extended in the future to also cover serialization of conntrack, uid and secctx attributes in nfqueue. For now, this checks that 'queue bypass' works, that a queue rule with no bypass option blocks traffic and that userspace receives the expected number of packets. For this we add two queues and hook all of prerouting/input/forward/output/postrouting. Packets get queued twice with a dummy base chain in between: This passes with current nf tree, but reverting commit 946c0d8e6ed4 ("netfilter: nf_queue: fix reinject verdict handling") makes this trip (it processes 30 instead of expected 20 packets). v2: update config file with queue and other options missing/needed for other tests. v3: also test with tcp, this reveals problem with commit 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING"), due to skb->dev pointing at another skb in the retransmit rbtree (skb->dev aliases to rbnode child). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>