summaryrefslogtreecommitdiffstats
path: root/net/ipv4/ip_gre.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2019-03-01iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()Lu Baolu1-1/+1
Intel IOMMU could be turned off with intel_iommu=off. If Intel IOMMU is off, the intel_iommu struct will not be initialized. When device drivers call intel_svm_bind_mm(), the NULL pointer reference will happen there. Add dmar_disabled check to avoid NULL pointer reference. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Reported-by: Dave Jiang <dave.jiang@intel.com> Fixes: 2f26e0a9c9860 ("iommu/vt-d: Add basic SVM PASID support") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-03-01iommu/vt-d: Set context field after value initializedLu Baolu1-1/+2
Otherwise, the translation type field of a context entry for a PCI device will always be 0. All translated DMA requests will be blocked by IOMMU. As the result, the PCI devices with PCI ATS (device IOTBL) support won't work as expected. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-03-01iommu/vt-d: Disable ATS support on untrusted devicesLu Baolu1-1/+2
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices") disables ATS support on the devices which have been marked as untrusted. Unfortunately this is not enough to fix the DMA attack vulnerabiltiies because IOMMU driver allows translated requests as long as a device advertises the ATS capability. Hence a malicious peripheral device could use this to bypass IOMMU. This disables the ATS support on untrusted devices by clearing the internal per-device ATS mark. As the result, IOMMU driver will block any translated requests from any device marked as untrusted. Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-03-01iommu/mediatek: Fix semicolon code style issueYang Wei1-1/+1
Delete a superfluous semicolon in mtk_iommu_add_device(). Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-28MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scopeLan Tianyu1-0/+1
This patch is to add Hyper-V IOMMU driver file into Hyper-V CORE and DRIVERS scope. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-28iommu/hyper-v: Add Hyper-V stub IOMMU driverLan Tianyu5-0/+210
On the bare metal, enabling X2APIC mode requires interrupt remapping function which helps to deliver irq to cpu with 32-bit APIC ID. Hyper-V doesn't provide interrupt remapping function so far and Hyper-V MSI protocol already supports to deliver interrupt to the CPU whose virtual processor index is more than 255. IO-APIC interrupt still has 8-bit APIC ID limitation. This patch is to add Hyper-V stub IOMMU driver in order to enable X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC interrupt remapping capability when X2APIC mode is available. Otherwise, it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity and make sure cpus assigned with IO-APIC interrupt have 8-bit APIC ID. Define 24 IO-APIC remapping entries because Hyper-V only expose one single IO-APIC and one IO-APIC has 24 pins according IO-APIC spec( https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf). Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-28x86/Hyper-V: Set x2apic destination mode to physical when x2apic is availableLan Tianyu1-0/+12
Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic, set x2apic destination mode to physcial mode when x2apic is available and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have 8-bit APIC id. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-28PCI/ATS: Add inline to pci_prg_resp_pasid_required()Kuppuswamy Sathyanarayanan1-1/+1
Fix unused function warning when compiled with CONFIG_PCI_PASID disabled. Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required() interface.") Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/vt-d: Check identity map for hot-added devicesLu Baolu1-9/+12
The Intel IOMMU driver will put devices into a static identity mapped domain during boot if the kernel parameter "iommu=pt" is used. That means the IOMMU hardware will translate a DMA address into the same memory address. Unfortunately, hot-added devices are not subject to this. That results in some devices not working properly after hot added. A quick way to reproduce this issue is to boot a system with iommu=pt and, remove then readd the pci device with echo 1 > /sys/bus/pci/devices/[pci_source_id]/remove echo 1 > /sys/bus/pci/rescan You will find the identity mapped domain was replaced with a normal domain. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Reported-by: Jis Ben <jisben@google.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: James Dong <xmdong@google.com> Fixes: 99dcadede42f ('intel-iommu: Support PCIe hot-plug') Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/dmar: Fix buffer overflow during PCI bus notificationJulia Cartwright1-1/+1
Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path") changed the type of the path data, however, the change in path type was not reflected in size calculations. Update to use the correct type and prevent a buffer overflow. This bug manifests in systems with deep PCI hierarchies, and can lead to an overflow of the static allocated buffer (dmar_pci_notify_info_buf), or can lead to overflow of slab-allocated data. BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0 Write of size 1 at addr ffffffff90445d80 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.14.87-rt49-02406-gd0a0e96 #1 Call Trace: ? dump_stack+0x46/0x59 ? print_address_description+0x1df/0x290 ? dmar_alloc_pci_notify_info+0x1d5/0x2e0 ? kasan_report+0x256/0x340 ? dmar_alloc_pci_notify_info+0x1d5/0x2e0 ? e820__memblock_setup+0xb0/0xb0 ? dmar_dev_scope_init+0x424/0x48f ? __down_write_common+0x1ec/0x230 ? dmar_dev_scope_init+0x48f/0x48f ? dmar_free_unused_resources+0x109/0x109 ? cpumask_next+0x16/0x20 ? __kmem_cache_create+0x392/0x430 ? kmem_cache_create+0x135/0x2f0 ? e820__memblock_setup+0xb0/0xb0 ? intel_iommu_init+0x170/0x1848 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? migrate_enable+0x27a/0x5b0 ? sched_setattr+0x20/0x20 ? migrate_disable+0x1fc/0x380 ? task_rq_lock+0x170/0x170 ? try_to_run_init_process+0x40/0x40 ? locks_remove_file+0x85/0x2f0 ? dev_prepare_static_identity_mapping+0x78/0x78 ? rt_spin_unlock+0x39/0x50 ? lockref_put_or_lock+0x2a/0x40 ? dput+0x128/0x2f0 ? __rcu_read_unlock+0x66/0x80 ? __fput+0x250/0x300 ? __rcu_read_lock+0x1b/0x30 ? mntput_no_expire+0x38/0x290 ? e820__memblock_setup+0xb0/0xb0 ? pci_iommu_init+0x25/0x63 ? pci_iommu_init+0x25/0x63 ? do_one_initcall+0x7e/0x1c0 ? initcall_blacklisted+0x120/0x120 ? kernel_init_freeable+0x27b/0x307 ? rest_init+0xd0/0xd0 ? kernel_init+0xf/0x120 ? rest_init+0xd0/0xd0 ? ret_from_fork+0x1f/0x40 The buggy address belongs to the variable: dmar_pci_notify_info_buf+0x40/0x60 Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path") Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu: Fix IOMMU debugfs falloutGeert Uytterhoeven1-19/+4
A change made in the final version of IOMMU debugfs support replaced the public function iommu_debugfs_new_driver_dir() by the public dentry iommu_debugfs_dir in <linux/iommu.h>, but forgot to update both the implementation in iommu-debugfs.c, and the patch description. Fix this by exporting iommu_debugfs_dir, and removing the reference to and implementation of iommu_debugfs_new_driver_dir(). Fixes: bad614b24293ae46 ("iommu: Enable debugfs exposure of IOMMU driver internals") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu: Document iommu_ops.is_attach_deferred()Geert Uytterhoeven1-0/+2
Add missing kerneldoc for iommu_ops.is_attach_deferred(). Fixes: e01d1913b0d08171 ("iommu: Add is_attach_deferred call-back to iommu-ops") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu: Document iommu_ops.iotlb_sync_map()Geert Uytterhoeven1-0/+1
Add missing kerneldoc for iommu_ops.iotlb_sync_map(). Fixes: 1d7ae53b152dbc5b ("iommu: Introduce iotlb_sync_map callback") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/vt-d: Enable ATS only if the device uses page aligned address.Kuppuswamy Sathyanarayanan1-0/+1
As per Intel vt-d specification, Rev 3.0 (section 7.5.1.1, title "Page Request Descriptor"), Intel IOMMU page request descriptor only uses bits[63:12] of the page address. Hence Intel IOMMU driver would only permit devices that advertise they would only send Page Aligned Requests to participate in ATS service. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26PCI/ATS: Add pci_ats_page_aligned() interfaceKuppuswamy Sathyanarayanan3-0/+30
Return the Page Aligned Request bit in the ATS Capability Register. As per PCIe spec r4.0, sec 10.5.1.2, if the Page Aligned Request bit is set, it indicates the Untranslated Addresses generated by the device are always aligned to a 4096 byte boundary. An IOMMU that can only translate page-aligned addresses can only be used with devices that always produce aligned Untranslated Addresses. This interface will be used by drivers for such IOMMUs to determine whether devices can use the ATS service. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/vt-d: Fix PRI/PASID dependency issue.Kuppuswamy Sathyanarayanan1-1/+3
In Intel IOMMU, if the Page Request Queue (PRQ) is full, it will automatically respond to the device with a success message as a keep alive. And when sending the success message, IOMMU will include PASID in the Response Message when the Page Request has a PASID in Request Message and it does not check against the PRG Response PASID requirement of the device before sending the response. Also, if the device receives the PRG response with PASID when its not expecting it the device behavior is undefined. So if PASID is enabled in the device, enable PRI only if device expects PASID in PRG Response Message. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26PCI/ATS: Add pci_prg_resp_pasid_required() interface.Kuppuswamy Sathyanarayanan3-0/+36
Return the PRG Response PASID Required bit in the Page Request Status Register. As per PCIe spec r4.0, sec 10.5.2.3, if this bit is Set, the device expects a PASID TLP Prefix on PRG Response Messages when the corresponding Page Requests had a PASID TLP Prefix. If Clear, the device does not expect PASID TLP Prefixes on any PRG Response Message, and the device behavior is undefined if the device receives a PRG Response Message with a PASID TLP Prefix. Also the device behavior is undefined if this bit is Set and the device receives a PRG Response Message with no PASID TLP Prefix when the corresponding Page Requests had a PASID TLP Prefix. This function will be used by drivers like IOMMU, if it is required to check the status of the PRG Response PASID Required bit before enabling the PASID support of the device. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/vt-d: Allow interrupts from the entire bus for aliased devicesLogan Gunthorpe1-0/+15
When a device has multiple aliases that all are from the same bus, we program the IRTE to accept requests from any matching device on the bus. This is so NTB devices which can have requests from multiple bus-devfns can pass MSI interrupts through across the bridge. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/vt-d: Add helper to set an IRTE to verify only the bus numberLogan Gunthorpe1-3/+14
The current code uses set_irte_sid() with SVT_VERIFY_BUS and PCI_DEVID to set the SID value. However, this is very confusing because, with SVT_VERIFY_BUS, the SID value is not a PCI devfn address, but the start and end bus numbers to match against. According to the Intel Virtualization Technology for Directed I/O Architecture Specification, Rev. 3.0, page 9-36: The most significant 8-bits of the SID field contains the Startbus#, and the least significant 8-bits of the SID field contains the Endbus#. Interrupt requests that reference this IRTE must have a requester-id whose bus# (most significant 8-bits of requester-id) has a value equal to or within the Startbus# to Endbus# range. So to make this more clear, introduce a new set_irte_verify_bus() that explicitly takes a start bus and end bus so that we can stop abusing the PCI_DEVID macro. This helper function will be called a second time in an subsequent patch. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu: Fix flush_tlb_all typoTom Murphy1-1/+1
Fix typo, flush_tlb_all should be flush_iotlb_all. Signed-off-by: Tom Murphy <murphyt7@tcd.ie> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-26iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tablesNicolas Boichat1-1/+2
L1 tables are allocated with __get_dma_pages, and therefore already ignored by kmemleak. Without this, the kernel would print this error message on boot, when the first L1 table is allocated: [ 2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black [ 2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S 4.19.16 #8 [ 2.831227] Workqueue: events deferred_probe_work_func [ 2.836353] Call trace: ... [ 2.852532] paint_ptr+0xa0/0xa8 [ 2.855750] kmemleak_ignore+0x38/0x6c [ 2.859490] __arm_v7s_alloc_table+0x168/0x1f4 [ 2.863922] arm_v7s_alloc_pgtable+0x114/0x17c [ 2.868354] alloc_io_pgtable_ops+0x3c/0x78 ... Fixes: e5fc9753b1a8314 ("iommu/io-pgtable: Add ARMv7 short descriptor support") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-25Linux 5.0-rc8v5.0-rc8Linus Torvalds1-1/+1
2019-02-24net: phy: realtek: Dummy IRQ calls for RTL8366RBLinus Walleij2-0/+15
This fixes a regression introduced by commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7 "net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt". This assumes that a PHY cannot trigger interrupt unless it has .config_intr() or .ack_interrupt() implemented. A later patch makes the code assume both need to be implemented for interrupts to be present. But this PHY (which is inside a DSA) will happily fire interrupts without either callback. Implement dummy callbacks for .config_intr() and .ack_interrupt() in the phy header to fix this. Tested on the RTL8366RB on D-Link DIR-685. Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt") Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tcp: repaired skbs must init their tso_segsEric Dumazet1-0/+1
syzbot reported a WARN_ON(!tcp_skb_pcount(skb)) in tcp_send_loss_probe() [1] This was caused by TCP_REPAIR sent skbs that inadvertenly were missing a call to tcp_init_tso_segs() [1] WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x65c kernel/panic.c:214 __warn.cold+0x20/0x45 kernel/panic.c:571 report_bug+0x263/0x2b0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] fixup_bug arch/x86/kernel/traps.c:173 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973 RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534 Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb <0f> 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89 RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206 RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005 RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1 R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040 R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860 tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583 tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 </IRQ> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555 default_idle_call+0x36/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x386/0x570 kernel/sched/idle.c:262 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353 start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/x25: fix a race in x25_bind()Eric Dumazet1-5/+8
syzbot was able to trigger another soft lockup [1] I first thought it was the O(N^2) issue I mentioned in my prior fix (f657d22ee1f "net/x25: do not hold the cpu too long in x25_new_lci()"), but I eventually found that x25_bind() was not checking SOCK_ZAPPED state under socket lock protection. This means that multiple threads can end up calling x25_insert_socket() for the same socket, and corrupt x25_list [1] watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.2:10492] Modules linked in: irq event stamp: 27515 hardirqs last enabled at (27514): [<ffffffff81006673>] trace_hardirqs_on_thunk+0x1a/0x1c hardirqs last disabled at (27515): [<ffffffff8100668f>] trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (32): [<ffffffff8632ee73>] x25_get_neigh+0xa3/0xd0 net/x25/x25_link.c:336 softirqs last disabled at (34): [<ffffffff86324bc3>] x25_find_socket+0x23/0x140 net/x25/af_x25.c:341 CPU: 0 PID: 10492 Comm: syz-executor.2 Not tainted 5.0.0-rc7+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x50 kernel/kcov.c:97 Code: f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00 00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 <48> 8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c 92 7e 81 e2 RSP: 0018:ffff88806e94fc48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffff1100d84dac5 RBX: 0000000000000001 RCX: ffffc90006197000 RDX: 0000000000040000 RSI: ffffffff86324bf3 RDI: ffff88806c26d628 RBP: ffff88806e94fc48 R08: ffff88806c1c6500 R09: fffffbfff1282561 R10: fffffbfff1282560 R11: ffffffff89412b03 R12: ffff88806c26d628 R13: ffff888090455200 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007f3a107e4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a107e3db8 CR3: 00000000a5544000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __x25_find_socket net/x25/af_x25.c:327 [inline] x25_find_socket+0x7d/0x140 net/x25/af_x25.c:342 x25_new_lci net/x25/af_x25.c:355 [inline] x25_connect+0x380/0xde0 net/x25/af_x25.c:784 __sys_connect+0x266/0x330 net/socket.c:1662 __do_sys_connect net/socket.c:1673 [inline] __se_sys_connect net/socket.c:1670 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1670 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457e29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f3a107e3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29 RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000005 RBP: 000000000073c040 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a107e46d4 R13: 00000000004be362 R14: 00000000004ceb98 R15: 00000000ffffffff Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10493 Comm: syz-executor.3 Not tainted 5.0.0-rc7+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline] RIP: 0010:queued_write_lock_slowpath+0x143/0x290 kernel/locking/qrwlock.c:86 Code: 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 41 0f b6 55 00 <41> 38 d7 7c eb 84 d2 74 e7 48 89 df e8 cc aa 4e 00 eb dd be 04 00 RSP: 0018:ffff888085c47bd8 EFLAGS: 00000206 RAX: 0000000000000300 RBX: ffffffff89412b00 RCX: 1ffffffff1282560 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89412b00 RBP: ffff888085c47c70 R08: 1ffffffff1282560 R09: fffffbfff1282561 R10: fffffbfff1282560 R11: ffffffff89412b03 R12: 00000000000000ff R13: fffffbfff1282560 R14: 1ffff11010b88f7d R15: 0000000000000003 FS: 00007fdd04086700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd04064db8 CR3: 0000000090be0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: queued_write_lock include/asm-generic/qrwlock.h:104 [inline] do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203 __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline] _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312 x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267 x25_bind+0x273/0x340 net/x25/af_x25.c:703 __sys_bind+0x23f/0x290 net/socket.c:1481 __do_sys_bind net/socket.c:1492 [inline] __se_sys_bind net/socket.c:1490 [inline] __x64_sys_bind+0x73/0xb0 net/socket.c:1490 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457e29 Fixes: 90c27297a9bf ("X.25 remove bkl in bind") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: andrew hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: dsa: Remove documentation for port_fdb_prepareHauke Mehrtens1-7/+3
This callback was removed some time ago, also remove the documentation. Fixes: 1b6dd556c304 ("net: dsa: Remove prepare phase for FDB") Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Revert "bridge: do not add port to router list when receives query with ↵Hangbin Liu1-8/+1
source 0.0.0.0" This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries") The reason is RFC 4541 is not a standard but suggestive. Currently we will elect 0.0.0.0 as Querier if there is no ip address configured on bridge. If we do not add the port which recives query with source 0.0.0.0 to router list, the IGMP reports will not be about to forward to Querier, IGMP data will also not be able to forward to dest. As Nikolay suggested, revert this change first and add a boolopt api to disable none-zero election in future if needed. Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de> Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24selftests: fib_tests: sleep after changing carrier. again.Thadeu Lima de Souza Cascardo1-0/+1
Just like commit e2ba732a1681 ("selftests: fib_tests: sleep after changing carrier"), wait one second to allow linkwatch to propagate the carrier change to the stack. There are two sets of carrier tests. The first slept after the carrier was set to off, and when the second set ran, it was likely that the linkwatch would be able to run again without much delay, reducing the likelihood of a race. However, if you run 'fib_tests.sh -t carrier' on a loop, you will quickly notice the failures. Sleeping on the second set of tests make the failures go away. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23net: set static variable an initial value in atl2_probe()Mao Wenan1-3/+1
cards_found is a static variable, but when it enters atl2_probe(), cards_found is set to zero, the value is not consistent with last probe, so next behavior is not our expect. Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10GMaxime Chevallier1-1/+5
Some Marvell Alaska PHYs support 2.5G, 5G and 10G BaseT links. Their default behaviour is to advertise all of these modes, but at the moment, only 10GBaseT is supported. To prevent link partners from establishing link at that speed, clear these modes upon configuring aneg parameters. Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23bpf, doc: add bpf list as secondary entry to maintainers fileDaniel Borkmann1-1/+13
We recently created a bpf@vger.kernel.org list (https://lore.kernel.org/bpf/) for BPF related discussions, originally in context of BPF track at LSF/MM for topic discussions. It's *optional* but *desirable* to keep it in Cc for BPF related kernel/loader/llvm/tooling threads, meaning also infrastructure like llvm that sits on top of kernel but is crucial to BPF. In any case, netdev with it's bpf delegate is *as-is* today primary list for patches, so nothing changes in the workflow. Main purpose is to have some more awareness for the bpf@vger.kernel.org list that folks can Cc for BPF specific topics. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-23udp: fix possible user after free in error handlerPaolo Abeni1-2/+4
Similar to the previous commit, this addresses the same issue for ipv4: use a single fetch operation and use the correct rcu annotation. Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23udpv6: fix possible user after free in error handlerPaolo Abeni1-4/+6
Before derefencing the encap pointer, commit e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port") checks for a NULL value, but the two fetch operation can race with removal. Fix the above using a single access. Also fix a couple of type annotations, to make sparse happy. Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23fou6: fix proto error handler argument typePaolo Abeni1-1/+1
Last argument of gue6_err_proto_handler() has a wrong type annotation, fix it and make sparse happy again. Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23udpv6: add the required annotation to mib typePaolo Abeni1-1/+1
In commit 029a37434880 ("udp6: cleanup stats accounting in recvmsg()") I forgot to add the percpu annotation for the mib pointer. Add it, and make sparse happy. Fixes: 029a37434880 ("udp6: cleanup stats accounting in recvmsg()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23mdio_bus: Fix use-after-free on device_register failsYueHaibing1-1/+0
KASAN has found use-after-free in fixed_mdio_bus_init, commit 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure") call put_device() while device_register() fails,give up the last reference to the device and allow mdiobus_release to be executed ,kfreeing the bus. However in most drives, mdiobus_free be called to free the bus while mdiobus_register fails. use-after-free occurs when access bus again, this patch revert it to let mdiobus_free free the bus. KASAN report details as below: BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482 Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524 CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482 fixed_mdio_bus_init+0x283/0x1000 [fixed_phy] ? 0xffffffffc0e40000 ? 0xffffffffc0e40000 ? 0xffffffffc0e40000 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004 Allocated by task 3524: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 kmalloc include/linux/slab.h:545 [inline] kzalloc include/linux/slab.h:740 [inline] mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143 fixed_mdio_bus_init+0x163/0x1000 [fixed_phy] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 3524: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kfree+0xe1/0x270 mm/slub.c:3938 device_release+0x78/0x200 drivers/base/core.c:919 kobject_cleanup lib/kobject.c:662 [inline] kobject_release lib/kobject.c:691 [inline] kref_put include/linux/kref.h:67 [inline] kobject_put+0x146/0x240 lib/kobject.c:708 put_device+0x1c/0x30 drivers/base/core.c:2060 __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382 fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881dc824c80 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 248 bytes inside of 2048-byte region [ffff8881dc824c80, ffff8881dc825480) The buggy address belongs to the page: page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0 flags: 0x2fffc0000010200(slab|head) raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800 raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255Kalash Nainwal1-1/+1
Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to keep legacy software happy. This is similar to what was done for ipv4 in commit 709772e6e065 ("net: Fix routing tables with id > 255 for legacy software"). Signed-off-by: Kalash Nainwal <kalash@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23bnxt_en: Wait longer for the firmware message response to complete.Michael Chan2-2/+2
The code waits up to 20 usec for the firmware response to complete once we've seen the valid response header in the buffer. It turns out that in some scenarios, this wait time is not long enough. Extend it to 150 usec and use usleep_range() instead of udelay(). Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23bnxt_en: Fix typo in firmware message timeout logic.Michael Chan1-1/+1
The logic that polls for the firmware message response uses a shorter sleep interval for the first few passes. But there was a typo so it was using the wrong counter (larger counter) for these short sleep passes. The result is a slightly shorter timeout period for these firmware messages than intended. Fix it by using the proper counter. Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-23nfp: bpf: fix ALU32 high bits clearance bugJiong Wang1-11/+6
NFP BPF JIT compiler is doing a couple of small optimizations when jitting ALU imm instructions, some of these optimizations could save code-gen, for example: A & -1 = A A | 0 = A A ^ 0 = A However, for ALU32, high 32-bit of the 64-bit register should still be cleared according to ISA semantics. Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator") Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-23nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_KJiong Wang1-1/+1
The intended optimization should be A ^ 0 = A, not A ^ -1 = A. Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator") Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-22Documentation: networking: switchdev: Update port parent ID sectionFlorian Fainelli1-5/+5
Update the section about switchdev drivers having to implement a switchdev_port_attr_get() function to return SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after commit bccb30254a4a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID"). Fixes: bccb30254a4a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID") Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: socket: add check for negative optlen in compat setsockoptJann Horn1-1/+5
__sys_setsockopt() already checks for `optlen < 0`. Add an equivalent check to the compat path for robustness. This has to be `> INT_MAX` instead of `< 0` because the signedness of `optlen` is different here. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22ipv6: route: purge exception on removalPaolo Abeni1-1/+12
When a netdevice is unregistered, we flush the relevant exception via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route(). Finally, we end-up calling rt6_remove_exception(), where we release the relevant dst, while we keep the references to the related fib6_info and dev. Such references should be released later when the dst will be destroyed. There are a number of caches that can keep the exception around for an unlimited amount of time - namely dst_cache, possibly even socket cache. As a result device registration may hang, as demonstrated by this script: ip netns add cl ip netns add rt ip netns add srv ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1 ip link add name cl_veth type veth peer name cl_rt_veth ip link set dev cl_veth netns cl ip -n cl link set dev cl_veth up ip -n cl addr add dev cl_veth 2001::2/64 ip -n cl route add default via 2001::1 ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth ip -n cl link set tunv6 up ip -n cl addr add 2013::2/64 dev tunv6 ip link set dev cl_rt_veth netns rt ip -n rt link set dev cl_rt_veth up ip -n rt addr add dev cl_rt_veth 2001::1/64 ip link add name rt_srv_veth type veth peer name srv_veth ip link set dev srv_veth netns srv ip -n srv link set dev srv_veth up ip -n srv addr add dev srv_veth 2002::1/64 ip -n srv route add default via 2002::2 ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth ip -n srv link set tunv6 up ip -n srv addr add 2013::1/64 dev tunv6 ip link set dev rt_srv_veth netns rt ip -n rt link set dev rt_srv_veth up ip -n rt addr add dev rt_srv_veth 2002::2/64 ip netns exec srv netserver & sleep 0.1 ip netns exec cl ping6 -c 4 2013::1 ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1 ip -n rt link set dev rt_srv_veth mtu 1400 wait %2 ip -n cl link del cl_veth This commit addresses the issue purging all the references held by the exception at time, as we currently do for e.g. ipv6 pcpu dst entries. v1 -> v2: - re-order the code to avoid accessing dst and net after dst_dev_put() Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: remove link change polling code and info from nicpfVadim Lomovtsev1-102/+12
Since link change polling routine was moved to nicvf side, we don't need anymore polling function at nicpf side along with link status info for all enabled Vfs as at VF side this info is already tracked. This commit is to remove unnecessary code & fields from nicpf structure. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: move link state polling function to VFVadim Lomovtsev3-19/+74
Move the link change polling task to VF side in order to prevent races between VF and PF while sending link change message(s). This commit is to implement link change request to be initiated by VF. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: add mutex to protect mailbox from concurrent calls for same VFVadim Lomovtsev2-3/+12
In some cases it could happen that nicvf_send_msg_to_pf() could be called concurrently for the same NIC VF, and thus re-writing mailbox contents and breaking messaging sequence with PF by re-writing NICVF data. This commit is to implement mutex for NICVF to protect mailbox registers and NICVF messaging control data from concurrent access. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: rework xcast message structure to make it fit into 64 bitVadim Lomovtsev3-9/+7
To communicate to PF each of ThunderX NIC VF uses mailbox which is pair of 64 bit registers available to both VFn and PF. This commit is to change the xcast message structure in order to fit it into 64 bit. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_taskVadim Lomovtsev1-4/+8
The rx_set_mode invokes number of messages to be send to PF for receive mode configuration. In case if there any issues we need to stop sending messages and release allocated memory. This commit is to implement check of nicvf_msg_send_to_pf() result. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: thunderx: make CFG_DONE message to run through generic send-ack sequenceVadim Lomovtsev2-4/+13
At the end of NIC VF initialization VF sends CFG_DONE message to PF without using nicvf_msg_send_to_pf routine. This potentially could re-write data in mailbox. This commit is to implement common way of sending CFG_DONE message by the same way with other configuration messages by using nicvf_send_msg_to_pf() routine. Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>