summaryrefslogtreecommitdiffstats
path: root/drivers/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx5e: Always start IPsec sequence number from 1Leon Romanovsky45 hours2-3/+14
| | | | | | | | | | | | | | | | According to RFC4303, section "3.3.3. Sequence Number Generation", the first packet sent using a given SA will contain a sequence number of 1. This is applicable to both ESN and non-ESN mode, which was not covered in commit mentioned in Fixes line. Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5e: Rely on reqid in IPsec tunnel modeLeon Romanovsky45 hours1-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All packet offloads SAs have reqid in it to make sure they have corresponding policy. While it is not strictly needed for transparent mode, it is extremely important in tunnel mode. In that mode, policy and SAs have different match criteria. Policy catches the whole subnet addresses, and SA catches the tunnel gateways addresses. The source address of such tunnel is not known during egress packet traversal in flow steering as it is added only after successful encryption. As reqid is required for packet offload and it is unique for every SA, we can safely rely on it only. The output below shows the configured egress policy and SA by strongswan: [leonro@vm ~]$ sudo ip x s src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel replay-window 0 flag af-unspec esn aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 1, bitmap-length 1 00000000 crypto offload parameters: dev eth2 dir out mode packet [leonro@064 ~]$ sudo ip x p src 192.170.0.0/16 dst 192.170.0.0/16 dir out priority 383615 ptype main tmpl src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel crypto offload parameters: dev eth2 mode packet Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnelLeon Romanovsky45 hours1-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt to enable IPsec packet offload in tunnel mode in debug kernel generates the following kernel panic, which is happening due to two issues: 1. In SA add section, the should be _bh() variant when marking SA mode. 2. There is not needed flush_workqueue in SA delete routine. It is not needed as at this stage as it is removed from SADB and the running work will be canceled later in SA free. ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 6.12.0+ #4 Not tainted ----------------------------------------------------- charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] and this task is already holding: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30 which would create a new lock dependency: (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3} but this new dependency connects a SOFTIRQ-irq-safe lock: (&x->lock){+.-.}-{3:3} ... which became SOFTIRQ-irq-safe at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138 to a SOFTIRQ-irq-unsafe lock: (&xa->xa_lock#24){+.+.}-{3:3} ... which became SOFTIRQ-irq-unsafe at: ... lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&xa->xa_lock#24); local_irq_disable(); lock(&x->lock); lock(&xa->xa_lock#24); <Interrupt> lock(&x->lock); *** DEADLOCK *** 2 locks held by charon/1337: #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90 #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30 the dependencies between SOFTIRQ-irq-safe lock and the holding lock: -> (&x->lock){+.-.}-{3:3} ops: 29 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 IN-SOFTIRQ-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffff87f9cd20>] __key.18+0x0/0x40 the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 SOFTIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core] ... acquired at: __lock_acquire+0x30a0/0x5040 lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 stack backtrace: CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x74/0xd0 check_irq_usage+0x12e8/0x1d90 ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0 ? check_chain_key+0x1bb/0x4c0 ? __lockdep_reset_lock+0x180/0x180 ? check_path.constprop.0+0x24/0x50 ? mark_lock+0x108/0x2fb0 ? print_circular_bug+0x9b0/0x9b0 ? mark_lock+0x108/0x2fb0 ? print_usage_bug.part.0+0x670/0x670 ? check_prev_add+0x1c4/0x2310 check_prev_add+0x1c4/0x2310 __lock_acquire+0x30a0/0x5040 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 lock_acquire+0x1be/0x520 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? __xfrm_state_delete+0x5f0/0xae0 ? lock_downgrade+0x6b0/0x6b0 _raw_spin_lock_bh+0x34/0x40 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 ? xfrm_get_sa+0x250/0x250 ? check_chain_key+0x1bb/0x4c0 xfrm_user_rcv_msg+0x493/0x880 ? copy_sec_ctx+0x270/0x270 ? check_chain_key+0x1bb/0x4c0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 netlink_rcv_skb+0x12e/0x380 ? copy_sec_ctx+0x270/0x270 ? netlink_ack+0xd90/0xd90 ? netlink_deliver_tap+0xcd/0xb60 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 ? netlink_attachskb+0x730/0x730 ? lock_acquire+0x1be/0x520 netlink_sendmsg+0x745/0xbe0 ? netlink_unicast+0x740/0x740 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x740/0x740 __sock_sendmsg+0xc5/0x190 ? fdget+0x163/0x1d0 __sys_sendto+0x1fe/0x2c0 ? __x64_sys_getpeername+0xb0/0xb0 ? do_user_addr_fault+0x856/0xe30 ? lock_acquire+0x1be/0x520 ? __task_pid_nr_ns+0x117/0x410 ? lock_downgrade+0x6b0/0x6b0 __x64_sys_sendto+0xdc/0x1b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f7d31291ba4 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1 </TASK> Fixes: 4c24272b4e2b ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5: Clear port select structure when fail to createMark Zhang45 hours1-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear the port select structure on error so no stale values left after definers are destroyed. That's because the mlx5_lag_destroy_definers() always try to destroy all lag definers in the tt_map, so in the flow below lag definers get double-destroyed and cause kernel crash: mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 1 mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 0 mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)] CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] sp : ffff800085fafb00 x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000 x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000 x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000 x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350 x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0 x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190 x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000 x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000 Call trace: mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core] mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core] mlx5_activate_lag+0x60c/0x6f8 [mlx5_core] mlx5_do_bond_work+0x284/0x5c8 [mlx5_core] process_one_work+0x170/0x3e0 worker_thread+0x2d8/0x3e0 kthread+0x11c/0x128 ret_from_fork+0x10/0x20 Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400) ---[ end trace 0000000000000000 ]--- Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5: SF, Fix add port error handlingChris Mi45 hours1-0/+1
| | | | | | | | | | | | | | | | | If failed to add SF, error handling doesn't delete the SF from the SF table. But the hw resources are deleted. So when unload driver, hw resources will be deleted again. Firmware will report syndrome 0x68def3 which means "SF is not allocated can not deallocate". Fix it by delete SF from SF table if failed to add SF. Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5: Fix a lockdep warning as part of the write combining testYishai Hadas45 hours1-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a lockdep warning [1] observed during the write combining test. The warning indicates a potential nested lock scenario that could lead to a deadlock. However, this is a false positive alarm because the SF lock and its parent lock are distinct ones. The lockdep confusion arises because the locks belong to the same object class (i.e., struct mlx5_core_dev). To resolve this, the code has been refactored to avoid taking both locks. Instead, only the parent lock is acquired. [1] raw_ethernet_bw/2118 is trying to acquire lock: [ 213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.620270] [ 213.620270] but task is already holding lock: [ 213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core] [ 213.622045] [ 213.622045] other info that might help us debug this: [ 213.622778] Possible unsafe locking scenario: [ 213.622778] [ 213.623465] CPU0 [ 213.623815] ---- [ 213.624148] lock(&dev->wc_state_lock); [ 213.624615] lock(&dev->wc_state_lock); [ 213.625071] [ 213.625071] *** DEADLOCK *** [ 213.625071] [ 213.625805] May be due to missing lock nesting notation [ 213.625805] [ 213.626522] 4 locks held by raw_ethernet_bw/2118: [ 213.627019] #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0}, at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.628088] #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3}, at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs] [ 213.629094] #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3}, at: ib_init_ucontext+0x49/0xf0 [ib_uverbs] [ 213.630106] #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core] [ 213.631185] [ 213.631185] stack backtrace: [ 213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted 6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1 [ 213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 213.633785] Call Trace: [ 213.634099] [ 213.634393] dump_stack_lvl+0x7e/0xc0 [ 213.634806] print_deadlock_bug+0x278/0x3c0 [ 213.635265] __lock_acquire+0x15f4/0x2c40 [ 213.635712] lock_acquire+0xcd/0x2d0 [ 213.636120] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.636722] ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib] [ 213.637277] __mutex_lock+0x81/0xda0 [ 213.637697] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.638305] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.638902] ? rcu_read_lock_sched_held+0x3f/0x70 [ 213.639400] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.640016] mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.640615] set_ucontext_resp+0x68/0x2b0 [mlx5_ib] [ 213.641144] ? debug_mutex_init+0x33/0x40 [ 213.641586] mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib] [ 213.642145] ib_init_ucontext+0xa0/0xf0 [ib_uverbs] [ 213.642679] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0 [ib_uverbs] [ 213.643426] ? _copy_from_user+0x46/0x80 [ 213.643878] ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs] [ 213.644426] ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130 [ib_uverbs] [ 213.645213] ? __lock_acquire+0xa99/0x2c40 [ 213.645675] ? lock_acquire+0xcd/0x2d0 [ 213.646101] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.646625] ? reacquire_held_locks+0xcf/0x1f0 [ 213.647102] ? do_user_addr_fault+0x45d/0x770 [ 213.647586] ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs] [ 213.648102] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.648632] __x64_sys_ioctl+0x4d3/0xaa0 [ 213.649060] ? do_user_addr_fault+0x4a8/0x770 [ 213.649528] do_syscall_64+0x6d/0x140 [ 213.649947] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 213.650478] RIP: 0033:0x7fa179b0737b [ 213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48 [ 213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX: 00007fa179b0737b [ 213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI: 0000000000000003 [ 213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09: 0000000000000002 [ 213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12: 00007ffd2e6d487c [ 213.656170] R13: 0000000000000027 R14: 0000000000000001 R15: 00007ffd2e6d4f70 Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/mlx5: Fix RDMA TX steering prioPatrisious Haddad45 hours1-0/+1
| | | | | | | | | | | | | | User added steering rules at RDMA_TX were being added to the first prio, which is the counters prio. Fix that so that they are correctly added to the BYPASS_PRIO instead. Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: ethernet: xgbe: re-add aneg to supported features in PHY quirksHeiner Kallweit2 days1-17/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES included feature bits for aneg and TP/MII ports. SUPPORTED_TP | \ SUPPORTED_MII) SUPPORTED_10baseT_Full) SUPPORTED_100baseT_Full) SUPPORTED_1000baseT_Full) PHY_100BT_FEATURES | \ PHY_DEFAULT_FEATURES) PHY_1000BT_FEATURES) Referenced commit expanded PHY_GBIT_FEATURES, silently removing PHY_DEFAULT_FEATURES. The removed part can be re-added by using the new PHY_GBIT_FEATURES definition. Not clear to me is why nobody seems to have noticed this issue. I stumbled across this when checking what it takes to make phy_10_100_features_array et al private to phylib. Fixes: d0939c26c53a ("net: ethernet: xgbe: expand PHY_GBIT_FEAUTRES") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/46521973-7738-4157-9f5e-0bb6f694acba@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMIIVladimir Oltean2 days1-1/+2
| | | | | | | | | | | | | | xpcs_config_2500basex() sets DW_VR_MII_DIG_CTRL1_2G5_EN, but xpcs_config_aneg_c37_sgmii() never unsets it. So, on a protocol change from 2500base-x to sgmii, the DW_VR_MII_DIG_CTRL1_2G5_EN bit will remain set. Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII ↵Vladimir Oltean2 days1-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | w/o inband On a port with SGMII fixed-link at SPEED_1000, DW_VR_MII_DIG_CTRL1 gets set to 0x2404. This is incorrect, because bit 2 (DW_VR_MII_DIG_CTRL1_2G5_EN) is set. It comes from the previous write to DW_VR_MII_AN_CTRL, because the "val" variable is reused and is dirty. Actually, its value is 0x4, aka FIELD_PREP(DW_VR_MII_PCS_MODE_MASK, DW_VR_MII_PCS_MODE_C37_SGMII). Resolve the issue by clearing "val" to 0 when writing to a new register. After the fix, the register value is 0x2400. Prior to the blamed commit, when the read-modify-write was open-coded, the code saved the content of the DW_VR_MII_DIG_CTRL1 register in the "ret" variable. Fixes: ce8d6081fcf4 ("net: pcs: xpcs: add _modify() accessors") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: fec: handle page_pool_dev_alloc_pages errorKevin Groeneveld3 days1-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did not handle the case when it returned NULL. There was a WARN_ON(!new_page) but it would still proceed to use the NULL pointer and then crash. This case does seem somewhat rare but when the system is under memory pressure it can happen. One case where I can duplicate this with some frequency is when writing over a smbd share to a SATA HDD attached to an imx6q. Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve the problem for my test case. But it still seems wrong that the fec driver ignores the memory allocation error and can crash. This commit handles the allocation error by dropping the current packet. Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: xilinx: axienet: Fix IRQ coalescing packet count overflowSean Anderson3 days1-0/+6
| | | | | | | | | | | | | | | | | If coalesce_count is greater than 255 it will not fit in the register and will overflow. This can be reproduced by running # ethtool -C ethX rx-frames 256 which will result in a timeout of 0us instead. Fix this by checking for invalid values and reporting an error. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20250113163001.2335235-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* nfp: bpf: prevent integer overflow in nfp_bpf_event_output()Dan Carpenter3 days1-1/+2
| | | | | | | | | | | The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could potentially have an integer wrapping bug on 32bit systems. Check for this and return an error. Fixes: 9816dd35ecec ("nfp: bpf: perf event output helpers support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/6074805b-e78d-4b8a-bf05-e929b5377c28@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge branch '100GbE' of ↵Jakub Kicinski3 days9-144/+209
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Fix E825 initialization Grzegorz Nitka says: E825 products have incorrect initialization procedure, which may lead to initialization failures and register values. Fix E825 products initialization by adding correct sync delay, checking the PHY revision only for current PHY and adding proper destination device when reading port/quad. In addition, E825 uses PF ID for indexing per PF registers and as a primary PHY lane number, which is incorrect. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Add correct PHY lane assignment ice: Fix ETH56G FC-FEC Rx offset value ice: Fix quad registers read on E825 ice: Fix E825 initialization ==================== Link: https://patch.msgid.link/20250113182840.3564250-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ice: Add correct PHY lane assignmentKarol Kolacinski5 days8-45/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver always naively assumes, that for PTP purposes, PHY lane to configure is corresponding to PF ID. This is not true for some port configurations, e.g.: - 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs are 0 and 1 - 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are 0 and 1 Use correct PHY lane assignment by getting and parsing port options. This is read from the NVM by the FW and provided to the driver with the indication of active port split. Remove ice_is_muxed_topo(), which is no longer needed. Fixes: 4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: Fix ETH56G FC-FEC Rx offset valueKarol Kolacinski5 days1-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ETH56G FC-FEC incorrect Rx offset value by changing it from -255.96 to -469.26 ns. Those values are derived from HW spec and reflect internal delays. Hex value is a fixed point representation in Q23.9 format. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: Fix quad registers read on E825Karol Kolacinski5 days2-87/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quad registers are read/written incorrectly. E825 devices always use quad 0 address and differentiate between the PHYs by changing SBQ destination device (phy_0 or phy_0_peer). Add helpers for reading/writing PTP registers shared per quad and use correct quad address and SBQ destination device based on port. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: Fix E825 initializationKarol Kolacinski5 days1-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation checks revision of all PHYs on all PFs, which is incorrect and may result in initialization failure. Check only the revision of the current PHY. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | pfcp: Destroy device along with udp socket's netns dismantle.Kuniyuki Iwashima4 days1-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pfcp_newlink() links the device to a list in dev_net(dev) instead of net, where a udp tunnel socket is created. Even when net is removed, the device stays alive on dev_net(dev). Then, removing net triggers the splat below. [0] In this example, pfcp0 is created in ns2, but the udp socket is created in ns1. ip netns add ns1 ip netns add ns2 ip -n ns1 link add netns ns2 name pfcp0 type pfcp ip netns del ns1 Let's link the device to the socket's netns instead. Now, pfcp_net_exit() needs another netdev iteration to remove all pfcp devices in the netns. pfcp_dev_list is not used under RCU, so the list API is converted to the non-RCU variant. pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next. [0]: ref_tracker: net notrefcnt@00000000128b34dc has 1/1 users at sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1558) udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18) pfcp_create_sock (drivers/net/pfcp.c:168) pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197) rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012) rtnetlink_rcv_msg (net/core/rtnetlink.c:6922) netlink_rcv_skb (net/netlink/af_netlink.c:2542) netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347) netlink_sendmsg (net/netlink/af_netlink.c:1891) ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583) ___sys_sendmsg (net/socket.c:2639) __sys_sendmsg (net/socket.c:2669) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) Modules linked in: CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179) Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89 RSP: 0018:ff11000007f3fb60 EFLAGS: 00010286 RAX: 00000000000020ef RBX: ff1100000d6481e0 RCX: 1ffffffff0e40d82 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c RBP: ff1100000d648230 R08: 0000000000000001 R09: fffffbfff0e395af R10: 0000000000000001 R11: 0000000000000000 R12: ff1100000d648230 R13: dead000000000100 R14: ff1100000d648230 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005620e1363990 CR3: 000000000eeb2002 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn (kernel/panic.c:748) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158) ? kfree (mm/slub.c:4613 mm/slub.c:4761) net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467) cleanup_net (net/core/net_namespace.c:664 (discriminator 3)) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> Fixes: 76c8764ef36a ("pfcp: add PFCP module") Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | gtp: Destroy device along with udp socket's netns dismantle.Kuniyuki Iwashima4 days1-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gtp_newlink() links the device to a list in dev_net(dev) instead of src_net, where a udp tunnel socket is created. Even when src_net is removed, the device stays alive on dev_net(dev). Then, removing src_net triggers the splat below. [0] In this example, gtp0 is created in ns2, and the udp socket is created in ns1. ip netns add ns1 ip netns add ns2 ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn ip netns del ns1 Let's link the device to the socket's netns instead. Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove all gtp devices in the netns. [0]: ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1558) udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18) gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423) gtp_create_sockets (drivers/net/gtp.c:1447) gtp_newlink (drivers/net/gtp.c:1507) rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012) rtnetlink_rcv_msg (net/core/rtnetlink.c:6922) netlink_rcv_skb (net/netlink/af_netlink.c:2542) netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347) netlink_sendmsg (net/netlink/af_netlink.c:1891) ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583) ___sys_sendmsg (net/socket.c:2639) __sys_sendmsg (net/socket.c:2669) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) Modules linked in: CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179) Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89 RSP: 0018:ff11000009a07b60 EFLAGS: 00010286 RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0 R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn (kernel/panic.c:748) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158) ? kfree (mm/slub.c:4613 mm/slub.c:4761) net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467) cleanup_net (net/core/net_namespace.c:664 (discriminator 3)) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().Kuniyuki Iwashima4 days1-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gtp_newlink() links the gtp device to a list in dev_net(dev). However, even after the gtp device is moved to another netns, it stays on the list but should be invisible. Let's use for_each_netdev_rcu() for netdev traversal in gtp_genl_dump_pdp(). Note that gtp_dev_list is no longer used under RCU, so list helpers are converted to the non-RCU variant. Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_=PmFPA@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | r8169: remove redundant hwmon supportHeiner Kallweit4 days1-44/+0
|/ | | | | | | | | | | | The temperature sensor is actually part of the integrated PHY and available also on the standalone versions of the PHY. Therefore hwmon support will be added to the Realtek PHY driver and can be removed here. Fixes: 1ffcc8d41306 ("r8169: add support for the temperature sensor being available from RTL8125B") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/afba85f5-987b-4449-83cc-350438af7fe7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: ravb: Fix max TX frame size for RZ/V2MPaul Barker7 days1-0/+1
| | | | | | | | | | | | | | | | | | | | | | When tx_max_frame_size was added to struct ravb_hw_info, no value was set in ravb_rzv2m_hw_info so the default value of zero was used. The maximum MTU is set by subtracting from tx_max_frame_size to allow space for headers and frame checksums. As ndev->max_mtu is unsigned, this subtraction wraps around leading to a ridiculously large positive value that is obviously incorrect. Before tx_max_frame_size was introduced, the maximum MTU was based on rx_max_frame_size. So, we can restore the correct maximum MTU by copying the rx_max_frame_size value into tx_max_frame_size for RZ/V2M. Fixes: 1d63864299ca ("net: ravb: Fix maximum TX frame size for GbEth devices") Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/20250109113706.1409149-1-paul.barker.ct@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: mana: Cleanup "mana" debugfs dir after cleanup of all childrenShradha Gupta7 days1-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mana_driver_exit(), mana_debugfs_root gets cleanup before any of it's children (which happens later in the pci_unregister_driver()). Due to this, when mana driver is configured as a module and rmmod is invoked, following stack gets printed along with failure in rmmod command. [ 2399.317651] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 2399.318657] #PF: supervisor write access in kernel mode [ 2399.319057] #PF: error_code(0x0002) - not-present page [ 2399.319528] PGD 10eb68067 P4D 0 [ 2399.319914] Oops: Oops: 0002 [#1] SMP NOPTI [ 2399.320308] CPU: 72 UID: 0 PID: 5815 Comm: rmmod Not tainted 6.13.0-rc5+ #89 [ 2399.320986] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024 [ 2399.321892] RIP: 0010:down_write+0x1a/0x50 [ 2399.322303] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 9d cd ff ff 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 17 65 48 8b 05 f6 84 dd 5f 49 89 44 24 08 4c [ 2399.323669] RSP: 0018:ff53859d6c663a70 EFLAGS: 00010246 [ 2399.324061] RAX: 0000000000000000 RBX: ff1d4eb505060180 RCX: ffffff8100000000 [ 2399.324620] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098 [ 2399.325167] RBP: ff53859d6c663a78 R08: 00000000000009c4 R09: ff1d4eb4fac90000 [ 2399.325681] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000098 [ 2399.326185] R13: ff1d4e42e1a4a0c8 R14: ff1d4eb538ce0000 R15: 0000000000000098 [ 2399.326755] FS: 00007fe729570000(0000) GS:ff1d4eb2b7200000(0000) knlGS:0000000000000000 [ 2399.327269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2399.327690] CR2: 0000000000000098 CR3: 00000001c0584005 CR4: 0000000000373ef0 [ 2399.328166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2399.328623] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 2399.329055] Call Trace: [ 2399.329243] <TASK> [ 2399.329379] ? show_regs+0x69/0x80 [ 2399.329602] ? __die+0x25/0x70 [ 2399.329856] ? page_fault_oops+0x271/0x550 [ 2399.330088] ? psi_group_change+0x217/0x470 [ 2399.330341] ? do_user_addr_fault+0x455/0x7b0 [ 2399.330667] ? finish_task_switch.isra.0+0x91/0x2f0 [ 2399.331004] ? exc_page_fault+0x73/0x160 [ 2399.331275] ? asm_exc_page_fault+0x27/0x30 [ 2399.343324] ? down_write+0x1a/0x50 [ 2399.343631] simple_recursive_removal+0x4d/0x2c0 [ 2399.343977] ? __pfx_remove_one+0x10/0x10 [ 2399.344251] debugfs_remove+0x45/0x70 [ 2399.344511] mana_destroy_rxq+0x44/0x400 [mana] [ 2399.344845] mana_destroy_vport+0x54/0x1c0 [mana] [ 2399.345229] mana_detach+0x2f1/0x4e0 [mana] [ 2399.345466] ? ida_free+0x150/0x160 [ 2399.345718] ? __cond_resched+0x1a/0x50 [ 2399.345987] mana_remove+0xf4/0x1a0 [mana] [ 2399.346243] mana_gd_remove+0x25/0x80 [mana] [ 2399.346605] pci_device_remove+0x41/0xb0 [ 2399.346878] device_remove+0x46/0x70 [ 2399.347150] device_release_driver_internal+0x1e3/0x250 [ 2399.347831] ? klist_remove+0x81/0xe0 [ 2399.348377] driver_detach+0x4b/0xa0 [ 2399.348906] bus_remove_driver+0x83/0x100 [ 2399.349435] driver_unregister+0x31/0x60 [ 2399.349919] pci_unregister_driver+0x40/0x90 [ 2399.350492] mana_driver_exit+0x1c/0xb50 [mana] [ 2399.351102] __do_sys_delete_module.constprop.0+0x184/0x320 [ 2399.351664] ? __fput+0x1a9/0x2d0 [ 2399.352200] __x64_sys_delete_module+0x12/0x20 [ 2399.352760] x64_sys_call+0x1e66/0x2140 [ 2399.353316] do_syscall_64+0x79/0x150 [ 2399.353813] ? syscall_exit_to_user_mode+0x49/0x230 [ 2399.354346] ? do_syscall_64+0x85/0x150 [ 2399.354816] ? irqentry_exit+0x1d/0x30 [ 2399.355287] ? exc_page_fault+0x7f/0x160 [ 2399.355756] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2399.356302] RIP: 0033:0x7fe728d26aeb [ 2399.356776] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48 [ 2399.358372] RSP: 002b:00007ffff954d6f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 2399.359066] RAX: ffffffffffffffda RBX: 00005609156cc760 RCX: 00007fe728d26aeb [ 2399.359779] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005609156cc7c8 [ 2399.360535] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 2399.361261] R10: 00007fe728dbeac0 R11: 0000000000000206 R12: 00007ffff954d950 [ 2399.361952] R13: 00005609156cc2a0 R14: 00007ffff954ee5f R15: 00005609156cc760 [ 2399.362688] </TASK> Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device") Cc: stable@vger.kernel.org Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1736398991-764-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* eth: bnxt: always recalculate features after XDP clearing, fix null-derefJakub Kicinski7 days3-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recalculate features when XDP is detached. Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: off [requested on] After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: on The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly. Starting with commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") we only reconfigure the RSS hash table if the "effective" number of Rx rings has changed. If HW-GRO is enabled "effective" number of rings is 2x what user sees. So if we are in the bad state, with HW-GRO re-enablement "pending" after XDP off, and we lower the rings by / 2 - the HW-GRO rings doing 2x and the ethtool -L doing / 2 may cancel each other out, and the: if (old_rx_rings != bp->hw_resc.resv_rx_rings && condition in __bnxt_reserve_rings() will be false. The RSS map won't get updated, and we'll crash with: BUG: kernel NULL pointer dereference, address: 0000000000000168 RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0 bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180 __bnxt_setup_vnic_p5+0x58/0x110 bnxt_init_nic+0xb72/0xf50 __bnxt_open_nic+0x40d/0xab0 bnxt_open_nic+0x2b/0x60 ethtool_set_channels+0x18c/0x1d0 As we try to access a freed ring. The issue is present since XDP support was added, really, but prior to commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") it wasn't causing major issues. Fixes: 1054aee82321 ("bnxt_en: Use NETIF_F_GRO_HW.") Fixes: 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()Sudheer Kumar Doredla8 days1-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | CPSW ALE has 75-bit ALE entries stored across three 32-bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions support ALE field entries spanning up to two words at the most. The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as expected when ALE field spanned across word1 and word2, but fails when ALE field spanned across word2 and word3. For example, while reading the ALE field spanned across word2 and word3 (i.e. bits 62 to 64), the word3 data shifted to an incorrect position due to the index becoming zero while flipping. The same issue occurred when setting an ALE entry. This issue has not been seen in practice but will be an issue in the future if the driver supports accessing ALE fields spanning word2 and word3 Fix the methods to handle getting/setting fields spanning up to two words. Fixes: b685f1a58956 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()") Signed-off-by: Sudheer Kumar Doredla <s-doredla@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/mlx5: Fix variable not being completed when function returnsChenguang Zhao9 days1-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cmd_alloc_index(), fails cmd_work_handler() needs to complete ent->slotted before returning early. Otherwise the task which issued the command may hang: mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry INFO: task kworker/13:2:4055883 blocked for more than 120 seconds. Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/13:2 D 0 4055883 2 0x00000228 Workqueue: events mlx5e_tx_dim_work [mlx5_core] Call trace: __switch_to+0xe8/0x150 __schedule+0x2a8/0x9b8 schedule+0x2c/0x88 schedule_timeout+0x204/0x478 wait_for_common+0x154/0x250 wait_for_completion+0x28/0x38 cmd_exec+0x7a0/0xa00 [mlx5_core] mlx5_cmd_exec+0x54/0x80 [mlx5_core] mlx5_core_modify_cq+0x6c/0x80 [mlx5_core] mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core] mlx5e_tx_dim_work+0x54/0x68 [mlx5_core] process_one_work+0x1b0/0x448 worker_thread+0x54/0x468 kthread+0x134/0x138 ret_from_fork+0x10/0x18 Fixes: 485d65e13571 ("net/mlx5: Add a timeout to acquire the command queue semaphore") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* rtase: Fix a check for error in rtase_alloc_msix()Dan Carpenter9 days1-1/+1
| | | | | | | | | | | | | The pci_irq_vector() function never returns zero. It returns negative error codes or a positive non-zero IRQ number. Fix the error checking to test for negatives. Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/f2ecc88d-af13-4651-9820-7cc665230019@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: stmmac: dwmac-tegra: Read iommu stream id from device treeParker Newman9 days1-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be written to the MGBE_WRAP_AXI_ASID0_CTRL register. The current driver is hard coded to use MGBE0's SID for all controllers. This causes softirq time outs and kernel panics when using controllers other than MGBE0. Example dmesg errors when an ethernet cable is connected to MGBE1: [ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms [ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter. [ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 [ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171) [ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features [ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported [ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock [ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode [ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337 [ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8) [ 181.921878] Sending NMI from CPU 4 to CPUs 7: [ 181.921886] NMI backtrace for cpu 7 [ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 [ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 [ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 181.922847] pc : handle_softirqs+0x98/0x368 [ 181.922978] lr : __do_softirq+0x18/0x20 [ 181.923095] sp : ffff80008003bf50 [ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 [ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 [ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 [ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 [ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 [ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d [ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 [ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74 [ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1 [ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 [ 181.967591] Call trace: [ 181.970043] handle_softirqs+0x98/0x368 (P) [ 181.974240] __do_softirq+0x18/0x20 [ 181.977743] ____do_softirq+0x14/0x28 [ 181.981415] call_on_irq_stack+0x24/0x30 [ 181.985180] do_softirq_own_stack+0x20/0x30 [ 181.989379] __irq_exit_rcu+0x114/0x140 [ 181.993142] irq_exit_rcu+0x14/0x28 [ 181.996816] el1_interrupt+0x44/0xb8 [ 182.000316] el1h_64_irq_handler+0x14/0x20 [ 182.004343] el1h_64_irq+0x80/0x88 [ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P) [ 182.012305] cpuidle_enter+0x3c/0x58 [ 182.015980] cpuidle_idle_call+0x128/0x1c0 [ 182.020005] do_idle+0xe0/0xf0 [ 182.023155] cpu_startup_entry+0x3c/0x48 [ 182.026917] secondary_start_kernel+0xdc/0x120 [ 182.031379] __secondary_switched+0x74/0x78 [ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/. [ 212.985935] rcu: blocking rcu_node structures (internal RCU debug): [ 212.992758] Sending NMI from CPU 0 to CPUs 7: [ 212.998539] NMI backtrace for cpu 7 [ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 [ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 [ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 213.040528] pc : handle_softirqs+0x98/0x368 [ 213.046563] lr : __do_softirq+0x18/0x20 [ 213.051293] sp : ffff80008003bf50 [ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 [ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 [ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 [ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 [ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 [ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d [ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 [ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608 [ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04 [ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 [ 213.162063] Call trace: [ 213.165494] handle_softirqs+0x98/0x368 (P) [ 213.171256] __do_softirq+0x18/0x20 [ 213.177291] ____do_softirq+0x14/0x28 [ 213.182017] call_on_irq_stack+0x24/0x30 [ 213.186565] do_softirq_own_stack+0x20/0x30 [ 213.191815] __irq_exit_rcu+0x114/0x140 [ 213.196891] irq_exit_rcu+0x14/0x28 [ 213.202401] el1_interrupt+0x44/0xb8 [ 213.207741] el1h_64_irq_handler+0x14/0x20 [ 213.213519] el1h_64_irq+0x80/0x88 [ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P) [ 213.224364] cpuidle_enter+0x3c/0x58 [ 213.228653] cpuidle_idle_call+0x128/0x1c0 [ 213.233993] do_idle+0xe0/0xf0 [ 213.237928] cpu_startup_entry+0x3c/0x48 [ 213.243791] secondary_start_kernel+0xdc/0x120 [ 213.249830] __secondary_switched+0x74/0x78 This bug has existed since the dwmac-tegra driver was added in Dec 2022 (See Fixes tag below for commit hash). The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit only uses MGBE0 which is why the bug was not found previously. Connect Tech has many products that use 2 (or more) MGBE controllers. The solution is to read the controller's SID from the existing "iommus" device tree property. The 2nd field of the "iommus" device tree property is the controller's SID. Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property: smmu_niso0: iommu@12000000 { compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; ... } /* MGBE1 */ ethernet@6900000 { compatible = "nvidia,tegra234-mgbe"; ... iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; ... } Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the SID using the tegra_dev_iommu_get_stream_id() helper function found in linux/iommu.h. Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus" property is removed from the device tree or the IOMMU is disabled. While the Tegra234 SOC technically supports bypassing the IOMMU, it is not supported by the current firmware, has not been tested and not recommended. More detailed discussion with Thierry Reding from Nvidia linked below. Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@connecttech.com Signed-off-by: Parker Newman <pnewman@connecttech.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/6fb97f32cf4accb4f7cf92846f6b60064ba0a3bd.1736284360.git.pnewman@connecttech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* mctp i3c: fix MCTP I3C driver multi-thread issueLeo Yang9 days1-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We found a timeout problem with the pldm command on our system. The reason is that the MCTP-I3C driver has a race condition when receiving multiple-packet messages in multi-thread, resulting in a wrong packet order problem. We identified this problem by adding a debug message to the mctp_i3c_read function. According to the MCTP spec, a multiple-packet message must be composed in sequence, and if there is a wrong sequence, the whole message will be discarded and wait for the next SOM. For example, SOM → Pkt Seq #2 → Pkt Seq #1 → Pkt Seq #3 → EOM. Therefore, we try to solve this problem by adding a mutex to the mctp_i3c_read function. Before the modification, when a command requesting a multiple-packet message response is sent consecutively, an error usually occurs within 100 loops. After the mutex, it can go through 40000 loops without any error, and it seems to run well. Fixes: c8755b29b58e ("mctp i3c: MCTP I3C driver") Signed-off-by: Leo Yang <Leo-Yang@quantatw.com> Link: https://patch.msgid.link/20250107031529.3296094-1-Leo-Yang@quantatw.com [pabeni@redhat.com: dropped already answered question from changelog] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* Merge branch '100GbE' of ↵Jakub Kicinski9 days4-14/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-01-07 (ice, igc) For ice: Arkadiusz corrects mask value being used to determine DPLL phase range. Przemyslaw corrects frequency value for E823 devices. For igc: En-Wei Wu adds a check and, early, return for failed register read. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: return early when failing to read EECD register ice: fix incorrect PHY settings for 100 GB/s ice: fix max values for dpll pin phase adjust ==================== Link: https://patch.msgid.link/20250107190150.1758577-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * igc: return early when failing to read EECD registerEn-Wei Wu11 days1-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When booting with a dock connected, the igc driver may get stuck for ~40 seconds if PCIe link is lost during initialization. This happens because the driver access device after EECD register reads return all F's, indicating failed reads. Consequently, hw->hw_addr is set to NULL, which impacts subsequent rd32() reads. This leads to the driver hanging in igc_get_hw_semaphore_i225(), as the invalid hw->hw_addr prevents retrieving the expected value. To address this, a validation check and a corresponding return value catch is added for the EECD register read result. If all F's are returned, indicating PCIe link loss, the driver will return -ENXIO immediately. This avoids the 40-second hang and significantly improves boot time when using a dock with an igc NIC. Log before the patch: [ 0.911913] igc 0000:70:00.0: enabling device (0000 -> 0002) [ 0.912386] igc 0000:70:00.0: PTM enabled, 4ns granularity [ 1.571098] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached [ 43.449095] igc_get_hw_semaphore_i225: igc 0000:70:00.0 (unnamed net_device) (uninitialized): Driver can't access device - SMBI bit is set. [ 43.449186] igc 0000:70:00.0: probe with driver igc failed with error -13 [ 46.345701] igc 0000:70:00.0: enabling device (0000 -> 0002) [ 46.345777] igc 0000:70:00.0: PTM enabled, 4ns granularity Log after the patch: [ 1.031000] igc 0000:70:00.0: enabling device (0000 -> 0002) [ 1.032097] igc 0000:70:00.0: PTM enabled, 4ns granularity [ 1.642291] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached [ 5.480490] igc 0000:70:00.0: enabling device (0000 -> 0002) [ 5.480516] igc 0000:70:00.0: PTM enabled, 4ns granularity Fixes: ab4056126813 ("igc: Add NVM support") Cc: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: fix incorrect PHY settings for 100 GB/sPrzemyslaw Korba11 days1-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ptp4l application reports too high offset when ran on E823 device with a 100GB/s link. Those values cannot go under 100ns, like in a working case when using 100 GB/s cable. This is due to incorrect frequency settings on the PHY clocks for 100 GB/s speed. Changes are introduced to align with the internal hardware documentation, and correctly initialize frequency in PHY clocks with the frequency values that are in our HW spec. To reproduce the issue run ptp4l as a Time Receiver on E823 device, and observe the offset, which will never approach values seen in the PTP working case. Reproduction output: ptp4l -i enp137s0f3 -m -2 -s -f /etc/ptp4l_8275.conf ptp4l[5278.775]: master offset 12470 s2 freq +41288 path delay -3002 ptp4l[5278.837]: master offset 10525 s2 freq +39202 path delay -3002 ptp4l[5278.900]: master offset -24840 s2 freq -20130 path delay -3002 ptp4l[5278.963]: master offset 10597 s2 freq +37908 path delay -3002 ptp4l[5279.025]: master offset 8883 s2 freq +36031 path delay -3002 ptp4l[5279.088]: master offset 7267 s2 freq +34151 path delay -3002 ptp4l[5279.150]: master offset 5771 s2 freq +32316 path delay -3002 ptp4l[5279.213]: master offset 4388 s2 freq +30526 path delay -3002 ptp4l[5279.275]: master offset -30434 s2 freq -28485 path delay -3002 ptp4l[5279.338]: master offset -28041 s2 freq -27412 path delay -3002 ptp4l[5279.400]: master offset 7870 s2 freq +31118 path delay -3002 Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: fix max values for dpll pin phase adjustArkadiusz Kubalewski11 days2-12/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mask admin command returned max phase adjust value for both input and output pins. Only 31 bits are relevant, last released data sheet wrongly points that 32 bits are valid - see [1] 3.2.6.4.1 Get CCU Capabilities Command for reference. Fix of the datasheet itself is in progress. Fix the min/max assignment logic, previously the value was wrongly considered as negative value due to most significant bit being set. Example of previous broken behavior: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \ --do pin-get --json '{"id":1}'| grep phase-adjust 'phase-adjust': 0, 'phase-adjust-max': 16723, 'phase-adjust-min': -16723, Correct behavior with the fix: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \ --do pin-get --json '{"id":1}'| grep phase-adjust 'phase-adjust': 0, 'phase-adjust-max': 2147466925, 'phase-adjust-min': -2147466925, [1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | net: hns3: fix kernel crash when 1588 is sent on HIP08 devicesJie Wang10 days1-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, HIP08 devices does not register the ptp devices, so the hdev->ptp is NULL. But the tx process would still try to set hardware time stamp info with SKBTX_HW_TSTAMP flag and cause a kernel crash. [ 128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018 ... [ 128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge] [ 128.292938] sp : ffff800059b93140 [ 128.297200] x29: ffff800059b93140 x28: 0000000000003280 [ 128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080 [ 128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001 [ 128.315969] x23: 0000000000000000 x22: 0000000000000194 [ 128.322219] x21: ffff0cd94f986000 x20: 0000000000000000 [ 128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000 [ 128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24 [ 128.340934] x15: 0000ffffd530a518 x14: 0000000000000000 [ 128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368 [ 128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02 [ 128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0 [ 128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000 [ 128.372040] x5 : 0000000000000000 x4 : 000000000000ffff [ 128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294 [ 128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080 [ 128.390626] Call trace: [ 128.393964] hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.399893] hns3_nic_net_xmit+0x39c/0x4c4 [hns3] [ 128.405468] xmit_one.constprop.0+0xc4/0x200 [ 128.410600] dev_hard_start_xmit+0x54/0xf0 [ 128.415556] sch_direct_xmit+0xe8/0x634 [ 128.420246] __dev_queue_xmit+0x224/0xc70 [ 128.425101] dev_queue_xmit+0x1c/0x40 [ 128.429608] ovs_vport_send+0xac/0x1a0 [openvswitch] [ 128.435409] do_output+0x60/0x17c [openvswitch] [ 128.440770] do_execute_actions+0x898/0x8c4 [openvswitch] [ 128.446993] ovs_execute_actions+0x64/0xf0 [openvswitch] [ 128.453129] ovs_dp_process_packet+0xa0/0x224 [openvswitch] [ 128.459530] ovs_vport_receive+0x7c/0xfc [openvswitch] [ 128.465497] internal_dev_xmit+0x34/0xb0 [openvswitch] [ 128.471460] xmit_one.constprop.0+0xc4/0x200 [ 128.476561] dev_hard_start_xmit+0x54/0xf0 [ 128.481489] __dev_queue_xmit+0x968/0xc70 [ 128.486330] dev_queue_xmit+0x1c/0x40 [ 128.490856] ip_finish_output2+0x250/0x570 [ 128.495810] __ip_finish_output+0x170/0x1e0 [ 128.500832] ip_finish_output+0x3c/0xf0 [ 128.505504] ip_output+0xbc/0x160 [ 128.509654] ip_send_skb+0x58/0xd4 [ 128.513892] udp_send_skb+0x12c/0x354 [ 128.518387] udp_sendmsg+0x7a8/0x9c0 [ 128.522793] inet_sendmsg+0x4c/0x8c [ 128.527116] __sock_sendmsg+0x48/0x80 [ 128.531609] __sys_sendto+0x124/0x164 [ 128.536099] __arm64_sys_sendto+0x30/0x5c [ 128.540935] invoke_syscall+0x50/0x130 [ 128.545508] el0_svc_common.constprop.0+0x10c/0x124 [ 128.551205] do_el0_svc+0x34/0xdc [ 128.555347] el0_svc+0x20/0x30 [ 128.559227] el0_sync_handler+0xb8/0xc0 [ 128.563883] el0_sync+0x160/0x180 Fixes: 0bf5eb788512 ("net: hns3: add support for PTP") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-8-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issueHao Lan10 days2-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs 1024-1279 are in different BAR space addresses. However, hclge_fetch_pf_reg does not distinguish the tqp space information when reading the tqp space information. When the number of TQPs is greater than 1024, access bar space overwriting occurs. The problem of different segments has been considered during the initialization of tqp.io_base. Therefore, tqp.io_base is directly used when the queue is read in hclge_fetch_pf_reg. The error message: Unable to handle kernel paging request at virtual address ffff800037200000 pc : hclge_fetch_pf_reg+0x138/0x250 [hclge] lr : hclge_get_regs+0x84/0x1d0 [hclge] Call trace: hclge_fetch_pf_reg+0x138/0x250 [hclge] hclge_get_regs+0x84/0x1d0 [hclge] hns3_get_regs+0x2c/0x50 [hns3] ethtool_get_regs+0xf4/0x270 dev_ethtool+0x674/0x8a0 dev_ioctl+0x270/0x36c sock_do_ioctl+0x110/0x2a0 sock_ioctl+0x2ac/0x530 __arm64_sys_ioctl+0xa8/0x100 invoke_syscall+0x4c/0x124 el0_svc_common.constprop.0+0x140/0x15c do_el0_svc+0x30/0xd0 el0_svc+0x1c/0x2c el0_sync_handler+0xb0/0xb4 el0_sync+0x168/0x180 Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-7-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: initialize reset_timer before hclgevf_misc_irq_init()Jian Shen10 days1-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Currently the misc irq is initialized before reset_timer setup. But it will access the reset_timer in the irq handler. So initialize the reset_timer earlier. Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-6-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: don't auto enable misc vectorJian Shen10 days1-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is a time window between misc irq enabled and service task inited. If an interrupte is reported at this time, it will cause warning like below: [ 16.324639] Call trace: [ 16.324641] __queue_delayed_work+0xb8/0xe0 [ 16.324643] mod_delayed_work_on+0x78/0xd0 [ 16.324655] hclge_errhand_task_schedule+0x58/0x90 [hclge] [ 16.324662] hclge_misc_irq_handle+0x168/0x240 [hclge] [ 16.324666] __handle_irq_event_percpu+0x64/0x1e0 [ 16.324667] handle_irq_event+0x80/0x170 [ 16.324670] handle_fasteoi_edge_irq+0x110/0x2bc [ 16.324671] __handle_domain_irq+0x84/0xfc [ 16.324673] gic_handle_irq+0x88/0x2c0 [ 16.324674] el1_irq+0xb8/0x140 [ 16.324677] arch_cpu_idle+0x18/0x40 [ 16.324679] default_idle_call+0x5c/0x1bc [ 16.324682] cpuidle_idle_call+0x18c/0x1c4 [ 16.324684] do_idle+0x174/0x17c [ 16.324685] cpu_startup_entry+0x30/0x6c [ 16.324687] secondary_start_kernel+0x1a4/0x280 [ 16.324688] ---[ end trace 6aa0bff672a964aa ]--- So don't auto enable misc vector when request irq.. Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: Resolved the issue that the debugfs query result is inconsistent.Hao Lan10 days2-68/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies the implementation of debugfs: When the user process stops unexpectedly, not all data of the file system is read. In this case, the save_buf pointer is not released. When the user process is called next time, save_buf is used to copy the cached data to the user space. As a result, the queried data is stale. To solve this problem, this patch implements .open() and .release() handler for debugfs file_operations. moving allocation buffer and execution of the cmd to the .open() handler and freeing in to the .release() handler. Allocate separate buffer for each reader and associate the buffer with the file pointer. When different user read processes no longer share the buffer, the stale data problem is fixed. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: fix missing features due to dev->features configuration too earlyHao Lan10 days1-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the netdev->features is configured in hns3_nic_set_features. As a result, __netdev_update_features considers that there is no feature difference, and the procedures of the real features are missing. Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106143642.539698-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: hns3: fixed reset failure issues caused by the incorrect reset typeHao Lan10 days2-10/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a reset type that is not supported by the driver is input, a reset pending flag bit of the HNAE3_NONE_RESET type is generated in reset_pending. The driver does not have a mechanism to clear this type of error. As a result, the driver considers that the reset is not complete. This patch provides a mechanism to clear the HNAE3_NONE_RESET flag and the parameter of hnae3_ae_ops.set_default_reset_request is verified. The error message: hns3 0000:39:01.0: cmd failed -16 hns3 0000:39:01.0: hclge device re-init failed, VF is disabled! hns3 0000:39:01.0: failed to reset VF stack hns3 0000:39:01.0: failed to reset VF(4) hns3 0000:39:01.0: prepare reset(2) wait done hns3 0000:39:01.0 eth4: already uninitialized Use the crash tool to view struct hclgevf_dev: struct hclgevf_dev { ... default_reset_request = 0x20, reset_level = HNAE3_NONE_RESET, reset_pending = 0x100, reset_type = HNAE3_NONE_RESET, ... }; Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250106143642.539698-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | eth: gve: use appropriate helper to set xdp_featuresJakub Kicinski10 days1-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f85949f98206 ("xdp: add xdp_set_features_flag utility routine") added routines to inform the core about XDP flag changes. GVE support was added around the same time and missed using them. GVE only changes the flags on error recover or resume. Presumably the flags may change during resume if VM migrated. User would not get the notification and upper devices would not get a chance to recalculate their flags. Fixes: 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") Reviewed-By: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250106180210.1861784-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON ↵Su Hui11 days7-126/+0
|/ | | | | | | | | | | | | | | | interface" There is a garbage value problem in fbnic_mac_get_sensor_asic(). 'fw_cmpl' is uninitialized which makes 'sensor' and '*val' to be stored garbage value. Revert commit d85ebade02e8 ("eth: fbnic: Add hardware monitoring support via HWMON interface") to avoid this problem. Fixes: d85ebade02e8 ("eth: fbnic: Add hardware monitoring support via HWMON interface") Signed-off-by: Su Hui <suhui@nfschina.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250106023647.47756-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* cxgb4: Avoid removal of uninserted tidAnumula Murali Mohan Reddy11 days1-1/+4
| | | | | | | | | | | | During ARP failure, tid is not inserted but _c4iw_free_ep() attempts to remove tid which results in error. This patch fixes the issue by avoiding removal of uninserted tid. Fixes: 59437d78f088 ("cxgb4/chtls: fix ULD connection failures due to wrong TID base") Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20250103092327.1011925-1-anumula@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Fix DIM shutdownMichael Chan11 days1-5/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | DIM work will call the firmware to adjust the coalescing parameters on the RX rings. We should cancel DIM work before we call the firmware to free the RX rings. Otherwise, FW will reject the call from DIM work if the RX ring has been freed. This will generate an error message like this: bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2 and cause unnecessary concern for the user. It is also possible to modify the coalescing parameters of the wrong ring if the ring has been re-allocated. To prevent this, cancel DIM work right before freeing the RX rings. We also have to add a check in NAPI poll to not schedule DIM if the RX rings are shutting down. Check that the VNIC is active before we schedule DIM. The VNIC is always disabled before we free the RX rings. Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown") Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Fix possible memory leak when hwrm_req_replace failsKalesh AP11 days1-1/+2
| | | | | | | | | | | | When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop() which could cause a memory leak. Fixes: bbf33d1d9805 ("bnxt_en: update all firmware calls to use the new APIs") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* pds_core: limit loop over fw name listShannon Nelson11 days1-1/+1
| | | | | | | | | | | | | | | Add an array size limit to the for-loop to be sure we don't try to reference a fw_version string off the end of the fw info names array. We know that our firmware only has a limited number of firmware slot names, but we shouldn't leave this unchecked. Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250103195147.7408-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: libwx: fix firmware mailbox abnormal returnJiawen Wu13 days1-13/+11
| | | | | | | | | | | | | | | | | | | The existing SW-FW interaction flow on the driver is wrong. Follow this wrong flow, driver would never return error if there is a unknown command. Since firmware writes back 'firmware ready' and 'unknown command' in the mailbox message if there is an unknown command sent by driver. So reading 'firmware ready' does not timeout. Then driver would mistakenly believe that the interaction has completed successfully. It tends to happen with the use of custom firmware. Move the check for 'unknown command' out of the poll timeout for 'firmware ready'. And adjust the debug log so that mailbox messages are always printed when commands timeout. Fixes: 1efa9bfe58c5 ("net: libwx: Implement interaction with firmware") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250103081013.1995939-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge tag 'ieee802154-for-net-2025-01-03' of ↵Jakub Kicinski13 days1-1/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2025-01-03 Keisuke Nishimura provided a fix to check for kfifo_alloc() in the ca8210 driver. Lizhi Xu provided a fix a corrupted list, found by syzkaller, by checking local interfaces first. * tag 'ieee802154-for-net-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan: mac802154: check local interfaces before deleting sdata list ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe() ==================== Link: https://patch.msgid.link/20250103160046.469363-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe()Keisuke Nishimura2024-11-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | ca8210_test_interface_init() returns the result of kfifo_alloc(), which can be non-zero in case of an error. The caller, ca8210_probe(), should check the return value and do error-handling if it fails. Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver") Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/20241029182712.318271-1-keisuke.nishimura@inria.fr Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>