path: root/Documentation/cdrom/cdrom-standard.rst
Commit message   Author   Files   Lines
2022-07-29net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connectionsTariq Toukan5-47/+199
The transport interface send (TIS) object is responsible for performing all transport related operations of the transmit side. The ConnectX HW uses a TIS object to save and access the TLS crypto information and state of an offloaded TX kTLS connection. Before this patch, we used to create a new TIS per connection and destroy it once it's closed. Every create and destroy of a TIS is a FW command. The same applies to the private TLS context, which we used to dynamically allocate and free per connection. Resource recycling reduces the impact of the allocation/free operations and helps speed up the connection rate. In this feature we maintain a pool of TX objects and use it to recycle the resources instead of re-creating them per connection. A cached TIS popped from the pool is updated to serve the new connection via the fast-path HW interface, updating the TLS static and progress params. This is a very fast operation, significantly faster than FW commands. On recycling, a WQE fence is required after the context params change. This guarantees that the data is sent after the context has been successfully updated in hardware, and that the context modification doesn't interfere with existing traffic. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
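For illustration, a minimal sketch of the recycle-pool idea described above; the struct and function names are hypothetical, not the actual mlx5e kTLS internals:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct tx_obj {
    struct list_head entry;
    u32 tisn;                       /* HW TIS number kept alive across connections */
};

struct tx_obj_pool {
    spinlock_t lock;
    struct list_head list;
};

/* Pop a cached object; the caller falls back to a FW create command on NULL. */
static struct tx_obj *tx_pool_pop(struct tx_obj_pool *pool)
{
    struct tx_obj *obj = NULL;

    spin_lock(&pool->lock);
    if (!list_empty(&pool->list)) {
        obj = list_first_entry(&pool->list, struct tx_obj, entry);
        list_del(&obj->entry);
    }
    spin_unlock(&pool->lock);
    return obj;
}

/* Return the object on connection close instead of destroying it via FW. */
static void tx_pool_push(struct tx_obj_pool *pool, struct tx_obj *obj)
{
    spin_lock(&pool->lock);
    list_add(&obj->entry, &pool->list);
    spin_unlock(&pool->lock);
}

The design point is that the expensive FW create/destroy commands only happen when the pool is empty or over-full; the common case is a cheap list operation plus a fast-path parameter update.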
2022-07-29net/mlx5e: kTLS, Take stats out of OOO handlerTariq Toukan1-16/+11
Let the caller of mlx5e_ktls_tx_handle_ooo() take care of updating the stats, according to the returned value. As the switch/case blocks are already there, this change saves unnecessary branches in the handler. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29net/mlx5e: kTLS, Introduce TLS-specific create TISTariq Toukan1-5/+9
TLS TIS objects have a defined role in mapping and reaching the HW TLS contexts. Some standard TIS attributes (like LAG port affinity) are not relevant for them. Use a dedicated TLS TIS create function instead of the generic mlx5e_create_tis. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29net/tls: Multi-threaded calls to TX tls_dev_delTariq Toukan2-32/+33
Multiple TLS device-offloaded contexts can be added in parallel via concurrent calls to .tls_dev_add, while calls to .tls_dev_del are sequential in tls_device_gc_task. This is not sustainable behavior: it creates a rate gap between add and del operations (the addition rate outperforms the deletion rate). When running for long enough, the TLS device resources could get exhausted, failing to offload new connections. Replace the single-threaded garbage-collector work with a per-context alternative, so deletions can be handled on several cores in parallel. Use a new dedicated destruct workqueue for this. Tested with an mlx5 device: Before: 22141 add/sec, 103 del/sec; After: 11684 add/sec, 11684 del/sec. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
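A minimal sketch of the per-context deferral pattern described above, assuming illustrative names rather than the actual tls_device fields:

#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *destruct_wq;    /* dedicated, not single-threaded */

struct offload_ctx {
    struct work_struct destruct_work;
    /* device state released in the work below */
};

static void offload_ctx_destruct(struct work_struct *work)
{
    struct offload_ctx *ctx =
        container_of(work, struct offload_ctx, destruct_work);

    /* interact with the device callbacks (may sleep), then free */
    kfree(ctx);
}

/* One work item per context, so destruction runs on several CPUs in parallel. */
static void offload_ctx_schedule_destruct(struct offload_ctx *ctx)
{
    INIT_WORK(&ctx->destruct_work, offload_ctx_destruct);
    queue_work(destruct_wq, &ctx->destruct_work);
}

static int __init offload_init(void)
{
    destruct_wq = alloc_workqueue("tls_destruct", 0, 0);    /* name is hypothetical */
    return destruct_wq ? 0 : -ENOMEM;
}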
2022-07-29net/tls: Perform immediate device ctx cleanup when possibleTariq Toukan1-8/+18
TLS context destructor can be run in atomic context. Cleanup operations for device-offloaded contexts could require access and interaction with the device callbacks, which might sleep. Hence, the cleanup of such contexts must be deferred and completed inside an async work. For all others, this is not necessary, as cleanup is atomic. Invoke cleanup immediately for them, avoiding queueing redundant gc work. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29tls: rx: Fix unsigned comparison with less than zeroYang Li1-1/+2
tls_rx_msg_size() returns an int, which can be a negative error code; however, it is being assigned to an unsigned long variable 'sz'. Make 'sz' an int instead. Eliminate the following coccicheck warning: ./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
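The pitfall behind the warning, shown as a small stand-alone example rather than the kernel code itself:

#include <stdio.h>
#include <stddef.h>

static int msg_size(void)
{
    return -22;                     /* e.g. -EINVAL from tls_rx_msg_size() */
}

int main(void)
{
    size_t sz_bad = msg_size();     /* negative value wraps to a huge unsigned */
    int    sz_ok  = msg_size();

    if (sz_bad < 0)                 /* always false: sz_bad is unsigned */
        printf("never printed\n");
    if (sz_ok < 0)                  /* error is detected as intended */
        printf("error propagated: %d\n", sz_ok);
    return 0;
}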
2022-07-29tls: rx: fix the false positive warningJakub Kicinski1-1/+1
I went too far in the accessor conversion; we can't use tls_strp_msg() after decryption because the message may not be ready. What we care about on this path is that the output skb is detached, i.e. we didn't somehow just turn around and use the input skb with its TCP data still attached. So look at the anchor directly. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29tls: strp: rename and multithread the workqueueJakub Kicinski1-1/+1
Paolo points out that there seems to be no strong reason for strparser to use a single-threaded workqueue. Perhaps there were some performance or pinning considerations? Since we don't know (and it's the slow path) let's default to the most natural, multi-threaded choice. Also rename the workqueue to "tls-". Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29tls: rx: don't consider sock_rcvtimeo() cumulativeJakub Kicinski1-18/+19
Eric indicates that restarting rcvtimeo on every wait may be fine. I thought that we should consider it cumulative, and made tls_rx_reader_lock() return the remaining timeo after acquiring the reader lock. tls_rx_rec_wait() gets its timeout passed in by value so it does not keep track of time previously spent. Make the lock waiting consistent with tls_rx_rec_wait() - don't keep track of time spent. Read the timeo fresh in tls_rx_rec_wait(). It's unclear to me why callers are supposed to cache the value. Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29selftests: tls: handful of memrnd() and length checksJakub Kicinski1-9/+17
Add a handful of memory randomizations and precise length checks. Nothing is really broken here, I did this to increase confidence when debugging. It does fix a GCC warning, though. Apparently GCC recognizes that memory needs to be initialized for send() but does not recognize that for write(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29net: usb: delete extra space and tab in blank lineXie Shaowen5-30/+30
Delete extra spaces and tabs in blank lines; there is no functional change. Signed-off-by: Xie Shaowen <studentxswpy@163.com> Link: https://lore.kernel.org/r/20220727081253.3043941-1-studentxswpy@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28stmmac: dwmac-mediatek: fix resource leak in probeDan Carpenter1-4/+5
If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt() before returning. Otherwise it is a resource leak. Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
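Roughly, the fixed error path looks like the sketch below (surrounding probe code elided; the exact labels in the driver may differ):

    ret = mediatek_dwmac_clks_config(priv_plat, true);
    if (ret)
        goto err_remove_config_dt;

    ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
    if (ret)
        goto err_drv_probe;

    return 0;

err_drv_probe:
    mediatek_dwmac_clks_config(priv_plat, false);
err_remove_config_dt:
    stmmac_remove_config_dt(pdev, plat_dat);    /* undo the earlier config_dt allocation */
    return ret;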
2022-07-28ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptrZiyang Xuan1-0/+3
Changing a net device's MTU to be smaller than IPV6_MIN_MTU, or unregistering the device while a route is being matched, can occasionally trigger a null-ptr-deref bug on ip6_ptr, as follows:

=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
 dump_backtrace+0x1a8/0x230
 show_stack+0x20/0x70
 dump_stack_lvl+0x68/0x84
 print_report+0xc4/0x120
 kasan_report+0x84/0x120
 __asan_load4+0x94/0xd0
 find_match.part.0+0x70/0x134
 __find_rr_leaf+0x408/0x470
 fib6_table_lookup+0x264/0x540
 ip6_pol_route+0xf4/0x260
 ip6_pol_route_output+0x58/0x70
 fib6_rule_lookup+0x1a8/0x330
 ip6_route_output_flags_noref+0xd8/0x1a0
 ip6_route_output_flags+0x58/0x160
 ip6_dst_lookup_tail+0x5b4/0x85c
 ip6_dst_lookup_flow+0x98/0x120
 rawv6_sendmsg+0x49c/0xc70
 inet_sendmsg+0x68/0x94

Reproducer as follows. Firstly, prepare the conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1

Secondly, execute the following two commands in two ssh windows respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done

$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done

This happens because ip6_ptr is first set to NULL in addrconf_ifdown(), and ip6_ignore_linkdown() then accesses ip6_ptr directly without a NULL check:

cpu0                                  cpu1
fib6_table_lookup
  __find_rr_leaf
                                      addrconf_notify [ NETDEV_CHANGEMTU ]
                                        addrconf_ifdown
                                          RCU_INIT_POINTER(dev->ip6_ptr, NULL)
    find_match
      ip6_ignore_linkdown

So add a NULL check for ip6_ptr before using it in ip6_ignore_linkdown() to fix the null-ptr-deref bug.

Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
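The shape of the fix described above is a NULL check inside the helper; a sketch (the real helper lives in include/net/ip6_route.h):

static inline bool ip6_ignore_linkdown(const struct net_device *dev)
{
    const struct inet6_dev *idev = __in6_dev_get(dev);

    /* dev->ip6_ptr may already have been set to NULL by addrconf_ifdown() */
    if (unlikely(!idev))
        return true;

    return !!idev->cnf.ignore_routes_with_linkdown;
}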
2022-07-28net: ping6: Fix memleak in ipv6_renew_options().Kuniyuki Iwashima1-0/+6
When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM:

struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;

hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;

fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it:

$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and some of them require CAP_NET_RAW:

setsockopt
  IPV6_ADDRFORM       (inet6_sk(sk)->pktoptions)
  IPV6_RECVPATHMTU    (inet6_sk(sk)->rxpmtu)
  IPV6_HOPOPTS        (inet6_sk(sk)->opt)
  IPV6_RTHDRDSTOPTS   (inet6_sk(sk)->opt)
  IPV6_RTHDR          (inet6_sk(sk)->opt)
  IPV6_DSTOPTS        (inet6_sk(sk)->opt)
  IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

getsockopt
  IPV6_FLOWLABEL_MGR  (inet6_sk(sk)->ipv6_fl_list)

For the record, here is a splat different from syzbot's:

unreferenced object 0xffff888006270c60 (size 96):
  comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
  hex dump (first 32 bytes):
    01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
    [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
    [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
    [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
    [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
    [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
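A sketch of the destroy hook described above; pingv6_prot has many more fields, only the relevant ones are shown here:

static void ping_v6_destroy(struct sock *sk)
{
    inet6_destroy_sock(sk);     /* releases opt, pktoptions, rxpmtu, flow labels */
}

struct proto pingv6_prot = {
    .name    = "PINGv6",
    .destroy = ping_v6_destroy,
};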
2022-07-28watch_queue: Fix missing locking in add_watch_to_object()Linus Torvalds1-22/+36
If a watch is being added to a queue, it needs to guard against interference from addition of a new watch, manual removal of a watch and removal of a watch due to some other queue being destroyed. KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by holding the key->sem writelocked and by holding refs on both the key and the queue - but that doesn't prevent interaction from other {key,queue} pairs. While add_watch_to_object() does take the spinlock on the event queue, it doesn't take the lock on the source's watch list. The assumption was that the caller would prevent that (say by taking key->sem) - but that doesn't prevent interference from the destruction of another queue. Fix this by locking the watcher list in add_watch_to_object(). Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: keyrings@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-28watch_queue: Fix missing rcu annotationDavid Howells1-1/+1
Since __post_watch_notification() walks wlist->watchers with only the RCU read lock held, we need to use RCU methods to add to the list (we already use RCU methods to remove from the list). Fix add_watch_to_object() to use hlist_add_head_rcu() instead of hlist_add_head() for that list. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
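The corrected insertion, as a short sketch using the list names from the commit:

    /* Readers walk wlist->watchers under rcu_read_lock() only, so the
     * new entry must be published with the RCU-aware helper. */
    hlist_add_head_rcu(&watch->list_node, &wlist->watchers);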
2022-07-28net: cdns,macb: use correct xlnx prefix for XilinxKrzysztof Kozlowski1-3/+5
Use the correct vendor for Xilinx versions of the Cadence MACB/GEM Ethernet controller. The Versal compatible was not released, so it can be changed. Zynq-7xxx and Ultrascale+ have to be kept in both the new and deprecated forms. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Harini Katakam <harini.katakam@amd.com> Link: https://lore.kernel.org/r/20220726070802.26579-2-krzysztof.kozlowski@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28dt-bindings: net: cdns,macb: use correct xlnx prefix for XilinxKrzysztof Kozlowski1-2/+9
Use the correct vendor for Xilinx versions of the Cadence MACB/GEM Ethernet controller. The Versal compatible was not released, so it can be changed. Zynq-7xxx and Ultrascale+ have to be kept in both the new and deprecated forms. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220726070802.26579-1-krzysztof.kozlowski@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28net/funeth: Fix fun_xdp_tx() and XDP packet reclaimDimitris Michailidis3-15/+16
The current implementation of fun_xdp_tx(), used for XDP_TX, is incorrect in that it takes an address/length pair and later releases it with page_frag_free(). It is OK for XDP_TX but the same code is used by ndo_xdp_xmit. In that case it loses the XDP memory type and releases the packet incorrectly for some of the types. Assorted breakage follows. Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in reclaim. Fixes: db37bc177dae ("net/funeth: add the data path") Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28add missing includes and forward declarations to networking includes under linux/Jakub Kicinski18-6/+57
Similarly to a recent include/net/ cleanup, this patch adds missing includes to networking headers under include/linux. All these problems are currently masked by the existing users including the missing dependency before the broken header. Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/ v1 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28Revert "Merge branch 'octeontx2-minor-tc-fixes'"Paolo Abeni1-73/+33
This reverts commit 35d099da41967f114c6472b838e12014706c26e7, reversing changes made to 58d8bcd47ecc55f1ab92320fe36c31ff4d83cc0c. I wrongly applied that to the net-next tree instead of the intended target tree (net). Reverting it on net-next. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28net: dsa: mv88e6xxx: fix speed setting for CPU/DSA portsMarcin Wojtas1-1/+6
Commit 3c783b83bd0f ("net: dsa: mv88e6xxx: get rid of SPEED_MAX setting") stopped relying on the SPEED_MAX constant and hardcoded speed settings for the switch ports, relying on the phylink configuration instead. It turned out, however, that when the relevant code is called, the mac_capabilities of the CPU/DSA port remain unset. mv88e6xxx_setup_port() is called via mv88e6xxx_setup() in dsa_tree_setup_switches(), which precedes setting the caps in phylink_get_caps down in the chain of dsa_tree_setup_ports(). As a result the mac_capabilities are 0 and the default speed for the CPU/DSA port is 10M at the start. To fix that, execute mv88e6xxx_get_caps() and obtain the capabilities directly. Fixes: 3c783b83bd0f ("net: dsa: mv88e6xxx: get rid of SPEED_MAX setting") Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220726230918.2772378-1-mw@semihalf.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
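A hedged sketch of the workaround: query the port capabilities directly instead of reading mac_capabilities that phylink has not filled in yet. The speed mapping shown is illustrative, not the driver's exact code:

    struct phylink_config pl_config = {};
    unsigned long caps;

    mv88e6xxx_get_caps(ds, port, &pl_config);   /* fills mac_capabilities now */
    caps = pl_config.mac_capabilities;

    if (caps & MAC_10000FD)
        speed = SPEED_10000;
    else if (caps & MAC_1000FD)
        speed = SPEED_1000;
    else
        speed = SPEED_100;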
2022-07-28net: devlink: remove redundant net_eq() check from sb_pool_get_dumpit()Jiri Pirko1-2/+1
The net_eq() check is already performed inside devlinks_xa_for_each_registered_get() helper, so remove the redundant appearance. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220727055912.568391-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28dt-bindings: net: hirschmann,hellcreek: use absolute path to other schemaKrzysztof Kozlowski1-1/+1
Absolute path to other DT schema is preferred over relative one. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220726115650.100726-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28net/sched: sch_cbq: change the type of cbq_set_lss to voidZhengchao Shao1-2/+1
Change the type of cbq_set_lss to void. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220726030748.243505-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28sctp: leave the err path free in sctp_stream_init to sctp_stream_freeXin Long2-19/+5
A NULL pointer dereference was reported by Wei Chen:

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
 <TASK>
 sctp_sched_dequeue_common+0x1c/0x90
 sctp_sched_prio_dequeue+0x67/0x80
 __sctp_outq_teardown+0x299/0x380
 sctp_outq_free+0x15/0x20
 sctp_association_free+0xc3/0x440
 sctp_do_sm+0x1ca7/0x2210
 sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg without connecting to the server first. In this case, a data chunk is already queued up in the send queue of the client side when processing the INIT_ACK from the server in sctp_process_init(), where it calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in, all stream_out will be freed in sctp_stream_init's err path. Then, when freeing the asoc, it will crash when dequeuing this data chunk, as stream_out is missing.

As we can't free stream_out before dequeuing all data from the send queue, this patch fixes it by moving the err-path stream_out/in freeing in sctp_stream_init() to sctp_stream_free(), which is eventually called when freeing the asoc in sctp_association_free(). This fix also makes the code in sctp_process_init() clearer.

Note that in sctp_association_init(), when sctp_stream_init() fails, sctp_association_free() will not be called; in that case it should go to the 'stream_free' err path to free the stream instead of 'fail_init'.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28sfc: disable softirqs for ptp TXAlejandro Lucero1-0/+22
Sending a PTP packet can imply using the normal TX driver datapath, but invoked from the driver's ptp worker. The kernel's generic TX code disables softirqs and preemption before calling specific driver TX code, but the ptp worker does not. Although current ptp driver functionality does not require it, there are several reasons for doing so:

1) The invoked code is always executed with softirqs disabled for non-PTP packets.

2) It is better if a ptp packet transmission is not interrupted by softirq handling, which could lead to high latencies.

3) netdev_xmit_more, used by the TX code, requires preemption to be disabled.

Indeed, a solution for dealing with the kernel preemption state based on static kernel configuration is not possible since the introduction of dynamic preemption level configuration at boot time using the static calls functionality. See the sketch after this entry for the shape of the change.

Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
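In code, the idea amounts to wrapping the worker's transmit the way the core stack wraps driver TX; a sketch, with the actual sfc call shown only as an illustration:

    local_bh_disable();                 /* match what dev_hard_start_xmit() callers do */
    efx_enqueue_skb(tx_queue, skb);     /* normal TX datapath, now softirq-safe */
    local_bh_enable();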
2022-07-28ptp: ocp: Select CRC16 in the Kconfig.Jonathan Lemon1-0/+1
The crc16() function is used to check the firmware validity, but the library was not explicitly selected. Fixes: 3c3673bde50c ("ptp: ocp: Add firmware header checks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Vadim Fedorenko <vadfed@fb.com> Link: https://lore.kernel.org/r/20220726220604.1339972-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tcp: md5: fix IPv4-mapped supportEric Dumazet1-3/+12
After the blamed commit, IPv4 SYN packets handled by a dual stack IPv6 socket are dropped, even if perfectly valid.

$ nstat | grep MD5
TcpExtTCPMD5Failure 5 0.0

For a dual stack listener, an incoming IPv4 SYN packet would call tcp_inbound_md5_hash() with @family == AF_INET, while tp->af_specific is pointing to tcp_sock_ipv6_specific. Only later when an IPv4-mapped child is created, tp->af_specific is changed to tcp_sock_ipv6_mapped_specific.

Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Tested-by: Leonard Crestez <cdleonard@gmail.com>
Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27net/smc: Enable module load on netlink usageStefan Raspl2-0/+2
Previously, the smc and smc_diag modules were automatically loaded as dependencies of the ism module whenever an ISM device was present. With the pending rework of the ISM API, the smc module will no longer automatically be loaded in presence of an ISM device. Usage of an AF_SMC socket will still trigger loading of the smc modules, but usage of a netlink socket will not. This is addressed by setting the correct module aliases. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net/smc: Pass on DMBE bit mask in IRQ handlerStefan Raspl3-5/+7
Make the DMBE bits, which are passed on individually in ism_move() as parameter idx, available to the receiver. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27s390/ism: CleanupsStefan Raspl3-8/+7
Reworked signature of the function to retrieve the system EID: No plausible reason to use a double pointer. And neither to pass in the device as an argument, as this identifier is by definition per system, not per device. Plus some minor consistency edits. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net/smc: Eliminate struct smc_ism_positionHeiko Carstens3-27/+14
This struct is used in a single place only, and its usage generates inefficient code. Time to clean up! Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-and-tested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27virtio-net: fix the race between refill work and closeJason Wang1-3/+34
We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means a NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work, which can then re-enable NAPI and lead to a use-after-free [1].

Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work can be scheduled from NAPI.

[1]
==================================================================
BUG: KASAN: use-after-free in refill_work+0x43/0xd4
Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42

CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events refill_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_report.cold+0xbb/0x6ac
 ? _printk+0xad/0xde
 ? refill_work+0x43/0xd4
 kasan_report+0xa8/0x130
 ? refill_work+0x43/0xd4
 refill_work+0x43/0xd4
 process_one_work+0x43d/0x780
 worker_thread+0x2a0/0x6f0
 ? process_one_work+0x780/0x780
 kthread+0x167/0x1a0
 ? kthread_exit+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
...

Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
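A minimal sketch of the dedicated-boolean scheme described above, with illustrative names rather than the exact virtio_net fields:

#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct vnet_info {
    spinlock_t refill_lock;
    bool refill_enabled;
    struct delayed_work refill;
};

static void vnet_disable_refill(struct vnet_info *vi)
{
    spin_lock_bh(&vi->refill_lock);
    vi->refill_enabled = false;
    spin_unlock_bh(&vi->refill_lock);
}

/* Called from the NAPI poll path when buffers run low. */
static void vnet_try_schedule_refill(struct vnet_info *vi)
{
    spin_lock_bh(&vi->refill_lock);
    if (vi->refill_enabled)
        schedule_delayed_work(&vi->refill, 0);
    spin_unlock_bh(&vi->refill_lock);
}

static void vnet_close(struct vnet_info *vi)
{
    vnet_disable_refill(vi);                /* forbid new scheduling first */
    cancel_delayed_work_sync(&vi->refill);  /* then flush anything already queued */
    /* NAPI can now be disabled without being re-enabled behind our back */
}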
2022-07-27net: dsa: microchip: add support for phylink mac configArun Ramadoss6-33/+21
This patch adds support for phylink mac config for the ksz series of switches. The ksz8795, ksz9477 and lan937x files all use the ksz common xmii function. Instead of being called from the individual files, it is moved to the ksz common phylink mac config function. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: ksz8795: use common xmii functionArun Ramadoss2-42/+1
This patch updates the ksz8795 cpu configuration to use the ksz common xmii set functions. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: ksz9477: use common xmii functionArun Ramadoss4-167/+49
In the ksz9477.c file, configuring the xmii register is performed based on the NEW_XMII flag. The flag is reset for the ksz9893 switch and set for the other switches. This patch uses the ksz common xmii set and get functions. The bit values are configured based on the chip id. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: apply rgmii tx and rx delay in phylink mac configArun Ramadoss5-1/+130
This patch reads the rgmii tx and rx delay from the device tree and stores it in the ksz_port. It applies the rgmii delay to the xmii tune adjust register based on the interface selected in phylink mac config. There are two rgmii ports in LAN937x, and the value to be loaded in the register varies depending on the port selected. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: lan937x: add support for configuring xMII registerArun Ramadoss4-40/+53
This patch adds the common ksz_set_xmii function for the ksz series switches and updates the lan937x phylink mac config code. The register address for the ksz8795 is Port 5 Interface Control 6, and for all other switches it is xMII Control 1. The bit values for selecting the interface are the same for KSZ8795 and KSZ9893, and the bit values for KSZ9477 and lan937x are the same. So, this patch adds the bit value for each switch in ksz_chip_data and configures the registers based on the chip id. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: add support for common phylink mac link upArun Ramadoss4-33/+16
This patch adds support for a common phylink mac link up for the ksz series switches. The register address, bit positions and values are configured in the dev->info structure based on the chip id. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: add common duplex and flow control functionArun Ramadoss5-23/+57
This patch adds a common function for configuring Full/Half duplex and transmit/receive flow control. KSZ8795 uses the Global control register 4 for configuring the duplex and flow control, whereas all other KSZ9477-based switches use the xMII Control 0 register. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: add common ksz port xmii speed selection functionArun Ramadoss5-15/+69
This patch adds a function for configuring the 100/10Mbps speed selection for the ksz switches. The KSZ8795 switch uses Global control 4 register 0x06 bit 4 for choosing 100/10Mbps. Other switches use xMII control 1 0xN300 for it. For KSZ8795, if the bit is set then 10Mbps is chosen, and if the bit is clear then 100Mbps is chosen. For all other switches it is the other way around: if the bit is set then 100Mbps is chosen. So, this patch adds a generic function for the ksz switches to select the 100/10Mbps speed. While configuring, it first disables the gigabit functionality and then configures the respective speed. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27net: dsa: microchip: add common gigabit set and get functionArun Ramadoss6-48/+82
This patch adds helper functions for setting and getting the gigabit enable for the ksz series switches. The KSZ8795 switch has a different register address compared to all other ksz switches. The KSZ8795 series uses Port 5 Interface Control 6 bit 6 for configuring the 1Gbps or 100/10Mbps speed selection. All other switches use the xMII Control 1 0xN301 register bit 6 for gigabit. Further, for the KSZ8795 & KSZ9893 switches, a bit value of 1 selects 1Gbps and a bit value of 0 selects 100/10Mbps. It is the other way around for the other switches: a bit value of 0 selects 1Gbps. So, this patch implements a common function for configuring the gigabit set and get capability. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27selftests: net: Fix typo 'the the' in commentSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <slark_xiao@163.com> Link: https://lore.kernel.org/r/20220725020124.5760-1-slark_xiao@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27ip6mr: remove stray rcu_read_unlock() from ip6_mr_forward()Eric Dumazet1-3/+1
One rcu_read_unlock() should have been removed in blamed commit. Fixes: 9b1c21d898fd ("ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()") Reported-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220725200554.2563581-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27mptcp: Do not return EINPROGRESS when subflow creation succeedsMat Martineau1-1/+1
New subflows are created within the kernel using O_NONBLOCK, so EINPROGRESS is the expected return value from kernel_connect(). __mptcp_subflow_connect() has the correct logic to consider EINPROGRESS to be a successful case, but it has also used that error code as its return value. Before v5.19 this was benign: all the callers ignored the return value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic netlink command that does use the return value, so the EINPROGRESS gets propagated to userspace. Make __mptcp_subflow_connect() always return 0 on success instead. Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
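A sketch of the return-value handling after the fix; the function name is hypothetical and the surrounding subflow setup is elided:

static int subflow_connect_sketch(struct socket *sf, struct sockaddr *addr,
                                  int addrlen)
{
    int err = kernel_connect(sf, addr, addrlen, O_NONBLOCK);

    if (err && err != -EINPROGRESS)
        return err;     /* a real failure is still propagated */

    return 0;           /* in-progress counts as success; never leak -EINPROGRESS */
}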
2022-07-27mailmap: update Gao Xiang's email addressesGao Xiang1-0/+2
I've been in Alibaba Cloud for more than one year, mainly to address cloud-native challenges (such as high-performance container images) for open source communities. Update my email addresses on behalf of my current employer (Alibaba Cloud) to support all my (team) work in this area. Also add an outdated @redhat.com address of me. Link: https://lkml.kernel.org/r/20220719154246.62970-1-xiang@kernel.org Signed-off-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-27userfaultfd: provide properly masked address for huge-pagesNadav Amit1-5/+7
Commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault") was introduced to fix an old bug in which the offset in the address of a page-fault was masked. Concerns were raised - although they were never backed by actual code - that some userspace code might break because the bug has been around for quite a while. To address these concerns a new flag was introduced, and only when this flag is set by the user does userfaultfd provide the exact address of the page-fault. The commit however had a bug: if the flag is unset, the offset was always masked at base-page granularity. Yet, for huge-pages, the behavior prior to the commit was that the address is masked to the huge-page granularity. While there are no reports of real breakage, fix this issue. If the flag is unset, use the address with the masking that was done before. Link: https://lkml.kernel.org/r/20220711165906.2682-1-namit@vmware.com Fixes: 824ddc601adc ("userfaultfd: provide unmasked address on page-fault") Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: James Houghton <jthoughton@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
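A hypothetical sketch of the restored masking behaviour; the helper name is invented for illustration and is not the code in fs/userfaultfd.c:

static unsigned long uffd_msg_address(struct userfaultfd_ctx *ctx,
                                      struct vm_area_struct *vma,
                                      unsigned long address)
{
    if (ctx->features & UFFD_FEATURE_EXACT_ADDRESS)
        return address;                 /* opt-in: exact, unmasked address */

    if (is_vm_hugetlb_page(vma))        /* mask to the huge-page boundary */
        return address & huge_page_mask(hstate_vma(vma));

    return address & PAGE_MASK;         /* historic base-page masking */
}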
2022-07-26tls: rx: do not use the standard strparserJakub Kicinski5-69/+558
TLS is a relatively poor fit for strparser. We pause the input every time a message is received, wait for a read which will decrypt the message, start the parser, repeat. strparser is built to delineate the messages, wrap them in individual skbs and let them float off into the stack or a different socket. TLS wants the data pages and nothing else. There's no need for TLS to keep cloning (and occasionally skb_unclone()'ing) the TCP rx queue. This patch uses a pre-allocated skb and attaches the skbs from the TCP rx queue to it as frags. TLS is careful never to modify the input skb without CoW'ing / detaching it first. Since we call TCP rx queue cleanup directly we also get back the benefit of skb deferred free. Overall this results in a 6% gain in my benchmarks. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26tls: rx: device: add input CoW helperJakub Kicinski3-10/+21
Wrap the remaining skb_cow_data() into a helper, so it's easier to replace down the lane. The new version will change the skb so make sure relevant pointers get reloaded after the call. Signed-off-by: Jakub Kicinski <kuba@kernel.org>