| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Li Li reports that casting away callback type may cause issues
for CFI. Let's generate a small wrapper for each callback,
to make sure compiler sees the anticipated types.
Reported-by: Li Li <dualli@chromium.org>
Link: https://lore.kernel.org/CANBPYPjQVqmzZ4J=rVQX87a9iuwmaetULwbK_5_3YWk2eGzkaA@mail.gmail.com
Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250115161436.648646-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When __netpoll_setup() is called directly, instead of through
netpoll_setup(), the np->skb_pool list head isn't initialized.
If skb_pool_flush() is later called, then we hit a NULL pointer
in skb_queue_purge_reason(). This can be seen with this repro,
when CONFIG_NETCONSOLE is enabled as a module:
ip tuntap add mode tap tap0
ip link add name br0 type bridge
ip link set dev tap0 master br0
modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/
rmmod netconsole
The backtrace is:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
... ... ...
Call Trace:
<TASK>
__netpoll_free+0xa5/0xf0
br_netpoll_cleanup+0x43/0x50 [bridge]
do_netpoll_cleanup+0x43/0xc0
netconsole_netdev_event+0x1e3/0x300 [netconsole]
unregister_netdevice_notifier+0xd9/0x150
cleanup_module+0x45/0x920 [netconsole]
__se_sys_delete_module+0x205/0x290
do_syscall_64+0x70/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Move the skb_pool list setup and initial skb fill into __netpoll_setup().
Fixes: 221a9c1df790 ("net: netpoll: Individualize the skb pool")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250114011354.2096812-1-jsperbeck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.
After commit ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Cc: stable@vger.kernel.org
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.
Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.
This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Co-developed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Co-developed-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.
Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.
The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.
In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.
Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Tested-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the core functions can only be called if the transport
has been assigned.
As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:
BUG: kernel NULL pointer dereference, address: 00000000000000a0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
RIP: 0010:vsock_connectible_has_data+0x1f/0x40
Call Trace:
vsock_bpf_recvmsg+0xca/0x5e0
sock_recvmsg+0xb9/0xc0
__sys_recvfrom+0xb3/0x130
__x64_sys_recvfrom+0x20/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.
A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Reported-by: Wongi Lee <qwerty@theori.io>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in [0], rehash4 could be missed in udp_lib_rehash() when
udp hash4 changes while hash2 doesn't change. This patch fixes this by
moving rehash4 codes out of rehash2 checking, and then rehash2 and
rehash4 are done separately.
By doing this, we no longer need to call rehash4 explicitly in
udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes
it. Thus, now udp_lib_hash4() returns directly if the sk is already
hashed.
Note that uhash4 may fail to work under consecutive connect(<dst
address>) calls because rehash() is not called with every connect(). To
overcome this, connect(<AF_UNSPEC>) needs to be called after the next
connect to a new destination.
[0]
https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com/
Fixes: 78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://patch.msgid.link/20250110010810.107145-1-lulie@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.
Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Tested-by: Potin Lai <potin.lai.pt@gmail.com>
Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Passing a sufficient amount of imix entries leads to invalid access to the
pkt_dev->imix_entries array because of the incorrect boundary check.
UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24
index 20 is out of range for type 'imix_pkt [20]'
CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl lib/dump_stack.c:117
__ubsan_handle_out_of_bounds lib/ubsan.c:429
get_imix_entries net/core/pktgen.c:874
pktgen_if_write net/core/pktgen.c:1063
pde_write fs/proc/inode.c:334
proc_reg_write fs/proc/inode.c:346
vfs_write fs/read_write.c:593
ksys_write fs/read_write.c:644
do_syscall_64 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 52a62f8603f9 ("pktgen: Parse internet mix (imix) input")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
[ fp: allow to fill the array completely; minor changelog cleanup ]
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit in a fixes tag attempted to fix the issue in the following
sequence of calls:
do_output
-> ovs_vport_send
-> dev_queue_xmit
-> __dev_queue_xmit
-> netdev_core_pick_tx
-> skb_tx_hash
When device is unregistering, the 'dev->real_num_tx_queues' goes to
zero and the 'while (unlikely(hash >= qcount))' loop inside the
'skb_tx_hash' becomes infinite, locking up the core forever.
But unfortunately, checking just the carrier status is not enough to
fix the issue, because some devices may still be in unregistering
state while reporting carrier status OK.
One example of such device is a net/dummy. It sets carrier ON
on start, but it doesn't implement .ndo_stop to set the carrier off.
And it makes sense, because dummy doesn't really have a carrier.
Therefore, while this device is unregistering, it's still easy to hit
the infinite loop in the skb_tx_hash() from the OVS datapath. There
might be other drivers that do the same, but dummy by itself is
important for the OVS ecosystem, because it is frequently used as a
packet sink for tcpdump while debugging OVS deployments. And when the
issue is hit, the only way to recover is to reboot.
Fix that by also checking if the device is running. The running
state is handled by the net core during unregistering, so it covers
unregistering case better, and we don't really need to send packets
to devices that are not running anyway.
While only checking the running state might be enough, the carrier
check is preserved. The running and the carrier states seem disjoined
throughout the code and different drivers. And other core functions
like __dev_direct_xmit() check both before attempting to transmit
a packet. So, it seems safer to check both flags in OVS as well.
Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250109122225.4034688-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 86e25f40aa1e ("net: napi: Add napi_config") moved napi->napi_id
assignment to a later point in time (napi_hash_add_with_id). This breaks
__xdp_rxq_info_reg which copies napi_id at an earlier time and now
stores 0 napi_id. It also makes sk_mark_napi_id_once_xdp and
__sk_mark_napi_id_once useless because they now work against 0 napi_id.
Since sk_busy_loop requires valid napi_id to busy-poll on, there is no way
to busy-poll AF_XDP sockets anymore.
Bring back the ability to busy-poll on XSK by resolving socket's napi_id
at bind time. This relies on relatively recent netif_queue_set_napi,
but (assume) at this point most popular drivers should have been converted.
This also removes per-tx/rx cycles which used to check and/or set
the napi_id value.
Confirmed by running a busy-polling AF_XDP socket
(github.com/fomichev/xskrtt) on mlx5 and looking at BusyPollRxPackets
from /proc/net/netstat.
Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250109003436.2829560-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As pointed out in the original comment, lookup in sockmap can return a TCP
ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF
set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb
does not imply a non-refcounted socket.
Drop sk's reference in both error paths.
unreferenced object 0xffff888101911800 (size 2048):
comm "test_progs", pid 44109, jiffies 4297131437
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 9336483b):
__kmalloc_noprof+0x3bf/0x560
__reuseport_alloc+0x1d/0x40
reuseport_alloc+0xca/0x150
reuseport_attach_prog+0x87/0x140
sk_reuseport_attach_bpf+0xc8/0x100
sk_setsockopt+0x1181/0x1990
do_sock_setsockopt+0x12b/0x160
__sys_setsockopt+0x7b/0xc0
__x64_sys_setsockopt+0x1b/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 64d85290d79c ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-reuseport-memleak-v1-1-fa1ddab0adfe@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When jumping to 'martian_destination' a drop reason is always set but
that label falls-through the 'e_nobufs' one, overriding the value.
The behavior was introduced by the mentioned commit. The logic went
from,
goto martian_destination;
...
martian_destination:
...
e_inval:
err = -EINVAL;
goto out;
e_nobufs:
err = -ENOBUFS;
goto out;
to,
reason = ...;
goto martian_destination;
...
martian_destination:
...
e_nobufs:
reason = SKB_DROP_REASON_NOMEM;
goto out;
A 'goto out' is clearly missing now after 'martian_destination' to avoid
overriding the drop reason.
Fixes: 5b92112acd8e ("net: ip: make ip_route_input_slow() return drop reasons")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250108165725.404564-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix imbalance between flowtable BIND and UNBIND calls to configure
hardware offload, this fixes a possible kmemleak.
2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible
WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from
init_netns.
* tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: clamp maximum hashtable size to INT_MAX
netfilter: nf_tables: imbalance in flowtable binding
====================
Link: https://patch.msgid.link/20250109123532.41768-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:
0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")
Note: hashtable resize is only possible from init_netns.
Fixes: 9cc1c73ad666 ("netfilter: conntrack: avoid integer overflow when resizing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
All these cases cause imbalance between BIND and UNBIND calls:
- Delete an interface from a flowtable with multiple interfaces
- Add a (device to a) flowtable with --check flag
- Delete a netns containing a flowtable
- In an interactive nft session, create a table with owner flag and
flowtable inside, then quit.
Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
remove late FLOW_BLOCK_UNBIND call when destroying flowtable.
Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: Phil Sutter <phil@nwl.cc>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.
Fixes: d1e462a7a5f3 ("sctp: add probe_interval in sysctl and sock/asoc/transport")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-8-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-7-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-6-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.
Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-4-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'pernet' structure can be obtained from the table->data using
container_of().
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Using the 'net' structure via 'current' is not recommended for different
reasons.
First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.
Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
__kernel_write_iter+0x318/0xa80 fs/read_write.c:612
__kernel_write+0xf6/0x140 fs/read_write.c:632
do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
acct_pin_kill+0x2d/0x100 kernel/acct.c:192
pin_kill+0x194/0x7c0 fs/fs_pin.c:44
mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
task_work_run+0x14e/0x250 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:43 [inline]
do_exit+0xad8/0x2d70 kernel/exit.c:938
do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
get_signal+0x2576/0x2610 kernel/signal.c:3017
arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3cb87a6a
Code: Unable to access opcode bytes at 0x7fee3cb87a40.
RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1)
5: 0f 85 fe 02 00 00 jne 0x309
b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12
12: 00
13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
1a: fc ff df
1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi
22: 48 89 fa mov %rdi,%rdx
25: 48 c1 ea 03 shr $0x3,%rdx
* 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2d: 0f 85 cc 02 00 00 jne 0x2ff
33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15
38: 48 rex.W
39: 8d .byte 0x8d
3a: 84 24 c8 test %ah,(%rax,%rcx,8)
Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.
Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
'net.mptcp.available_schedulers' sysctl knob is there to list available
schedulers, not to modify this list.
There are then no reasons to give write access to it.
Nothing would have been written anyway, but no errors would have been
returned, which is unexpected.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-1-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.
To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).
As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.
Fixes: 546ea84d07e3 ("sched: sch_cake: fix bulk flow accounting logic for host fairness")
Reported-by: syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btmtk: Fix failed to send func ctrl for MediaTek devices.
- hci_sync: Fix not setting Random Address when required
- MGMT: Fix Add Device to responding before completing
- btnxpuart: Fix driver sending truncated data
* tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Bluetooth: btnxpuart: Fix driver sending truncated data
Bluetooth: MGMT: Fix Add Device to responding before completing
Bluetooth: hci_sync: Fix not setting Random Address when required
====================
Link: https://patch.msgid.link/20250108162627.1623760-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use usb_autopm_get_interface() and usb_autopm_put_interface()
in btmtk_usb_shutdown(), it could send func ctrl after enabling
autosuspend.
Bluetooth: btmtk_usb_hci_wmt_sync() hci0: Execution of wmt command
timed out
Bluetooth: btmtk_usb_shutdown() hci0: Failed to send wmt func ctrl
(-110)
Fixes: 5c5e8c52e3ca ("Bluetooth: btmtk: move btusb_mtk_[setup, shutdown] to btmtk.c")
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add Device with LE type requires updating resolving/accept list which
requires quite a number of commands to complete and each of them may
fail, so instead of pretending it would always work this checks the
return of hci_update_passive_scan_sync which indicates if everything
worked as intended.
Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes errors such as the following when Own address type is set to
Random Address but it has not been programmed yet due to either be
advertising or connecting:
< HCI Command: LE Set Exte.. (0x08|0x0041) plen 13
Own address type: Random (0x03)
Filter policy: Ignore not in accept list (0x01)
PHYs: 0x05
Entry 0: LE 1M
Type: Passive (0x00)
Interval: 60.000 msec (0x0060)
Window: 30.000 msec (0x0030)
Entry 1: LE Coded
Type: Passive (0x00)
Interval: 180.000 msec (0x0120)
Window: 90.000 msec (0x0090)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Exten.. (0x08|0x0042) plen 6
Extended scan: Enabled (0x01)
Filter duplicates: Enabled (0x01)
Duration: 0 msec (0x0000)
Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
Status: Invalid HCI Command Parameters (0x12)
Fixes: c45074d68a9b ("Bluetooth: Fix not generating RPA when required")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is a follow-up to 3c5b4d69c358 ("net: annotate data-races around
sk->sk_mark"). sk->sk_mark can be read and written without holding
the socket lock. IPv6 equivalent is already covered with READ_ONCE()
annotation in tcp_v6_send_response().
Fixes: 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/f459d1fc44f205e13f6d8bdca2c8bfb9902ffac9.1736244569.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The NAPI IDs were not fully exposed to user space prior to the netlink
API, so they were never namespaced. The netlink API must ensure that
at the very least NAPI instance belongs to the same netns as the owner
of the genl sock.
napi_by_id() can become static now, but it needs to move because of
dev_get_by_napi_id().
Cc: stable@vger.kernel.org
Fixes: 1287c1ae0fc2 ("netdev-genl: Support setting per-NAPI config values")
Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250106180137.1861472-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
syzbot presented an use-after-free report [0] regarding ipvlan and
linkwatch.
ipvlan does not hold a refcnt of the lower device unlike vlan and
macvlan.
If the linkwatch work is triggered for the ipvlan dev, the lower dev
might have already been freed, resulting in UAF of ipvlan->phy_dev in
ipvlan_get_iflink().
We can delay the lower dev unregistration like vlan and macvlan by
holding the lower dev's refcnt in dev->netdev_ops->ndo_init() and
releasing it in dev->priv_destructor().
Jakub pointed out calling .ndo_XXX after unregister_netdevice() has
returned is error prone and suggested [1] addressing this UAF in the
core by taking commit 750e51603395 ("net: avoid potential UAF in
default_operstate()") further.
Let's assume unregistering devices DOWN and use RCU protection in
default_operstate() not to race with the device unregistration.
[0]:
BUG: KASAN: slab-use-after-free in ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
Read of size 4 at addr ffff0000d768c0e0 by task kworker/u8:35/6944
CPU: 0 UID: 0 PID: 6944 Comm: kworker/u8:35 Not tainted 6.13.0-rc2-g9bc5c9515b48 #12 4c3cb9e8b4565456f6a355f312ff91f4f29b3c47
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound linkwatch_event
Call trace:
show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:484 (C)
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x16c/0x6f0 mm/kasan/report.c:489
kasan_report+0xc0/0x120 mm/kasan/report.c:602
__asan_report_load4_noabort+0x20/0x30 mm/kasan/report_generic.c:380
ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
dev_get_iflink+0x7c/0xd8 net/core/dev.c:674
default_operstate net/core/link_watch.c:45 [inline]
rfc2863_policy+0x144/0x360 net/core/link_watch.c:72
linkwatch_do_dev+0x60/0x228 net/core/link_watch.c:175
__linkwatch_run_queue+0x2f4/0x5b8 net/core/link_watch.c:239
linkwatch_event+0x64/0xa8 net/core/link_watch.c:282
process_one_work+0x700/0x1398 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x8c4/0xe10 kernel/workqueue.c:3391
kthread+0x2b0/0x360 kernel/kthread.c:389
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862
Allocated by task 9303:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x30/0x68 mm/kasan/common.c:68
kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4283 [inline]
__kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4289
__kvmalloc_node_noprof+0x9c/0x230 mm/util.c:650
alloc_netdev_mqs+0xb4/0x1118 net/core/dev.c:11209
rtnl_create_link+0x2b8/0xb60 net/core/rtnetlink.c:3595
rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3771
__rtnl_newlink net/core/rtnetlink.c:3896 [inline]
rtnl_newlink+0x122c/0x15c0 net/core/rtnetlink.c:4011
rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
__sys_sendto+0x2ec/0x438 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__arm64_sys_sendto+0xe4/0x110 net/socket.c:2200
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
Freed by task 10200:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x30/0x68 mm/kasan/common.c:68
kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x48/0x68 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2338 [inline]
slab_free mm/slub.c:4598 [inline]
kfree+0x140/0x420 mm/slub.c:4746
kvfree+0x4c/0x68 mm/util.c:693
netdev_release+0x94/0xc8 net/core/net-sysfs.c:2034
device_release+0x98/0x1c0
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x2b0/0x438 lib/kobject.c:737
netdev_run_todo+0xdd8/0xf48 net/core/dev.c:10924
rtnl_unlock net/core/rtnetlink.c:152 [inline]
rtnl_net_unlock net/core/rtnetlink.c:209 [inline]
rtnl_dellink+0x484/0x680 net/core/rtnetlink.c:3526
rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
____sys_sendmsg+0x410/0x708 net/socket.c:2583
___sys_sendmsg+0x178/0x1d8 net/socket.c:2637
__sys_sendmsg net/socket.c:2669 [inline]
__do_sys_sendmsg net/socket.c:2674 [inline]
__se_sys_sendmsg net/socket.c:2672 [inline]
__arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2672
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600
The buggy address belongs to the object at ffff0000d768c000
which belongs to the cache kmalloc-cg-4k of size 4096
The buggy address is located 224 bytes inside of
freed 4096-byte region [ffff0000d768c000, ffff0000d768d000)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x117688
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff0000c77ef981
flags: 0xbfffe0000000040(head|node=0|zone=2|lastcpupid=0x1ffff)
page_type: f5(slab)
raw: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
head: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000003 fffffdffc35da201 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000d768bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff0000d768c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff0000d768c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff0000d768c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0000d768c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 8c55facecd7a ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250102174400.085fd8ac@kernel.org/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106071911.64355-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We've noticed that NFS can hang when using RPC over TLS on an unstable
connection, and investigation shows that the RPC layer is stuck in a tight
loop attempting to transmit, but forever getting -EBADMSG back from the
underlying network. The loop begins when tcp_sendmsg_locked() returns
-EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
calling the socket's error reporting handler.
Instead of converting errors from tcp_sendmsg_locked(), let's pass them
along in this path. The RPC layer handles -EPIPE by reconnecting the
transport, which prevents the endless attempts to transmit on a broken
connection.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Link: https://patch.msgid.link/9594185559881679d81f071b181a10eb07cd079f.1736004079.git.bcodding@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We use NAPI ID as the key for continuing dumps. We also depend
on the NAPIs being sorted by ID within the driver list. Tx NAPIs
(which don't have an ID assigned) break this expectation, it's
not currently possible to dump them reliably. Since Tx NAPIs
are relatively rare, and can't be used in doit (GET or SET)
hide them from the dump API as well.
Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250103183207.1216004-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2025-01-03
Keisuke Nishimura provided a fix to check for kfifo_alloc() in the ca8210
driver.
Lizhi Xu provided a fix a corrupted list, found by syzkaller, by checking local
interfaces first.
* tag 'ieee802154-for-net-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
mac802154: check local interfaces before deleting sdata list
ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe()
====================
Link: https://patch.msgid.link/20250103160046.469363-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
syzkaller reported a corrupted list in ieee802154_if_remove. [1]
Remove an IEEE 802.15.4 network interface after unregister an IEEE 802.15.4
hardware device from the system.
CPU0 CPU1
==== ====
genl_family_rcv_msg_doit ieee802154_unregister_hw
ieee802154_del_iface ieee802154_remove_interfaces
rdev_del_virtual_intf_deprecated list_del(&sdata->list)
ieee802154_if_remove
list_del_rcu
The net device has been unregistered, since the rcu grace period,
unregistration must be run before ieee802154_if_remove.
To avoid this issue, add a check for local->interfaces before deleting
sdata list.
[1]
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 6277 Comm: syz-executor157 Not tainted 6.12.0-rc6-syzkaller-00005-g557329bcecc2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__list_del_entry_valid_or_report+0xf4/0x140 lib/list_debug.c:56
Code: e8 a1 7e 00 07 90 0f 0b 48 c7 c7 e0 37 60 8c 4c 89 fe e8 8f 7e 00 07 90 0f 0b 48 c7 c7 40 38 60 8c 4c 89 fe e8 7d 7e 00 07 90 <0f> 0b 48 c7 c7 a0 38 60 8c 4c 89 fe e8 6b 7e 00 07 90 0f 0b 48 c7
RSP: 0018:ffffc9000490f3d0 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: d211eee56bb28d00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88805b278dd8 R08: ffffffff8174a12c R09: 1ffffffff2852f0d
R10: dffffc0000000000 R11: fffffbfff2852f0e R12: dffffc0000000000
R13: dffffc0000000000 R14: dead000000000100 R15: ffff88805b278cc0
FS: 0000555572f94380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056262e4a3000 CR3: 0000000078496000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_del_entry_valid include/linux/list.h:124 [inline]
__list_del_entry include/linux/list.h:215 [inline]
list_del_rcu include/linux/rculist.h:157 [inline]
ieee802154_if_remove+0x86/0x1e0 net/mac802154/iface.c:687
rdev_del_virtual_intf_deprecated net/ieee802154/rdev-ops.h:24 [inline]
ieee802154_del_iface+0x2c0/0x5c0 net/ieee802154/nl-phy.c:323
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:744
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2607
___sys_sendmsg net/socket.c:2661 [inline]
__sys_sendmsg+0x292/0x380 net/socket.c:2690
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-and-tested-by: syzbot+985f827280dc3a6e7e92@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985f827280dc3a6e7e92
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20241113095129.1457225-1-lizhi.xu@windriver.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
syzbot found that TCA_FLOW_RSHIFT attribute was not validated.
Right shitfing a 32bit integer is undefined for large shift values.
UBSAN: shift-out-of-bounds in net/sched/cls_flow.c:329:23
shift exponent 9445 is too large for 32-bit type 'u32' (aka 'unsigned int')
CPU: 1 UID: 0 PID: 54 Comm: kworker/u8:3 Not tainted 6.13.0-rc3-syzkaller-00180-g4f619d518db9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
flow_classify+0x24d5/0x25b0 net/sched/cls_flow.c:329
tc_classify include/net/tc_wrapper.h:197 [inline]
__tcf_classify net/sched/cls_api.c:1771 [inline]
tcf_classify+0x420/0x1160 net/sched/cls_api.c:1867
sfb_classify net/sched/sch_sfb.c:260 [inline]
sfb_enqueue+0x3ad/0x18b0 net/sched/sch_sfb.c:318
dev_qdisc_enqueue+0x4b/0x290 net/core/dev.c:3793
__dev_xmit_skb net/core/dev.c:3889 [inline]
__dev_queue_xmit+0xf0e/0x3f50 net/core/dev.c:4400
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
iptunnel_xmit+0x55d/0x9b0 net/ipv4/ip_tunnel_core.c:82
udp_tunnel_xmit_skb+0x262/0x3b0 net/ipv4/udp_tunnel_core.c:173
geneve_xmit_skb drivers/net/geneve.c:916 [inline]
geneve_xmit+0x21dc/0x2d00 drivers/net/geneve.c:1039
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x27a/0x7d0 net/core/dev.c:3606
__dev_queue_xmit+0x1b73/0x3f50 net/core/dev.c:4434
Fixes: e5dfb815181f ("[NET_SCHED]: Add flow classifier")
Reported-by: syzbot+1dbb57d994e54aaa04d2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6777bf49.050a0220.178762.0040.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250103104546.3714168-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
802.2+LLC+SNAP frames received by napi_complete_done() with GRO and DSA
have skb->transport_header set two bytes short, or pointing 2 bytes
before network_header & skb->data. This was an issue as snap_rcv()
expected offset to point to SNAP header (OID:PID), causing packet to
be dropped.
A fix at llc_fixup_skb() (a024e377efed) resets transport_header for any
LLC consumers that may care about it, and stops SNAP packets from being
dropped, but doesn't fix the problem which is that LLC and SNAP should
not use transport_header offset.
Ths patch eliminates the use of transport_header offset for SNAP lookup
of OID:PID so that SNAP does not rely on the offset at all.
The offset is reset after pull for any SNAP packet consumers that may
(but shouldn't) use it.
Fixes: fda55eca5a33 ("net: introduce skb_transport_header_was_set()")
Signed-off-by: Antonio Pastor <antonio.pastor@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250103012303.746521-1-antonio.pastor@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireles and netfilter.
Nothing major here. Over the last two weeks we gathered only around
two-thirds of our normal weekly fix count, but delaying sending these
until -rc7 seemed like a really bad idea.
AFAIK we have no bugs under investigation. One or two reverts for
stuff for which we haven't gotten a proper fix will likely come in the
next PR.
Current release - fix to a fix:
- netfilter: nft_set_hash: unaligned atomic read on struct
nft_set_ext
- eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
Previous releases - regressions:
- net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
- mptcp:
- fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
- fix TCP options overflow
- prevent excessive coalescing on receive, fix throughput
- net: fix memory leak in tcp_conn_request() if map insertion fails
- wifi: cw1200: fix potential NULL dereference after conversion to
GPIO descriptors
- phy: micrel: dynamically control external clock of KSZ PHY, fix
suspend behavior
Previous releases - always broken:
- af_packet: fix VLAN handling with MSG_PEEK
- net: restrict SO_REUSEPORT to inet sockets
- netdev-genl: avoid empty messages in NAPI get
- dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X
- eth:
- gve: XDP fixes around transmit, queue wakeup etc.
- ti: icssg-prueth: fix firmware load sequence to prevent time
jump which breaks timesync related operations
Misc:
- netlink: specs: mptcp: add missing attr and improve documentation"
* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
net: ti: icssg-prueth: Fix firmware load sequence.
mptcp: prevent excessive coalescing on receive
mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
mptcp: fix recvbuffer adjust on sleeping rcvmsg
ila: serialize calls to nf_register_net_hooks()
af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
af_packet: fix vlan_get_tci() vs MSG_PEEK
net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
net: restrict SO_REUSEPORT to inet sockets
net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
net: wwan: t7xx: Fix FSM command timeout issue
sky2: Add device ID 11ab:4373 for Marvell 88E8075
mptcp: fix TCP options overflow.
net: mv643xx_eth: fix an OF node reference leak
gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
eth: bcmsysport: fix call balance of priv->clk handling routines
net: llc: reset skb->transport_header
netlink: specs: mptcp: fix missing doc
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently the skb size after coalescing is only limited by the skb
layout (the skb must not carry frag_list). A single coalesced skb
covering several MSS can potentially fill completely the receive
buffer. In such a case, the snd win will zero until the receive buffer
will be empty again, affecting tput badly.
Fixes: 8268ed4c9d19 ("mptcp: introduce and use mptcp_try_coalesce()")
Cc: stable@vger.kernel.org # please delay 2 weeks after 6.13-final release
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-3-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Under some corner cases the MPTCP protocol can end-up invoking
mptcp_cleanup_rbuf() when no data has been copied, but such helper
assumes the opposite condition.
Explicitly drop such assumption and performs the costly call only
when strictly needed - before releasing the msk socket lock.
Fixes: fd8976790a6c ("mptcp: be careful on MPTCP-level ack.")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-2-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If the recvmsg() blocks after receiving some data - i.e. due to
SO_RCVLOWAT - the MPTCP code will attempt multiple times to
adjust the receive buffer size, wrongly accounting every time the
cumulative of received data - instead of accounting only for the
delta.
Address the issue moving mptcp_rcv_space_adjust just after the
data reception and passing it only the just received bytes.
This also removes an unneeded difference between the TCP and MPTCP
RX code path implementation.
Fixes: 581302298524 ("mptcp: error out earlier on disconnect")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-1-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
syzbot found a race in ila_add_mapping() [1]
commit 031ae72825ce ("ila: call nf_unregister_net_hooks() sooner")
attempted to fix a similar issue.
Looking at the syzbot repro, we have concurrent ILA_CMD_ADD commands.
Add a mutex to make sure at most one thread is calling nf_register_net_hooks().
[1]
BUG: KASAN: slab-use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline]
BUG: KASAN: slab-use-after-free in __rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
Read of size 4 at addr ffff888028f40008 by task dhcpcd/5501
CPU: 1 UID: 0 PID: 5501 Comm: dhcpcd Not tainted 6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:489
kasan_report+0xd9/0x110 mm/kasan/report.c:602
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
__rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
rhashtable_lookup include/linux/rhashtable.h:646 [inline]
rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:127 [inline]
ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline]
ila_nf_input+0x1ee/0x620 net/ipv6/ila/ila_xlat.c:185
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xbb/0x200 net/netfilter/core.c:626
nf_hook.constprop.0+0x42e/0x750 include/linux/netfilter.h:269
NF_HOOK include/linux/netfilter.h:312 [inline]
ipv6_rcv+0xa4/0x680 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x12e/0x1e0 net/core/dev.c:5672
__netif_receive_skb+0x1d/0x160 net/core/dev.c:5785
process_backlog+0x443/0x15f0 net/core/dev.c:6117
__napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6883
napi_poll net/core/dev.c:6952 [inline]
net_rx_action+0xa94/0x1010 net/core/dev.c:7074
handle_softirqs+0x213/0x8f0 kernel/softirq.c:561
__do_softirq kernel/softirq.c:595 [inline]
invoke_softirq kernel/softirq.c:435 [inline]
__irq_exit_rcu+0x109/0x170 kernel/softirq.c:662
irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
Reported-by: syzbot+47e761d22ecf745f72b9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c9ae.050a0220.2f3838.04c7.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Link: https://patch.msgid.link/20241230162849.2795486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found
by syzbot.
Rework vlan_get_protocol_dgram() to not touch skb at all,
so that it can be used from many cpus on the same skb.
Add a const qualifier to skb argument.
[1]
skbuff: skb_under_panic: text:ffffffff8a8ccd05 len:29 put:14 head:ffff88807fc8e400 data:ffff88807fc8e3f4 tail:0x11 end:0x140 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5892 Comm: syz-executor883 Not tainted 6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0b 8d 48 c7 c6 86 d5 25 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 5a 69 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc900038d7638 EFLAGS: 00010282
RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 609ffd18ea660600
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88802483c8d0 R08: ffffffff817f0a8c R09: 1ffff9200071ae60
R10: dffffc0000000000 R11: fffff5200071ae61 R12: 0000000000000140
R13: ffff88807fc8e400 R14: ffff88807fc8e3f4 R15: 0000000000000011
FS: 00007fbac5e006c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbac5e00d58 CR3: 000000001238e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
vlan_get_protocol_dgram+0x165/0x290 net/packet/af_packet.c:585
packet_recvmsg+0x948/0x1ef0 net/packet/af_packet.c:3552
sock_recvmsg_nosec net/socket.c:1033 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1055
____sys_recvmsg+0x1c6/0x480 net/socket.c:2803
___sys_recvmsg net/socket.c:2845 [inline]
do_recvmmsg+0x426/0xab0 net/socket.c:2940
__sys_recvmmsg net/socket.c:3014 [inline]
__do_sys_recvmmsg net/socket.c:3037 [inline]
__se_sys_recvmmsg net/socket.c:3030 [inline]
__x64_sys_recvmmsg+0x199/0x250 net/socket.c:3030
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 79eecf631c14 ("af_packet: Handle outgoing VLAN packets without hardware offloading")
Reported-by: syzbot+74f70bb1cb968bf09e4f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c485.050a0220.2f3838.04c5.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241230161004.2681892-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found
by syzbot.
Rework vlan_get_tci() to not touch skb at all,
so that it can be used from many cpus on the same skb.
Add a const qualifier to skb argument.
[1]
skbuff: skb_under_panic: text:ffffffff8a8da482 len:32 put:14 head:ffff88807a1d5800 data:ffff88807a1d5810 tail:0x14 end:0x140 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 5880 Comm: syz-executor172 Not tainted 6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0b 8d 48 c7 c6 9e 6c 26 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 3a 5a 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc90003baf5b8 EFLAGS: 00010286
RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 8565c1eec37aa000
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88802616fb50 R08: ffffffff817f0a4c R09: 1ffff92000775e50
R10: dffffc0000000000 R11: fffff52000775e51 R12: 0000000000000140
R13: ffff88807a1d5800 R14: ffff88807a1d5810 R15: 0000000000000014
FS: 00007fa03261f6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd65753000 CR3: 0000000031720000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
vlan_get_tci+0x272/0x550 net/packet/af_packet.c:565
packet_recvmsg+0x13c9/0x1ef0 net/packet/af_packet.c:3616
sock_recvmsg_nosec net/socket.c:1044 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1066
____sys_recvmsg+0x1c6/0x480 net/socket.c:2814
___sys_recvmsg net/socket.c:2856 [inline]
do_recvmmsg+0x426/0xab0 net/socket.c:2951
__sys_recvmmsg net/socket.c:3025 [inline]
__do_sys_recvmmsg net/socket.c:3048 [inline]
__se_sys_recvmmsg net/socket.c:3041 [inline]
__x64_sys_recvmmsg+0x199/0x250 net/socket.c:3041
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Fixes: 79eecf631c14 ("af_packet: Handle outgoing VLAN packets without hardware offloading")
Reported-by: syzbot+8400677f3fd43f37d3bc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c485.050a0220.2f3838.04c6.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241230161004.2681892-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
After blamed commit, crypto sockets could accidentally be destroyed
from RCU call back, as spotted by zyzbot [1].
Trying to acquire a mutex in RCU callback is not allowed.
Restrict SO_REUSEPORT socket option to inet sockets.
v1 of this patch supported TCP, UDP and SCTP sockets,
but fcnal-test.sh test needed RAW and ICMP support.
[1]
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 24, name: ksoftirqd/1
preempt_count: 100, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by ksoftirqd/1/24:
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2561 [inline]
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_core+0xa37/0x17a0 kernel/rcu/tree.c:2823
Preemption disabled at:
[<ffffffff8161c8c8>] softirq_handle_begin kernel/softirq.c:402 [inline]
[<ffffffff8161c8c8>] handle_softirqs+0x128/0x9b0 kernel/softirq.c:537
CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.13.0-rc3-syzkaller-00174-ga024e377efed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
__might_resched+0x5d4/0x780 kernel/sched/core.c:8758
__mutex_lock_common kernel/locking/mutex.c:562 [inline]
__mutex_lock+0x131/0xee0 kernel/locking/mutex.c:735
crypto_put_default_null_skcipher+0x18/0x70 crypto/crypto_null.c:179
aead_release+0x3d/0x50 crypto/algif_aead.c:489
alg_do_release crypto/af_alg.c:118 [inline]
alg_sock_destruct+0x86/0xc0 crypto/af_alg.c:502
__sk_destruct+0x58/0x5f0 net/core/sock.c:2260
rcu_do_batch kernel/rcu/tree.c:2567 [inline]
rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561
run_ksoftirqd+0xca/0x130 kernel/softirq.c:950
smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fixes: 8c7138b33e5c ("net: Unpublish sk from sk_reuseport_cb before call_rcu")
Reported-by: syzbot+b3e02953598f447d4d2a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772f2f4.050a0220.2f3838.04cb.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241231160527.3994168-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|