summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | net: sfp: Add helper to return the SFP bus nameMaxime Chevallier2024-04-062-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Knowing the bus name is helpful when we want to expose the link topology to userspace, add a helper to return the SFP bus name. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: add helpers to handle sfp phy connect/disconnectMaxime Chevallier2024-04-067-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few PHY drivers that can handle SFP modules through their sfp_upstream_ops. Introduce Phylib helpers to keep track of connected SFP PHYs in a netdevice's namespace, by adding the SFP PHY to the upstream PHY's netdev's namespace. By doing so, these SFP PHYs can be enumerated and exposed to users, which will be able to use their capabilities. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sfp: pass the phy_device when disconnecting an sfp module's PHYMaxime Chevallier2024-04-063-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the phy_device as a parameter to the sfp upstream .disconnect_phy operation. This is preparatory work to help track phy devices across a net_device's link. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: Introduce ethernet link topology representationMaxime Chevallier2024-04-0610-2/+244
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Link topologies containing multiple network PHYs attached to the same net_device can be found when using a PHY as a media converter for use with an SFP connector, on which an SFP transceiver containing a PHY can be used. With the current model, the transceiver's PHY can't be used for operations such as cable testing, timestamping, macsec offload, etc. The reason being that most of the logic for these configuration, coming from either ethtool netlink or ioctls tend to use netdev->phydev, which in multi-phy systems will reference the PHY closest to the MAC. Introduce a numbering scheme allowing to enumerate PHY devices that belong to any netdev, which can in turn allow userspace to take more precise decisions with regard to each PHY's configuration. The numbering is maintained per-netdev, in a phy_device_list. The numbering works similarly to a netdevice's ifindex, with identifiers that are only recycled once INT_MAX has been reached. This prevents races that could occur between PHY listing and SFP transceiver removal/insertion. The identifiers are assigned at phy_attach time, as the numbering depends on the netdevice the phy is attached to. The PHY index can be re-used for PHYs that are persistent. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: phy: marvell: implement cable test for 88E1111Pawel Dembicki2024-04-061-0/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | The same implementation is also valid for 88E1145. VCT in 88E1111 is similar to the 88E609x family. The main difference lies in register organization and required workarounds. It utilizes the same fields in registers but requires a simpler implementation. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: add nlmsg_consume() and use it in devlink compatJakub Kicinski2024-04-062-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | devlink_compat_running_version() sticks out when running netdevsim tests and watching dropped skbs. Add nlmsg_consume() for cases were we want to free a netlink skb but it is expected, rather than a drop. af_netlink code uses consume_skb() directly, which is fine, but some may prefer the symmetry of nlmsg_new() / nlmsg_consume(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: skbuff: generalize the skb->decrypted bitJakub Kicinski2024-04-068-24/+24
| | | | | | | | | | | | | | | | | | | | The ->decrypted bit can be reused for other crypto protocols. Remove the direct dependency on TLS, add helpers to clean up the ifdefs leaking out everywhere. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'ynl-rename-array-nest-to-indexed-array'Jakub Kicinski2024-04-0610-31/+70
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hangbin Liu says: ==================== ynl: rename array-nest to indexed-array rename array-nest to indexed-array and add un-nest sub-type support ==================== Link: https://lore.kernel.org/r/20240404063114.1221532-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | ynl: support binary and integer sub-type for indexed-arrayHangbin Liu2024-04-062-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add binary and integer sub-type support for indexed-array to display bond arp and ns targets. Here is what the result looks like: # ip link add bond0 type bond mode 1 \ arp_ip_target 192.168.1.1,192.168.1.2 ns_ip6_target 2001::1,2001::2 # ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifname": "bond0"}' --output-json | jq '.linkinfo' "arp-ip-target": [ "192.168.1.1", "192.168.1.2" ], [...] "ns-ip6-target": [ "2001::1", "2001::2" ], Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240404063114.1221532-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | ynl: rename array-nest to indexed-arrayHangbin Liu2024-04-0610-28/+53
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | Some implementations, like bonding, has nest array with same attr type. To support all kinds of entries under one nest array. As discussed[1], let's rename array-nest to indexed-array, and assuming the value is a nest by passing the type via sub-type. [1] https://lore.kernel.org/netdev/20240312100105.16a59086@kernel.org/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240404063114.1221532-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | tcp: annotate data-races around tp->window_clampEric Dumazet2024-04-067-23/+29
| | | | | | | | | | | | | | | | | | | | tp->window_clamp can be read locklessly, add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240404114231.2195171-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'ethtool-hw-timestamping-statistics'Jakub Kicinski2024-04-0612-12/+209
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rahul Rameshbabu says: ==================== ethtool HW timestamping statistics The goal of this patch series is to introduce a common set of ethtool statistics for hardware timestamping that a driver implementer can hook into. The statistics counters added are based on what I believe are common patterns/behaviors found across various hardware timestamping implementations seen in the kernel tree today. The mlx5 family of devices is used as the PoC for this patch series. Other vendors are more than welcome to chime in on this series. Link: https://lore.kernel.org/netdev/20240402205223.137565-1-rrameshbabu@nvidia.com/ Link: https://lore.kernel.org/netdev/20240309084440.299358-1-rrameshbabu@nvidia.com/ Link: https://lore.kernel.org/netdev/20240223192658.45893-1-rrameshbabu@nvidia.com/ Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> ==================== Link: https://lore.kernel.org/r/20240403212931.128541-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operationRahul Rameshbabu2024-04-061-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print the nested stats attribute containing timestamping statistics when the --show-time-stamping flag is used. [root@binary-eater-vm-01 linux-ethtool-ts]# ./tools/net/ynl/ethtool.py --show-time-stamping mlx5_1 Time stamping parameters for mlx5_1: Capabilities: hardware-transmit hardware-receive hardware-raw-clock PTP Hardware Clock: 0 Hardware Transmit Timestamp Modes: off on Hardware Receive Filter Modes: none all Statistics: tx-pkts: 8 tx-lost: 0 tx-err: 0 Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-8-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | netlink: specs: ethtool: define header-flags as an enumJakub Kicinski2024-04-062-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes added header flags to the spec. Use an enum instead of defines for more seamless codegen. [Jakub: drop the already applied parts and rewrite message] Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-6-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Implement ethtool hardware timestamping statisticsRahul Rameshbabu2024-04-063-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Feed driver statistics counters related to hardware timestamping to standardized ethtool hardware timestamping statistics group. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-5-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Introduce timestamps statistic counter for Tx DMA layerRahul Rameshbabu2024-04-064-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Count number of transmitted packets that were hardware timestamped at the device DMA layer. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-4-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQRahul Rameshbabu2024-04-064-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Track the number of times a CQE was expected to not be delivered on PTP Tx port timestamping CQ. A CQE is expected to not be delivered if a certain amount of time passes since the corresponding CQE containing the DMA timestamp information has arrived. Increment the late_cqe counter when such a CQE does manage to be delivered to the CQ. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-3-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | ethtool: add interface to read Tx hardware timestamping statisticsRahul Rameshbabu2024-04-065-2/+117
|/ / | | | | | | | | | | | | | | | | | | | | | | | | Multiple network devices that support hardware timestamping appear to have common behavior with regards to timestamp handling. Implement common Tx hardware timestamping statistics in a tx_stats struct_group. Common Rx hardware timestamping statistics can subsequently be implemented in a rx_stats struct_group for ethtool_ts_stats. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'address-all-wunused-const-warnings'Jakub Kicinski2024-04-064-14/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnd Bergmann says: ==================== address all -Wunused-const warnings (the net part) Compilers traditionally warn for unused 'static' variables, but not if they are constant. The reason here is a custom for C++ programmers to define named constants as 'static const' variables in header files instead of using macros or enums. In W=1 builds, we get warnings only static const variables in C files, but not in headers, which is a good compromise, but this still produces warning output in at least 30 files. These warnings are almost all harmless, but also trivial to fix, and there is no good reason to warn only about the non-const variables being unused. I've gone through all the files that I found using randconfig and allmodconfig builds and created patches to avoid these warnings, with the goal of retaining a clean build once the option is enabled by default. Unfortunately, there is one fairly large patch ("drivers: remove incorrect of_match_ptr/ACPI_PTR annotations") that touches 34 individual drivers that all need the same one-line change. If necessary, I can split it up by driver or by subsystem, but at least for reviewing I would keep it as one piece for the moment. ==================== Link: https://lore.kernel.org/r/20240403080702.3509288-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: xgbe: remove extraneous #ifdef checksArnd Bergmann2024-04-061-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When both ACPI and OF are disabled, xgbe_v1 is unused and causes a W=1 warning: drivers/net/ethernet/amd/xgbe/xgbe-platform.c:533:39: error: unused variable 'xgbe_v1' [-Werror,-Wunused-const-variable] static const struct xgbe_version_data xgbe_v1 = { There is no real point in trying to save a few bytes for the match tables, so just make them always visible. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240403080702.3509288-29-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | isdn: kcapi: don't build unused procfs codeArnd Bergmann2024-04-062-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The procfs file is completely unused without CONFIG_PROC_FS but causes a compile time warning: drivers/isdn/capi/kcapi_proc.c:97:36: error: unused variable 'seq_controller_ops' [-Werror,-Wunused-const-variable] static const struct seq_operations seq_controller_ops = { drivers/isdn/capi/kcapi_proc.c:104:36: error: unused variable 'seq_contrstats_ops' [-Werror,-Wunused-const-variable] drivers/isdn/capi/kcapi_proc.c:179:36: error: unused variable 'seq_applications_ops' [-Werror,-Wunused-const-variable] drivers/isdn/capi/kcapi_proc.c:186:36: error: unused variable 'seq_applstats_ops' [-Werror,-Wunused-const-variable] Remove the file from the build in that config and make the calls into it conditional instead. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240403080702.3509288-27-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | 3c515: remove unused 'mtu' variableArnd Bergmann2024-04-061-3/+0
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | This has never been used since the start of the git history. When building with W=1, the unused variable produces a gcc warning: drivers/net/ethernet/3com/3c515.c:35:18: error: 'mtu' defined but not used [-Werror=unused-const-variable=] Just remove it. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240403080702.3509288-6-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | trace: events: cleanup deprecated strncpy usesJustin Stitt2024-04-063-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. For 2 out of 3 of these changes we can simply swap in strscpy() as it guarantess NUL-termination which is needed for the following trace print. trace_rpcgss_context() should use memcpy as its format specifier %.*s allows for the length to be specifier (__entry->len). Due to this, acceptor does not technically need to be NUL-terminated. Moreover, swapping in strscpy() and keeping everything else the same could result in truncation of the source string by one byte. To remedy this, we could use `len + 1` but I am unsure of the size of the destination buffer so a simple memcpy should suffice. | TP_printk("win_size=%u expiry=%lu now=%lu timeout=%u acceptor=%.*s", | __entry->window_size, __entry->expiry, __entry->now, | __entry->timeout, __entry->len, __get_str(acceptor)) I suspect acceptor not to naturally be a NUL-terminated string due to the presence of some stringify methods. | .crstringify_acceptor = gss_stringify_acceptor, Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240401-strncpy-include-trace-events-mdio-h-v1-1-9cb5a4cda116@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'mlx5e-rc2-misc-patches'Jakub Kicinski2024-04-065-32/+74
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tariq Toukan says: ==================== mlx5e rc2 misc patches (part) This patchset includes small features and a cleanup for the mlx5e driver. Patches 1-2 by Cosmin implements FEC settings for 100G/lane modes. Patch 3 is a simple cleanup. ==================== Link: https://lore.kernel.org/r/20240404173357.123307-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Un-expose functions in en.hTariq Toukan2024-04-063-24/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Un-expose functions that are not used outside of their c file. Make them static. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240404173357.123307-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Support FEC settings for 100G/lane modesCosmin Ratiu2024-04-062-4/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This consists of: 1. Expose the 100G/lane capability bit in the PCAM reg. 2. Expose the per link mode FEC capability masks in the PPLM reg. 3. Set the overrides according to ethtool parameters. FEC for new modes is set if and only if the PCAM 100G/lane capability is advertised and the capability mask for a given link mode reports that it can accept the requested FEC mode. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: Extract checking of FEC support for a link modeCosmin Ratiu2024-04-061-5/+11
|/ / | | | | | | | | | | | | | | | | | | | | The check of whether a given FEC mode is supported in a given link mode is about to get more complicated, so extract it in a separate function to avoid code duplication. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240404173357.123307-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Fix PTP firmware timeout parameterMichael Chan2024-04-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the correct tmo_us microsecond parameter for the PTP firmware timeout parameter. Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Reported-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240404195500.171071-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'net-dsa-microchip-ksz8-refactor-fdb-dump-path'Jakub Kicinski2024-04-053-69/+69
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleksij Rempel says: ==================== net: dsa: microchip: ksz8: refactor FDB dump path Refactor FDB dump code path for Microchip KSZ8xxx series. This series mostly makes some cosmetic reworks and allows to forward errors detected by the regmap. Change logs are part of patch commit messages. ==================== Link: https://lore.kernel.org/r/20240403125039.3414824-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8_r_dyn_mac_table(): use entries variable to signal ↵Oleksij Rempel2024-04-051-17/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0 entries We already have a variable to provide number of entries. So use it, instead of using error number. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-9-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8_r_dyn_mac_table(): return read/write error if we ↵Oleksij Rempel2024-04-051-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | got any The read/write path may fail. So, return error if we got it. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-8-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8_r_dyn_mac_table(): ksz: do not return EAGAIN on ↵Oleksij Rempel2024-04-051-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timeout EAGAIN was not used by previous code and not used by current code. So, remove it and use proper error value. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-7-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8: Unify variable naming in ksz8_r_dyn_mac_table()Oleksij Rempel2024-04-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use 'ret' instead of 'rc' in ksz8_r_dyn_mac_table() to maintain consistency with the rest of the file. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-6-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8: Refactor ksz8_r_dyn_mac_table() for readabilityOleksij Rempel2024-04-051-29/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the code out of a long if statement scope in ksz8_r_dyn_mac_table() to improve code readability. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-5-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: ksz8: Refactor ksz8_fdb_dump()Oleksij Rempel2024-04-052-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor ksz8_fdb_dump() to address potential issues: - Limit the number of iterations to avoid endless loops. - Handle error codes returned by ksz8_r_dyn_mac_table(), with an exception for -ENXIO when no more dynamic entries are detected. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20240403125039.3414824-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: Make ksz8_r_dyn_mac_table() staticOleksij Rempel2024-04-052-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ksz8_r_dyn_mac_table() is not used outside the source file. Make it static. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: dsa: microchip: Remove unused FDB timestamp support in ↵Oleksij Rempel2024-04-052-6/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | ksz8_r_dyn_mac_table() The FDB timestamps are not being utilized. This commit removes the unused timestamp support from ksz8_r_dyn_mac_table() function. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240403125039.3414824-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'add-starfive-jh8100-dwmac-support'Jakub Kicinski2024-04-051-5/+23
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tan Chun Hau says: ==================== Add StarFive JH8100 dwmac support Add StarFive JH8100 dwmac support. The JH8100 dwmac shares the same driver code as the JH7110 dwmac and has only one reset signal. Please refer to below: JH8100: reset-names = "stmmaceth"; JH7110: reset-names = "stmmaceth", "ahb"; JH7100: reset-names = "ahb"; Example usage of JH8100 in the device tree: gmac0: ethernet@16030000 { compatible = "starfive,jh8100-dwmac", "starfive,jh7110-dwmac", "snps,dwmac-5.20"; ... }; Changes in v6: - Removed unnecessary enum "starfive,jh8100-dwmac". ==================== Link: https://lore.kernel.org/r/20240403100549.78719-1-chunhau.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | dt-bindings: net: starfive,jh7110-dwmac: Add StarFive JH8100 supportTan Chun Hau2024-04-051-5/+23
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add StarFive JH8100 dwmac support. The JH8100 dwmac shares the same driver code as the JH7110 dwmac and has only one reset signal. Please refer to below: JH8100: reset-names = "stmmaceth"; JH7110: reset-names = "stmmaceth", "ahb"; JH7100: reset-names = "ahb"; Example usage of JH8100 in the device tree: gmac0: ethernet@16030000 { compatible = "starfive,jh8100-dwmac", "starfive,jh7110-dwmac", "snps,dwmac-5.20"; ... }; Signed-off-by: Tan Chun Hau <chunhau.tan@starfivetech.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240403100549.78719-2-chunhau.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-04-05442-3079/+6710
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * \ Merge tag 'net-6.9-rc3' of ↵Linus Torvalds2024-04-0485-405/+1606
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bluetooth and bpf. Fairly usual collection of driver and core fixes. The large selftest accompanying one of the fixes is also becoming a common occurrence. Current release - regressions: - ipv6: fix infinite recursion in fib6_dump_done() - net/rds: fix possible null-deref in newly added error path Current release - new code bugs: - net: do not consume a full cacheline for system_page_pool - bpf: fix bpf_arena-related file descriptor leaks in the verifier - drv: ice: fix freeing uninitialized pointers, fixing misuse of the newfangled __free() auto-cleanup Previous releases - regressions: - x86/bpf: fixes the BPF JIT with retbleed=stuff - xen-netfront: add missing skb_mark_for_recycle, fix page pool accounting leaks, revealed by recently added explicit warning - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6 non-wildcard addresses - Bluetooth: - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with better workarounds to un-break some buggy Qualcomm devices - set conn encrypted before conn establishes, fix re-connecting to some headsets which use slightly unusual sequence of msgs - mptcp: - prevent BPF accessing lowat from a subflow socket - don't account accept() of non-MPC client as fallback to TCP - drv: mana: fix Rx DMA datasize and skb_over_panic - drv: i40e: fix VF MAC filter removal Previous releases - always broken: - gro: various fixes related to UDP tunnels - netns crossing problems, incorrect checksum conversions, and incorrect packet transformations which may lead to panics - bpf: support deferring bpf_link dealloc to after RCU grace period - nf_tables: - release batch on table validation from abort path - release mutex after nft_gc_seq_end from abort path - flush pending destroy work before exit_net release - drv: r8169: skip DASH fw status checks when DASH is disabled" * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) netfilter: validate user input for expected length net/sched: act_skbmod: prevent kernel-infoleak net: usb: ax88179_178a: avoid the interface always configured as random address net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45() net: ravb: Always update error counters net: ravb: Always process TX descriptor ring netfilter: nf_tables: discard table flag update with pending basechain deletion netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() netfilter: nf_tables: reject new basechain after table flag update netfilter: nf_tables: flush pending destroy work before exit_net release netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path netfilter: nf_tables: release batch on table validation from abort path Revert "tg3: Remove residual error handling in tg3_suspend" tg3: Remove residual error handling in tg3_suspend net: mana: Fix Rx DMA datasize and skb_over_panic net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping net: stmmac: fix rx queue priority assignment net: txgbe: fix i2c dev name cannot match clkdev net: fec: Set mac_managed_pm during probe ...
| | * \ Merge tag 'for-netdev' of ↵Jakub Kicinski2024-04-049-24/+75
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-04 We've added 7 non-merge commits during the last 5 day(s) which contain a total of 9 files changed, 75 insertions(+), 24 deletions(-). The main changes are: 1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to incorrect destination IP calculation and incorrect IP for relocations, from Uros Bizjak and Joan Bruguera Micó. 2) Fix BPF arena file descriptor leaks in the verifier, from Anton Protopopov. 3) Defer bpf_link deallocation to after RCU grace period as currently running multi-{kprobes,uprobes} programs might still access cookie information from the link, from Andrii Nakryiko. 4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported by syzkaller, from Jakub Sitnicki. 5) Fix resolve_btfids build with musl libc due to missing linux/types.h include, from Natanael Copa. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Prevent lock inversion deadlock in map delete elem x86/bpf: Fix IP for relocating call depth accounting x86/bpf: Fix IP after emitting call depth accounting bpf: fix possible file descriptor leaks in verifier tools/resolve_btfids: fix build with musl libc bpf: support deferring bpf_link dealloc to after RCU grace period bpf: put uprobe link's path and task in release callback ==================== Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | | * | bpf, sockmap: Prevent lock inversion deadlock in map delete elemJakub Sitnicki2024-04-021-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller started using corpuses where a BPF tracing program deletes elements from a sockmap/sockhash map. Because BPF tracing programs can be invoked from any interrupt context, locks taken during a map_delete_elem operation must be hardirq-safe. Otherwise a deadlock due to lock inversion is possible, as reported by lockdep: CPU0 CPU1 ---- ---- lock(&htab->buckets[i].lock); local_irq_disable(); lock(&host->lock); lock(&htab->buckets[i].lock); <Interrupt> lock(&host->lock); Locks in sockmap are hardirq-unsafe by design. We expects elements to be deleted from sockmap/sockhash only in task (normal) context with interrupts enabled, or in softirq context. Detect when map_delete_elem operation is invoked from a context which is _not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an error. Note that map updates are not affected by this issue. BPF verifier does not allow updating sockmap/sockhash from a BPF tracing program today. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com Reported-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Acked-by: John Fastabend <john.fastabend@gmail.com> Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466 Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com
| | | * | Merge branch 'x86-bpf-fixes-for-the-bpf-jit-with-retbleed-stuff'Alexei Starovoitov2024-04-023-15/+12
| | | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joan Bruguera Micó says: ==================== x86/bpf: Fixes for the BPF JIT with retbleed=stuff From: Joan Bruguera Micó <joanbrugueram@gmail.com> Fixes two issues that cause kernels panic when using the BPF JIT with the call depth tracking / stuffing mitigation for Skylake processors (`retbleed=stuff`). Both issues can be triggered by running simple BPF programs (e.g. running the test suite should trigger both). The first (resubmit) fixes a trivial issue related to calculating the destination IP for call instructions with call depth tracking. The second is related to using the correct IP for relocations, related to the recently introduced %rip-relative addressing for PER_CPU_VAR. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> --- v2: Simplify calculation of "ip". Add more details to the commit message. Joan Bruguera Micó (1): x86/bpf: Fix IP for relocating call depth accounting ==================== Link: https://lore.kernel.org/r/20240401185821.224068-1-ubizjak@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | * | x86/bpf: Fix IP for relocating call depth accountingJoan Bruguera Micó2024-04-023-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit: 59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()") made PER_CPU_VAR() to use rip-relative addressing, hence INCREMENT_CALL_DEPTH macro and skl_call_thunk_template got rip-relative asm code inside of it. A follow up commit: 17bce3b2ae2d ("x86/callthunks: Handle %rip-relative relocations in call thunk template") changed x86_call_depth_emit_accounting() to use apply_relocation(), but mistakenly assumed that the code is being patched in-place (where the destination of the relocation matches the address of the code), using *pprog as the destination ip. This is not true for the call depth accounting, emitted by the BPF JIT, so the calculated address was wrong, JIT-ed BPF progs on kernels with call depth tracking got broken and usually caused a page fault. Pass the destination IP when the BPF JIT emits call depth accounting. Fixes: 17bce3b2ae2d ("x86/callthunks: Handle %rip-relative relocations in call thunk template") Signed-off-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20240401185821.224068-3-ubizjak@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | * | x86/bpf: Fix IP after emitting call depth accountingUros Bizjak2024-04-021-1/+1
| | | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adjust the IP passed to `emit_patch` so it calculates the correct offset for the CALL instruction if `x86_call_depth_emit_accounting` emits code. Otherwise we will skip some instructions and most likely crash. Fixes: b2e9dfe54be4 ("x86/bpf: Emit call depth accounting if required") Link: https://lore.kernel.org/lkml/20230105214922.250473-1-joanbrugueram@gmail.com/ Co-developed-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Signed-off-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20240401185821.224068-2-ubizjak@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | bpf: fix possible file descriptor leaks in verifierAnton Protopopov2024-03-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The resolve_pseudo_ldimm64() function might have leaked file descriptors when BPF_MAP_TYPE_ARENA was used in a program (some error paths missed a corresponding fdput). Add missing fdputs. v2: remove unrelated changes from the fix Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20240329071106.67968-1-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | tools/resolve_btfids: fix build with musl libcNatanael Copa2024-03-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include the header that defines u32. This fixes build of 6.6.23 and 6.1.83 kernels for Alpine Linux, which uses musl libc. I assume that GNU libc indirecly pulls in linux/types.h. Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218647 Cc: stable@vger.kernel.org Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Tested-by: Greg Thelen <gthelen@google.com> Link: https://lore.kernel.org/r/20240328110103.28734-1-ncopa@alpinelinux.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | bpf: support deferring bpf_link dealloc to after RCU grace periodAndrii Nakryiko2024-03-293-6/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BPF link for some program types is passed as a "context" which can be used by those BPF programs to look up additional information. E.g., for multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values. Because of this runtime dependency, when bpf_link refcnt drops to zero there could still be active BPF programs running accessing link data. This patch adds generic support to defer bpf_link dealloc callback to after RCU GP, if requested. This is done by exposing two different deallocation callbacks, one synchronous and one deferred. If deferred one is provided, bpf_link_free() will schedule dealloc_deferred() callback to happen after RCU GP. BPF is using two flavors of RCU: "classic" non-sleepable one and RCU tasks trace one. The latter is used when sleepable BPF programs are used. bpf_link_free() accommodates that by checking underlying BPF program's sleepable flag, and goes either through normal RCU GP only for non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP (taking into account rcu_trace_implies_rcu_gp() optimization), if BPF program is sleepable. We use this for multi-kprobe and multi-uprobe links, which dereference link during program run. We also preventively switch raw_tp link to use deferred dealloc callback, as upcoming changes in bpf-next tree expose raw_tp link data (specifically, cookie value) to BPF program at runtime as well. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | bpf: put uprobe link's path and task in release callbackAndrii Nakryiko2024-03-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to delay putting either path or task to deallocation step. It can be done right after bpf_uprobe_unregister. Between release and dealloc, there could be still some running BPF programs, but they don't access either task or path, only data in link->uprobes, so it is safe to do. On the other hand, doing path_put() in dealloc callback makes this dealloc sleepable because path_put() itself might sleep. Which is problematic due to the need to call uprobe's dealloc through call_rcu(), which is what is done in the next bug fix patch. So solve the problem by releasing these resources early. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240328052426.3042617-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>