| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
checkpatch warns about use of seq_printf where seq_puts would do.
Modify l2tp_debugfs accordingly.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Use BIT(x) rather than (1<<x), reported by checkpatch.pl.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reported by checkpatch:
"WARNING: function definition argument 'struct sock *'
should also have an identifier name"
Add an identifier name to help document the prototype.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
l2tp_core has conditionally compiled code in l2tp_xmit_skb for IPv6
support. The structure of this code triggered a checkpatch warning
due to incorrect indentation.
Fix up the indentation to address the checkpatch warning.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Arguments should be aligned with the function call open parenthesis as
per checkpatch. Tweak some function calls which were not aligned
correctly.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some l2tp code had line breaks which made the code more difficult to
read. These were originally motivated by the 80-character line width
coding guidelines, but were actually a negative from the perspective of
trying to follow the code.
Remove these linebreaks for clearer code, even if we do exceed 80
characters in width in some places.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modify some l2tp comments to better adhere to kernel coding style, as
reported by checkpatch.pl.
Add descriptive comments for the l2tp per-net spinlocks to document
their use.
Fix an incorrect comment in l2tp_recv_common:
RFC2661 section 5.4 states that:
"The LNS controls enabling and disabling of sequence numbers by sending a
data message with or without sequence numbers present at any time during
the life of a session."
l2tp handles this correctly in l2tp_recv_common, but the comment around
the code was incorrect and confusing. Fix up the comment accordingly.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix up various whitespace issues as reported by checkpatch.pl:
* remove spaces around operators where appropriate,
* add missing blank lines following declarations,
* remove multiple blank lines, or trailing blank lines at the end of
functions.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently devlink instance is searched on all doit() operations.
But it is optionally stored into user_ptr[0]. This requires
rediscovering devlink again doing post_doit().
Few devlink commands related to port shared buffers needs 3 pointers
(devlink, devlink_port, and devlink_sb) while executing doit commands.
Though devlink pointer can be derived from the devlink_port during
post_doit() operation when doit() callback has acquired devlink
instance lock, relying on such scheme to access devlik pointer makes
code very fragile.
Hence, to avoid ambiguity in post_doit() and to avoid searching
devlink instance again, simplify code by always storing devlink
instance in user_ptr[0] and derive devlink_sb pointer in their
respective callback routines.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vlan tagged packets are getting dropped when used with DPDK that uses
the AF_PACKET interface on a hyperV guest.
The packet layer uses the tpacket interface to communicate the vlans
information to the upper layers. On Rx path, these drivers can read the
vlan info from the tpacket header but on the Tx path, this information
is still within the packet frame and requires the paravirtual drivers to
push this back into the NDIS header which is then used by the host OS to
form the packet.
This transition from the packet frame to NDIS header is currently missing
hence causing the host OS to drop the all vlan tagged packets sent by
the drivers that use AF_PACKET (ETH_P_ALL) such as DPDK.
Here is an overview of the changes in the vlan header in the packet path:
The RX path (userspace handles everything):
1. RX VLAN packet is stripped by HOST OS and placed in NDIS header
2. Guest Kernel RX hv_netvsc packets and moves VLAN info from NDIS
header into kernel SKB
3. Kernel shares packets with user space application with PACKET_MMAP.
The SKB VLAN info is copied to tpacket layer and indication set
TP_STATUS_VLAN_VALID.
4. The user space application will re-insert the VLAN info into the frame
The TX path:
1. The user space application has the VLAN info in the frame.
2. Guest kernel gets packets from the application with PACKET_MMAP.
3. The kernel later sends the frame to the hv_netvsc driver. The only way
to send VLANs is when the SKB is setup & the VLAN is stripped from the
frame.
4. TX VLAN is re-inserted by HOST OS based on the NDIS header. If it sees
a VLAN in the frame the packet is dropped.
Cc: xe-linux-external@cisco.com
Cc: Sriram Krishnan <srirakr2@cisco.com>
Signed-off-by: Sriram Krishnan <srirakr2@cisco.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise the 'chain_len' filed will carry random values,
some token creation calls will fail due to excessive chain
length, causing unexpected fallback to TCP.
Fixes: 2c5ebd001d4f ("mptcp: refactor token container")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
The variable current_head_index is being initialized with a value that
is never read and it is being updated later with a new value. Replace
the initialization of -1 with the latter assignment.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enetc_imdio_remove() is missing from the enetc_pf_probe()
bailout path. Not surprisingly because enetc_setup_serdes()
is registering the imdio bus for internal purposes, and it's
not obvious that enetc_imdio_remove() currently performs the
teardown of enetc_setup_serdes().
To fix this, define enetc_teardown_serdes() to wrap
enetc_imdio_remove() (improve code maintenance) and call it
on bailout and remove paths.
Fixes: 975d183ef0ca ("net: enetc: Initialize SerDes for SGMII and USXGMII protocols")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove casting the values returned by memory allocation function.
Coccinelle emits WARNING: casting value returned by memory allocation
unction to (struct roce_destroy_qp_req_output_params *) is useless.
This issue was detected by using the Coccinelle software.
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After the patch below, the iteration through the available MMDs is
completely short-circuited, and devs_in_pkg remains set to the initial
value of zero.
Due to devs_in_pkg being zero, the rest of get_phy_c45_ids() is
short-circuited too: the following loop never reaches below this point
either (it executes "continue" for every device in package, failing to
retrieve PHY ID for any of them):
/* Now probe Device Identifiers for each device present. */
for (i = 1; i < num_ids; i++) {
if (!(devs_in_pkg & (1 << i)))
continue;
So c45_ids->device_ids remains populated with zeroes. This causes an
Aquantia AQR412 PHY (same as any C45 PHY would, in fact) to be probed by
the Generic PHY driver.
The issue seems to be a case of submitting partially committed work (and
therefore testing something other than was submitted).
The intention of the patch was to delay exiting the loop until one more
condition is reached (the devs_in_pkg read from hardware is either 0, OR
mostly f's). So fix the patch to reflect that.
Tested with traffic on a LS1028A-QDS, the PHY is now probed correctly
using the Aquantia driver. The devs_in_pkg bit field is set to
0xe000009a, and the MMDs that are present have the following IDs:
[ 5.600772] libphy: get_phy_c45_ids: device_ids[1]=0x3a1b662
[ 5.618781] libphy: get_phy_c45_ids: device_ids[3]=0x3a1b662
[ 5.630797] libphy: get_phy_c45_ids: device_ids[4]=0x3a1b662
[ 5.654535] libphy: get_phy_c45_ids: device_ids[7]=0x3a1b662
[ 5.791723] libphy: get_phy_c45_ids: device_ids[29]=0x3a1b662
[ 5.804050] libphy: get_phy_c45_ids: device_ids[30]=0x3a1b662
[ 5.816375] libphy: get_phy_c45_ids: device_ids[31]=0x0
[ 7.690237] mscc_felix 0000:00:00.5: PHY [0.5:00] driver [Aquantia AQR412] (irq=POLL)
[ 7.704739] mscc_felix 0000:00:00.5: PHY [0.5:01] driver [Aquantia AQR412] (irq=POLL)
[ 7.718918] mscc_felix 0000:00:00.5: PHY [0.5:02] driver [Aquantia AQR412] (irq=POLL)
[ 7.733044] mscc_felix 0000:00:00.5: PHY [0.5:03] driver [Aquantia AQR412] (irq=POLL)
Fixes: bba238ed037c ("net: phy: continue searching for C45 MMDs even if first returned ffff:ffff")
Reported-by: Colin King <colin.king@canonical.com>
Reported-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the SIOCOUTQ IOCTL to get the send buffer fill
of a DCCP socket, like UDP and TCP sockets already have.
Regarding the used data field: DCCP uses per packet sequence numbers,
not per byte, so sequence numbers can't be used like in TCP. sk_wmem_queued
is not used by DCCP and always 0, even in test on highly congested paths.
Therefore this uses sk_wmem_alloc like in UDP.
Signed-off-by: Richard Sailer <richard_siegfried@systemli.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Kurt Kanzenbach says:
====================
Add DSA yaml binding
as discussed [1] [2] it makes sense to add a DSA yaml binding. This is the
second version and contains now two ways of specifying the switch ports: Either
by "ports" or by "ethernet-ports". That is why the third patch also adjusts the
DSA core for it.
Tested in combination with the hellcreek.yaml file.
Changes since v1:
* Use select to not match unrelated switches
* Allow ethernet-port(s)
* List ethernet-controller properties
* Include better description
* Let dsa.txt refer to dsa.yaml
Thanks,
Kurt
[1] - https://lkml.kernel.org/netdev/449f0a03-a91d-ae82-b31f-59dfd1457ec5@gmail.com/
[2] - https://lkml.kernel.org/netdev/20200710090618.28945-1-kurt@linutronix.de/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as
encapsulating node as well.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The DSA bindings have been converted to YAML. Therefore, the old text style
documentation should refer to that one.
The text file can be removed completely once all the existing DSA switch
bindings have been converted as well.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
| |
For future DSA drivers it makes sense to add a generic DSA yaml binding which
can be used then. This was created using the properties from dsa.txt. It
includes the ports and the dsa,member property.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VSC7514 is marketed as a 10-port switch, however it has 11 physical
ports (0->10) in the block diagram:
https://www.microsemi.com/product-directory/ethernet-switches/3992-vsc7514
(also in the device tree at arch/mips/boot/dts/mscc/ocelot.dtsi)
Additionally, by architecture it has one more entry in the analyzer
block, situated right after the physical ports, for the CPU port module.
This is not a physical port, it only represents a channel for frame
injection and extraction. That entry for the CPU port is at index 11 in
the analyzer.
When the register groups for QSYS_SWITCH_PORT_MODE, SYS_PORT_MODE and
SYS_PAUSE_CFG are declared to be replicated 11 times, the 11th entry in
the array of regfields is not initialized, so the CPU port module is not
initialized either.
The documentation of QSYS_SWITCH_PORT_MODE for VSC7514 also says that
this register group is replicated 12 times, so this patch is simply
reflecting that and not introducing any further inconsistency.
Fixes: 886e1387c73d ("net: mscc: ocelot: convert QSYS_SWITCH_PORT_MODE and SYS_PORT_MODE to regfields")
Fixes: 541132f0961a ("net: mscc: ocelot: convert SYS_PAUSE_CFG register access to regfield")
Reported-by: Bryan Whitehead <bryan.whitehead@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
The buildbot found a config where the header isn't already implicitly
pulled in, so add an explicit include as well.
Fixes: 8c918ffbbad4 ("net: remove compat_sock_common_{get,set}sockopt")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-21
The following pull-request contains BPF updates for your *net-next* tree.
We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).
The main changes are:
1) Run BPF program on socket lookup, from Jakub.
2) Introduce cpumap, from Lorenzo.
3) s390 JIT fixes, from Ilya.
4) teach riscv JIT to emit compressed insns, from Luke.
5) use build time computed BTF ids in bpf iter, from Yonghong.
====================
Purely independent overlapping changes in both filter.h and xdp.h
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bpftool sources include code to walk file trees, but use multiple
frameworks to do so: nftw and fts. While nftw conforms to POSIX/SUSv3 and
is widely available, fts is not conformant and less common, especially on
non-glibc systems. The inconsistent framework usage hampers maintenance
and portability of bpftool, in particular for embedded systems.
Standardize code usage by rewriting one fts-based function to use nftw and
clean up some related function warnings by extending use of "const char *"
arguments. This change helps in building bpftool against musl for OpenWrt.
Also fix an unsafe call to dirname() by duplicating the string to pass,
since some implementations may directly alter it. The same approach is
used in libbpf.c.
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200721024817.13701-1-Tony.Ambardar@gmail.com
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Yonghong Song says:
====================
Commit 5a2798ab32ba
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch set tried to use this mechanism to compute
btf_ids for bpf_skc_to_*() helpers and for btf_id_or_null ctx
arguments specified during bpf iterator registration.
Please see individual patch for details.
Changelogs:
v1 -> v2:
- v1 ([1]) is only for bpf_skc_to_*() helpers. This version
expanded it to cover ctx btf_id_or_null arguments
- abandoned the change of "extern u32 name[]" to
"static u32 name[]" for BPF_ID_LIST local "name" definition.
gcc 9 incurred a compilation error.
[1]: https://lore.kernel.org/bpf/20200717184706.3476992-1-yhs@fb.com/T
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
One additional field btf_id is added to struct
bpf_ctx_arg_aux to store the precomputed btf_ids.
The btf_id is computed at build time with
BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions.
All existing bpf iterators are changed to used
pre-compute btf_ids.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
tcp and udp bpf_iter can reuse some socket ids in
btf_sock_ids, so make it global.
I put the extern definition in btf_ids.h as a central
place so it can be easily discovered by developers.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Existing BTF_ID_LIST used a local static variable
to store btf_ids. This patch provided a new macro
BTF_ID_LIST_GLOBAL to store btf_ids in a global
variable which can be shared among multiple files.
The existing BTF_ID_LIST is still retained.
Two reasons. First, BTF_ID_LIST is also used to build
btf_ids for helper arguments which typically
is an array of 5. Since typically different
helpers have different signature, it makes
little sense to share them. Second, some
current computed btf_ids are indeed local.
If later those btf_ids are shared between
different files, they can use BTF_ID_LIST_GLOBAL then.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200720163401.1393159-1-yhs@fb.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Sync kernel header btf_ids.h to tools directory.
Also define macro CONFIG_DEBUG_INFO_BTF before
including btf_ids.h in prog_tests/resolve_btfids.c
since non-stub definitions for BTF_ID_LIST etc. macros
are defined under CONFIG_DEBUG_INFO_BTF. This
prevented test_progs from failing.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163359.1393079-1-yhs@fb.com
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, socket types (struct tcp_sock, udp_sock, etc.)
used by bpf_skc_to_*() helpers are computed when vmlinux_btf
is first built in the kernel.
Commit 5a2798ab32ba
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch did exactly this, removing in-kernel btf_id
computation and utilizing build-time btf_id computation.
If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will
define an array with size of 5, which is not enough for
btf_sock_ids. So define its own static array if
CONFIG_DEBUG_INFO_BTF is not defined.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A handful of samples and selftests fail to build on s390, because
after commit 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}()
only to archs where they work") bpf_probe_read is not available
anymore.
Fix by using bpf_probe_read_kernel.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720114806.88823-1-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
OpenBSD netcat (Debian patchlevel 1.195-2) does not seem to react to
SIGINT for whatever reason, causing prefix.pl to hang after
test_lwt_seg6local.sh exits due to netcat inheriting
test_lwt_seg6local.sh's file descriptors.
Fix by using SIGTERM instead.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720101810.84299-1-iii@linux.ibm.com
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Luke Nelson says:
====================
his patch series enables using compressed riscv (RVC) instructions
in the rv64 BPF JIT.
RVC is a standard riscv extension that adds a set of compressed,
2-byte instructions that can replace some regular 4-byte instructions
for improved code density.
This series first modifies the JIT to support using 2-byte instructions
(e.g., in jump offset computations), then adds RVC encoding and
helper functions, and finally uses the helper functions to optimize
the rv64 JIT.
I used our formal verification framework, Serval, to verify the
correctness of the RVC encodings and their uses in the rv64 JIT.
The JIT continues to pass all tests in lib/test_bpf.c, and introduces
no new failures to test_verifier; both with and without RVC being enabled.
The following are examples of the JITed code for the verifier selftest
"direct packet read test#3 for CGROUP_SKB OK", without and with RVC
enabled, respectively. The former uses 178 bytes, and the latter uses 112,
for a ~37% reduction in code size for this example.
Without RVC:
0: 02000813 addi a6,zero,32
4: fd010113 addi sp,sp,-48
8: 02813423 sd s0,40(sp)
c: 02913023 sd s1,32(sp)
10: 01213c23 sd s2,24(sp)
14: 01313823 sd s3,16(sp)
18: 01413423 sd s4,8(sp)
1c: 03010413 addi s0,sp,48
20: 03056683 lwu a3,48(a0)
24: 02069693 slli a3,a3,0x20
28: 0206d693 srli a3,a3,0x20
2c: 03456703 lwu a4,52(a0)
30: 02071713 slli a4,a4,0x20
34: 02075713 srli a4,a4,0x20
38: 03856483 lwu s1,56(a0)
3c: 02049493 slli s1,s1,0x20
40: 0204d493 srli s1,s1,0x20
44: 03c56903 lwu s2,60(a0)
48: 02091913 slli s2,s2,0x20
4c: 02095913 srli s2,s2,0x20
50: 04056983 lwu s3,64(a0)
54: 02099993 slli s3,s3,0x20
58: 0209d993 srli s3,s3,0x20
5c: 09056a03 lwu s4,144(a0)
60: 020a1a13 slli s4,s4,0x20
64: 020a5a13 srli s4,s4,0x20
68: 00900313 addi t1,zero,9
6c: 006a7463 bgeu s4,t1,0x74
70: 00000a13 addi s4,zero,0
74: 02d52823 sw a3,48(a0)
78: 02e52a23 sw a4,52(a0)
7c: 02952c23 sw s1,56(a0)
80: 03252e23 sw s2,60(a0)
84: 05352023 sw s3,64(a0)
88: 00000793 addi a5,zero,0
8c: 02813403 ld s0,40(sp)
90: 02013483 ld s1,32(sp)
94: 01813903 ld s2,24(sp)
98: 01013983 ld s3,16(sp)
9c: 00813a03 ld s4,8(sp)
a0: 03010113 addi sp,sp,48
a4: 00078513 addi a0,a5,0
a8: 00008067 jalr zero,0(ra)
With RVC:
0: 02000813 addi a6,zero,32
4: 7179 c.addi16sp sp,-48
6: f422 c.sdsp s0,40(sp)
8: f026 c.sdsp s1,32(sp)
a: ec4a c.sdsp s2,24(sp)
c: e84e c.sdsp s3,16(sp)
e: e452 c.sdsp s4,8(sp)
10: 1800 c.addi4spn s0,sp,48
12: 03056683 lwu a3,48(a0)
16: 1682 c.slli a3,0x20
18: 9281 c.srli a3,0x20
1a: 03456703 lwu a4,52(a0)
1e: 1702 c.slli a4,0x20
20: 9301 c.srli a4,0x20
22: 03856483 lwu s1,56(a0)
26: 1482 c.slli s1,0x20
28: 9081 c.srli s1,0x20
2a: 03c56903 lwu s2,60(a0)
2e: 1902 c.slli s2,0x20
30: 02095913 srli s2,s2,0x20
34: 04056983 lwu s3,64(a0)
38: 1982 c.slli s3,0x20
3a: 0209d993 srli s3,s3,0x20
3e: 09056a03 lwu s4,144(a0)
42: 1a02 c.slli s4,0x20
44: 020a5a13 srli s4,s4,0x20
48: 4325 c.li t1,9
4a: 006a7363 bgeu s4,t1,0x50
4e: 4a01 c.li s4,0
50: d914 c.sw a3,48(a0)
52: d958 c.sw a4,52(a0)
54: dd04 c.sw s1,56(a0)
56: 03252e23 sw s2,60(a0)
5a: 05352023 sw s3,64(a0)
5e: 4781 c.li a5,0
60: 7422 c.ldsp s0,40(sp)
62: 7482 c.ldsp s1,32(sp)
64: 6962 c.ldsp s2,24(sp)
66: 69c2 c.ldsp s3,16(sp)
68: 6a22 c.ldsp s4,8(sp)
6a: 6145 c.addi16sp sp,48
6c: 853e c.mv a0,a5
6e: 8082 c.jr ra
RFC -> v1:
- From Björn Töpel:
* Changed RVOFF macro to static inline "ninsns_rvoff".
* Changed return type of rvc_ functions from u32 to u16.
* Changed sizeof(u16) to sizeof(*ctx->insns).
* Factored unsigned immediate checks into helper functions
(is_8b_uint, etc.)
* Changed to use IS_ENABLED instead of #ifdef to check if RVC is
enabled.
* Changed type of immediate arguments to rvc_* encoding to u32
to avoid issues from promotion of u16 to signed int.
* Cleaned up RVC checks in emit_{addi,slli,srli,srai}.
+ Wrapped lines at 100 instead of 80 columns for increased clarity.
+ Move !imm checks into each branch instead of checking
separately.
+ Strengthed checks for c.{slli,srli,srai} to check that
imm < XLEN. Otherwise, imm could be non-zero but the lower
XLEN bits could all be zero, leading to invalid RVC encoding.
* Changed emit_imm to sign-extend the 12-bit value in "lower"
+ The immediate checks for emit_{addiw,li,addi} use signed
comparisons, so this enables the RVC variants to be used
more often (e.g., if val == -1, then lower should be -1
as opposed to 4095).
====================
Reviewed-by: Björn Töpel <bjorn.topel@gmail.com>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch uses the RVC support and encodings from bpf_jit.h to optimize
the rv64 jit.
The optimizations work by replacing emit(rv_X(...)) with a call to a
helper function emit_X, which will emit a compressed version of the
instruction when possible, and when RVC is enabled.
The JIT continues to pass all tests in lib/test_bpf.c, and introduces
no new failures to test_verifier; both with and without RVC being enabled.
Most changes are straightforward replacements of emit(rv_X(...), ctx)
with emit_X(..., ctx), with the following exceptions bearing mention;
* Change emit_imm to sign-extend the value in "lower", since the
checks for RVC (and the instructions themselves) treat the value as
signed. Otherwise, small negative immediates will not be recognized as
encodable using an RVC instruction. For example, without this change,
emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a
6b int even though a "c.li rd, -1" instruction suffices.
* For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit
cases since the values are zero-extended into the upper 32 bits in
the following instructions anyways, and the addition commutes with
zero-extension. (BPF_SUB BPF_X must still use subw since subtraction
does not commute with zero-extension.)
This patch avoids optimizing branches and jumps to use RVC instructions
since surrounding code often makes assumptions about the sizes of
emitted instructions. Optimizing these will require changing these
functions (e.g., emit_branch) to dynamically compute jump offsets.
The following are examples of the JITed code for the verifier selftest
"direct packet read test#3 for CGROUP_SKB OK", without and with RVC
enabled, respectively. The former uses 178 bytes, and the latter uses 112,
for a ~37% reduction in code size for this example.
Without RVC:
0: 02000813 addi a6,zero,32
4: fd010113 addi sp,sp,-48
8: 02813423 sd s0,40(sp)
c: 02913023 sd s1,32(sp)
10: 01213c23 sd s2,24(sp)
14: 01313823 sd s3,16(sp)
18: 01413423 sd s4,8(sp)
1c: 03010413 addi s0,sp,48
20: 03056683 lwu a3,48(a0)
24: 02069693 slli a3,a3,0x20
28: 0206d693 srli a3,a3,0x20
2c: 03456703 lwu a4,52(a0)
30: 02071713 slli a4,a4,0x20
34: 02075713 srli a4,a4,0x20
38: 03856483 lwu s1,56(a0)
3c: 02049493 slli s1,s1,0x20
40: 0204d493 srli s1,s1,0x20
44: 03c56903 lwu s2,60(a0)
48: 02091913 slli s2,s2,0x20
4c: 02095913 srli s2,s2,0x20
50: 04056983 lwu s3,64(a0)
54: 02099993 slli s3,s3,0x20
58: 0209d993 srli s3,s3,0x20
5c: 09056a03 lwu s4,144(a0)
60: 020a1a13 slli s4,s4,0x20
64: 020a5a13 srli s4,s4,0x20
68: 00900313 addi t1,zero,9
6c: 006a7463 bgeu s4,t1,0x74
70: 00000a13 addi s4,zero,0
74: 02d52823 sw a3,48(a0)
78: 02e52a23 sw a4,52(a0)
7c: 02952c23 sw s1,56(a0)
80: 03252e23 sw s2,60(a0)
84: 05352023 sw s3,64(a0)
88: 00000793 addi a5,zero,0
8c: 02813403 ld s0,40(sp)
90: 02013483 ld s1,32(sp)
94: 01813903 ld s2,24(sp)
98: 01013983 ld s3,16(sp)
9c: 00813a03 ld s4,8(sp)
a0: 03010113 addi sp,sp,48
a4: 00078513 addi a0,a5,0
a8: 00008067 jalr zero,0(ra)
With RVC:
0: 02000813 addi a6,zero,32
4: 7179 c.addi16sp sp,-48
6: f422 c.sdsp s0,40(sp)
8: f026 c.sdsp s1,32(sp)
a: ec4a c.sdsp s2,24(sp)
c: e84e c.sdsp s3,16(sp)
e: e452 c.sdsp s4,8(sp)
10: 1800 c.addi4spn s0,sp,48
12: 03056683 lwu a3,48(a0)
16: 1682 c.slli a3,0x20
18: 9281 c.srli a3,0x20
1a: 03456703 lwu a4,52(a0)
1e: 1702 c.slli a4,0x20
20: 9301 c.srli a4,0x20
22: 03856483 lwu s1,56(a0)
26: 1482 c.slli s1,0x20
28: 9081 c.srli s1,0x20
2a: 03c56903 lwu s2,60(a0)
2e: 1902 c.slli s2,0x20
30: 02095913 srli s2,s2,0x20
34: 04056983 lwu s3,64(a0)
38: 1982 c.slli s3,0x20
3a: 0209d993 srli s3,s3,0x20
3e: 09056a03 lwu s4,144(a0)
42: 1a02 c.slli s4,0x20
44: 020a5a13 srli s4,s4,0x20
48: 4325 c.li t1,9
4a: 006a7363 bgeu s4,t1,0x50
4e: 4a01 c.li s4,0
50: d914 c.sw a3,48(a0)
52: d958 c.sw a4,52(a0)
54: dd04 c.sw s1,56(a0)
56: 03252e23 sw s2,60(a0)
5a: 05352023 sw s3,64(a0)
5e: 4781 c.li a5,0
60: 7422 c.ldsp s0,40(sp)
62: 7482 c.ldsp s1,32(sp)
64: 6962 c.ldsp s2,24(sp)
66: 69c2 c.ldsp s3,16(sp)
68: 6a22 c.ldsp s4,8(sp)
6a: 6145 c.addi16sp sp,48
6c: 853e c.mv a0,a5
6e: 8082 c.jr ra
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds functions for encoding and emitting compressed riscv
(RVC) instructions to the BPF JIT.
Some regular riscv instructions can be compressed into an RVC instruction
if the instruction fields meet some requirements. For example, "add rd,
rs1, rs2" can be compressed into "c.add rd, rs2" when rd == rs1.
To make using RVC encodings simpler, this patch also adds helper
functions that selectively emit either a regular instruction or a
compressed instruction if possible.
For example, emit_add will produce a "c.add" if possible and regular
"add" otherwise.
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200721025241.8077-3-luke.r.nels@gmail.com
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch makes the necessary changes to struct rv_jit_context and to
bpf_int_jit_compile to support compressed riscv (RVC) instructions in
the BPF JIT.
It changes the JIT image to be u16 instead of u32, since RVC instructions
are 2 bytes as opposed to 4.
It also changes ctx->offset and ctx->ninsns to refer to 2-byte
instructions rather than 4-byte ones. The riscv PC is required to be
16-bit aligned with or without RVC, so this is sufficient to refer to
any valid riscv offset.
The code for computing jump offsets in bytes is updated accordingly,
and factored into a new "ninsns_rvoff" function to simplify the code.
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200721025241.8077-2-luke.r.nels@gmail.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix pass 0 to PTR_ERR, also dump more err info using
libbpf_strerror.
Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200717123059.29624-1-yuehaibing@huawei.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The non-builtin route for offsetof has a dependency on size_t from
stdlib.h/stdint.h that is undeclared and may break targets.
The offsetof macro in bpf_helpers may disable the same macro in other
headers that have a #ifdef offsetof guard. Rather than add additional
dependencies improve the offsetof macro declared here to use the
builtin that is available since llvm 3.7 (the first with a BPF backend).
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200720061741.1514673-1-irogers@google.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that we have bpf_skip() for emitting nops, use it in
bpf_jit_prologue() in order to reduce code duplication.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-6-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
"BPF_MAXINSNS: Maximum possible literals" unnecessarily falls back to
the interpreter because of failing sanity check in bpf_set_addr. The
problem is that there are a lot of branches that can be shrunk, and
doing so opens up the possibility to shrink even more. This process
does not converge after 3 passes, causing code offsets to change during
the codegen pass, which must never happen.
Fix by inserting nops during codegen pass in order to preserve code
offets.
Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-5-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
"BPF_MAXINSNS: Maximum possible literals" test causes panic with
bpf_jit_harden = 2. The reason is that BPF_JMP | BPF_EXIT is always
emitted as brc, however, after removal of JITed image size
limitations, brcl might be required.
Fix by using brcl when necessary.
Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-4-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Both signed and unsigned variants of BPF_JMP | BPF_K require
sign-extending the immediate. JIT emits cgfi for the signed case,
which is correct, and clgfi for the unsigned case, which is not
correct: clgfi zero-extends the immediate.
s390 does not provide an instruction that does sign-extension and
unsigned comparison at the same time. Therefore, fix by first loading
the sign-extended immediate into work register REG_1 and proceeding
as if it's BPF_X.
Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Link: https://lore.kernel.org/bpf/20200717165326.6786-3-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When running out of srctree, relative path to lib/test_bpf.ko is
different than when running in srctree. Check $building_out_of_srctree
environment variable and use a different relative path if needed.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-2-iii@linux.ibm.com
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix the following cpumap kthread hung. The issue is currently occurring
when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not
BPF_XDP_CPUMAP as expected_attach_type)
$./test_progs -n 101
101/1 cpumap_with_progs:OK
101 xdp_cpumap_attach:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
[ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds.
[ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212
[ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000
[ 370.003228] Call Trace:
[ 370.003930] __schedule+0x5c7/0xf50
[ 370.004901] ? io_schedule_timeout+0xb0/0xb0
[ 370.005934] ? static_obj+0x31/0x80
[ 370.006788] ? mark_held_locks+0x24/0x90
[ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0
[ 370.008930] schedule+0x6f/0x160
[ 370.009728] schedule_preempt_disabled+0x14/0x20
[ 370.010829] kthread+0x17b/0x240
[ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0
[ 370.011944] ret_from_fork+0x1f/0x30
[ 370.012348]
Showing all locks held in the system:
[ 370.013025] 1 lock held by khungtaskd/33:
[ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3
[ 370.014461] =============================================
Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/e54f2aabf959f298939e5507b09c48f8c2e380be.1595170625.git.lorenzo@kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When CONFIG_NET is set but CONFIG_INET isn't, build fails with:
ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_unneed':
kernel/bpf/net_namespace.c:32: undefined reference to `bpf_sk_lookup_enabled'
ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_need':
kernel/bpf/net_namespace.c:43: undefined reference to `bpf_sk_lookup_enabled'
This is because without CONFIG_INET bpf_sk_lookup_enabled symbol is not
available. Wrap references to bpf_sk_lookup_enabled with preprocessor
conditionals.
Fixes: 1559b4aa1db4 ("inet: Run SK_LOOKUP BPF program on socket lookup")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/bpf/20200721100716.720477-1-jakub@cloudflare.com
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Jakub Sitnicki says:
====================
Changelog
=========
v4 -> v5:
- Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
- Simplify prog runners now that only SK_DROP/PASS can be returned.
- Enable bpf_perf_event_output from the start. (Andrii)
- Drop patch
"selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
- Remove tests for narrow loads from context at an offset wider in size
than target field, while we are discussing how to fix it:
https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
- Rebase onto recent bpf-next (bfdfa51702de)
- Other minor changes called out in per-patch changelogs,
see patches: 2, 4, 6, 13-15
- Carried over Andrii's Acks where nothing changed.
v3 -> v4:
- Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
- Default to drop on illegal return value from BPF prog (Lorenz)
- Extend bpf_sk_assign to accept NULL socket pointer.
- Switch to saner return values and add docs for new prog_array API (Andrii)
- Add support for narrow loads from BPF context fields (Yonghong)
- Fix broken build when IPv6 is compiled as a module (kernel test robot)
- Fix null/wild-ptr-deref on BPF context access
- Rebase to recent bpf-next (eef8a42d6ce0)
- Other minor changes called out in per-patch changelogs,
see patches 1-2, 4, 6, 8, 10-12, 14, 16
v2 -> v3:
- Switch to link-based program attachment
- Support for multi-prog attachment
- Ability to skip reuseport socket selection
- Code on RX path is guarded by a static key
- struct in6_addr's are no longer copied into BPF prog context
- BPF prog context is initialized as late as possible
- Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
- Patches dropped:
01/17 flow_dissector: Extract attach/detach/query helpers
03/17 inet: Store layer 4 protocol in inet_hashinfo
08/17 udp: Store layer 4 protocol in udp_table
v1 -> v2:
- Changes called out in patches 2, 13-15, 17
- Rebase to recent bpf-next (b4563facdcae)
RFCv2 -> v1:
- Switch to fetching a socket from a map and selecting a socket with
bpf_sk_assign, instead of having a dedicated helper that does both.
- Run reuseport logic on sockets selected by BPF sk_lookup.
- Allow BPF sk_lookup to fail the lookup with no match.
- Go back to having just 2 hash table lookups in UDP.
RFCv1 -> RFCv2:
- Make socket lookup redirection map-based. BPF program now uses a
dedicated helper and a SOCKARRAY map to select the socket to redirect to.
A consequence of this change is that bpf_inet_lookup context is now
read-only.
- Look for connected UDP sockets before allowing redirection from BPF.
This makes connected UDP socket work as expected in the presence of
inet_lookup prog.
- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
the only other per-netns BPF prog type.
Overview
========
This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
or BPF sk_lookup for short.
BPF sk_lookup program runs when transport layer is looking up a listening
socket for a new connection request (TCP), or when looking up an
unconnected socket for a packet (UDP).
This serves as a mechanism to overcome the limits of what bind() API allows
to express. Two use-cases driving this work are:
(1) steer packets destined to an IP range, fixed port to a single socket
192.0.2.0/24, port 80 -> NGINX socket
(2) steer packets destined to an IP address, any port to a single socket
198.51.100.1, any port -> L7 proxy socket
In its context, program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple.
To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection, and returns SK_PASS code. Transport layer
then uses the selected socket as a result of socket lookup.
Alternatively, program can also fail the lookup (SK_DROP), or let the
lookup continue as usual (SK_PASS without selecting a socket).
This lets the user match packets with listening (TCP) or receiving (UDP)
sockets freely at the last possible point on the receive path, where we
know that packets are destined for local delivery after undergoing
policing, filtering, and routing.
Program is attached to a network namespace, similar to BPF flow_dissector.
We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
attached at the same time, in which case their return values are aggregated
according the rules outlined in patch #4 description.
Series structure
================
Patches are organized as so:
1: enables multiple link-based prog attachments for bpf-netns
2: introduces sk_lookup program type
3-4: hook up the program to run on ipv4/tcp socket lookup
5-6: hook up the program to run on ipv6/tcp socket lookup
7-8: hook up the program to run on ipv4/udp socket lookup
9-10: hook up the program to run on ipv6/udp socket lookup
11-13: libbpf & bpftool support for sk_lookup
14-15: verifier and selftests for sk_lookup
Patches are also available on GH:
https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5
Follow-up work
==============
I'll follow up with below items, which IMHO don't block the review:
- benchmark results for udp6 small packet flood scenario,
- user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
- timeout for accept() in tests after extending network_helper.[ch].
Thanks to the reviewers for their feedback to this patch series:
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
-jkbs
[RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
[RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
[v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
[v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@cloudflare.com/
[v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@cloudflare.com/
====================
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add tests to test_progs that exercise:
- attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
- redirecting socket lookup to a socket selected by BPF program,
- failing a socket lookup on BPF program's request,
- error scenarios for selecting a socket from BPF program,
- accessing BPF program context,
- attaching and running multiple BPF programs.
Run log:
bash-5.0# ./test_progs -n 70
#70/1 query lookup prog:OK
#70/2 TCP IPv4 redir port:OK
#70/3 TCP IPv4 redir addr:OK
#70/4 TCP IPv4 redir with reuseport:OK
#70/5 TCP IPv4 redir skip reuseport:OK
#70/6 TCP IPv6 redir port:OK
#70/7 TCP IPv6 redir addr:OK
#70/8 TCP IPv4->IPv6 redir port:OK
#70/9 TCP IPv6 redir with reuseport:OK
#70/10 TCP IPv6 redir skip reuseport:OK
#70/11 UDP IPv4 redir port:OK
#70/12 UDP IPv4 redir addr:OK
#70/13 UDP IPv4 redir with reuseport:OK
#70/14 UDP IPv4 redir skip reuseport:OK
#70/15 UDP IPv6 redir port:OK
#70/16 UDP IPv6 redir addr:OK
#70/17 UDP IPv4->IPv6 redir port:OK
#70/18 UDP IPv6 redir and reuseport:OK
#70/19 UDP IPv6 redir skip reuseport:OK
#70/20 TCP IPv4 drop on lookup:OK
#70/21 TCP IPv6 drop on lookup:OK
#70/22 UDP IPv4 drop on lookup:OK
#70/23 UDP IPv6 drop on lookup:OK
#70/24 TCP IPv4 drop on reuseport:OK
#70/25 TCP IPv6 drop on reuseport:OK
#70/26 UDP IPv4 drop on reuseport:OK
#70/27 TCP IPv6 drop on reuseport:OK
#70/28 sk_assign returns EEXIST:OK
#70/29 sk_assign honors F_REPLACE:OK
#70/30 sk_assign accepts NULL socket:OK
#70/31 access ctx->sk:OK
#70/32 narrow access to ctx v4:OK
#70/33 narrow access to ctx v6:OK
#70/34 sk_assign rejects TCP established:OK
#70/35 sk_assign rejects UDP connected:OK
#70/36 multi prog - pass, pass:OK
#70/37 multi prog - drop, drop:OK
#70/38 multi prog - pass, drop:OK
#70/39 multi prog - drop, pass:OK
#70/40 multi prog - pass, redir:OK
#70/41 multi prog - redir, pass:OK
#70/42 multi prog - drop, redir:OK
#70/43 multi prog - redir, drop:OK
#70/44 multi prog - redir, redir:OK
#70 sk_lookup:OK
Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-16-jakub@cloudflare.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Exercise verifier access checks for bpf_sk_lookup context fields.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-15-jakub@cloudflare.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make bpftool show human-friendly identifiers for newly introduced program
and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively.
Also, add the new prog type bash-completion, man page and help message.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-14-jakub@cloudflare.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make libbpf aware of the newly added program type, and assign it a
section name.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-13-jakub@cloudflare.com
|