summaryrefslogtreecommitdiffstats
path: root/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'net-next-5.18' of ↵Linus Torvalds2022-03-2440-1191/+3148
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
| * Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2022-03-2221-230/+1475
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-03-21 v2 We've added 137 non-merge commits during the last 17 day(s) which contain a total of 143 files changed, 7123 insertions(+), 1092 deletions(-). The main changes are: 1) Custom SEC() handling in libbpf, from Andrii. 2) subskeleton support, from Delyan. 3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao. 4) Fix net.core.bpf_jit_harden race, from Hou. 5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub. 6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami. The arch specific bits will come later. 7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri. 8) Enable non-atomic allocations in local storage, from Joanne. 9) Various var_off ptr_to_btf_id fixed, from Kumar. 10) bpf_ima_file_hash helper, from Roberto. 11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits) selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" bpftool: Fix a bug in subskeleton code generation bpf: Fix bpf_prog_pack when PMU_SIZE is not defined bpf: Fix bpf_prog_pack for multi-node setup bpf: Fix warning for cast from restricted gfp_t in verifier bpf, arm: Fix various typos in comments libbpf: Close fd in bpf_object__reuse_map bpftool: Fix print error when show bpf map bpf: Fix kprobe_multi return probe backtrace Revert "bpf: Add support to inline bpf_get_func_ip helper on x86" bpf: Simplify check in btf_parse_hdr() selftests/bpf/test_lirc_mode2.sh: Exit with proper code bpf: Check for NULL return from bpf_get_btf_vmlinux selftests/bpf: Test skipping stacktrace bpf: Adjust BPF stack helper functions to accommodate skip > 0 bpf: Select proper size for bpf_prog_pack ... ==================== Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * bpf: Fix bpf_prog_pack when PMU_SIZE is not definedSong Liu2022-03-211-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PMD_SIZE is not available in some special config, e.g. ARCH=arm with CONFIG_MMU=n. Use bpf_prog_pack of PAGE_SIZE in these cases. Fixes: ef078600eec2 ("bpf: Select proper size for bpf_prog_pack") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220321180009.1944482-3-song@kernel.org
| | * bpf: Fix bpf_prog_pack for multi-node setupSong Liu2022-03-211-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | module_alloc requires num_online_nodes * PMD_SIZE to allocate huge pages. bpf_prog_pack uses pack of size num_online_nodes * PMD_SIZE. OTOH, module_alloc returns addresses that are PMD_SIZE aligned (instead of num_online_nodes * PMD_SIZE aligned). Therefore, PMD_MASK should be used to calculate pack_ptr in bpf_prog_pack_free(). Fixes: ef078600eec2 ("bpf: Select proper size for bpf_prog_pack") Reported-by: syzbot+c946805b5ce6ab87df0b@syzkaller.appspotmail.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220321180009.1944482-2-song@kernel.org
| | * bpf: Fix warning for cast from restricted gfp_t in verifierJoanne Koong2022-03-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the sparse warning reported by the kernel test robot: kernel/bpf/verifier.c:13499:47: sparse: warning: cast from restricted gfp_t kernel/bpf/verifier.c:13501:47: sparse: warning: cast from restricted gfp_t This fix can be verified locally by running: 1) wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O make.cross 2) chmod +x ~/bin/make.cross 3) COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 ./make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220321185802.824223-1-joannekoong@fb.com
| | * bpf: Fix kprobe_multi return probe backtraceJiri Olsa2022-03-211-30/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrii reported that backtraces from kprobe_multi program attached as return probes are not complete and showing just initial entry [1]. It's caused by changing registers to have original function ip address as instruction pointer even for return probe, which will screw backtrace from return probe. This change keeps registers intact and store original entry ip and link address on the stack in bpf_kprobe_multi_run_ctx struct, where bpf_get_func_ip and bpf_get_attach_cookie helpers for kprobe_multi programs can find it. [1] https://lore.kernel.org/bpf/CAEf4BzZDDqK24rSKwXNp7XL3ErGD4bZa1M6c_c4EvDSt3jrZcg@mail.gmail.com/T/#m8d1301c0ea0892ddf9dc6fba57a57b8cf11b8c51 Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220321070113.1449167-3-jolsa@kernel.org
| | * Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"Jiri Olsa2022-03-212-21/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 97ee4d20ee67eb462581a7af01442de6586e390b. Following change is adding more complexity to bpf_get_func_ip helper for kprobe_multi programs, which can't be inlined easily. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321070113.1449167-2-jolsa@kernel.org
| | * bpf: Simplify check in btf_parse_hdr()Yuntao Wang2022-03-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Replace offsetof(hdr_len) + sizeof(hdr_len) with offsetofend(hdr_len) to simplify the check for correctness of btf_data_size in btf_parse_hdr() Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220320075240.1001728-1-ytcoode@gmail.com
| | * bpf: Check for NULL return from bpf_get_btf_vmlinuxKumar Kartikeya Dwivedi2022-03-211-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_DEBUG_INFO_BTF is disabled, bpf_get_btf_vmlinux can return a NULL pointer. Check for it in btf_get_module_btf to prevent a NULL pointer dereference. While kernel test robot only complained about this specific case, let's also check for NULL in other call sites of bpf_get_btf_vmlinux. Fixes: 9492450fd287 ("bpf: Always raise reference in btf_get_module_btf") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220320143003.589540-1-memxor@gmail.com
| | * bpf: Adjust BPF stack helper functions to accommodate skip > 0Namhyung Kim2022-03-211-32/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's say that the caller has storage for num_elem stack frames. Then, the BPF stack helper functions walk the stack for only num_elem frames. This means that if skip > 0, one keeps only 'num_elem - skip' frames. This is because it sets init_nr in the perf_callchain_entry to the end of the buffer to save num_elem entries only. I believe it was because the perf callchain code unwound the stack frames until it reached the global max size (sysctl_perf_event_max_stack). However it now has perf_callchain_entry_ctx.max_stack to limit the iteration locally. This simplifies the code to handle init_nr in the BPF callstack entries and removes the confusion with the perf_event's __PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0. Also change the comment on bpf_get_stack() in the header file to be more explicit what the return value means. Fixes: c195651e565a ("bpf: add bpf_get_stack helper") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
| | * bpf: Select proper size for bpf_prog_packSong Liu2022-03-211-23/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using HPAGE_PMD_SIZE as the size for bpf_prog_pack is not ideal in some cases. Specifically, for NUMA systems, __vmalloc_node_range requires PMD_SIZE * num_online_nodes() to allocate huge pages. Also, if the system does not support huge pages (i.e., with cmdline option nohugevmalloc), it is better to use PAGE_SIZE packs. Add logic to select proper size for bpf_prog_pack. This solution is not ideal, as it makes assumption about the behavior of module_alloc and __vmalloc_node_range. However, it appears to be the easiest solution as it doesn't require changes in module_alloc and vmalloc code. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220311201135.3573610-1-song@kernel.org
| | * bpf: Enable non-atomic allocations in local storageJoanne Koong2022-03-214-29/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, local storage memory can only be allocated atomically (GFP_ATOMIC). This restriction is too strict for sleepable bpf programs. In this patch, the verifier detects whether the program is sleepable, and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate down to the local storage functions that allocate memory. Please note that bpf_task/sk/inode_storage_update_elem functions are invoked by userspace applications through syscalls. Preemption is disabled before bpf_task/sk/inode_storage_update_elem is called, which means they will always have to allocate memory atomically. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com
| | * bpf: Always raise reference in btf_get_module_btfKumar Kartikeya Dwivedi2022-03-191-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Align it with helpers like bpf_find_btf_id, so all functions returning BTF in out parameter follow the same rule of raising reference consistently, regardless of module or vmlinux BTF. Adjust existing callers to handle the change accordinly. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220317115957.3193097-10-memxor@gmail.com
| | * bpf: Factor out fd returning from bpf_btf_find_by_name_kindKumar Kartikeya Dwivedi2022-03-191-37/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In next few patches, we need a helper that searches all kernel BTFs (vmlinux and module BTFs), and finds the type denoted by 'name' and 'kind'. Turns out bpf_btf_find_by_name_kind already does the same thing, but it instead returns a BTF ID and optionally fd (if module BTF). This is used for relocating ksyms in BPF loader code (bpftool gen skel -L). We extract the core code out into a new helper bpf_find_btf_id, which returns the BTF ID in the return value, and BTF pointer in an out parameter. The reference for the returned BTF pointer is always raised, hence user must either transfer it (e.g. to a fd), or release it after use. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220317115957.3193097-2-memxor@gmail.com
| | * bpf: Add cookie support to programs attached with kprobe multi linkJiri Olsa2022-03-182-2/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support to call bpf_get_attach_cookie helper from kprobe programs attached with kprobe multi link. The cookie is provided by array of u64 values, where each value is paired with provided function address or symbol with the same array index. When cookie array is provided it's sorted together with addresses (check bpf_kprobe_multi_cookie_swap). This way we can find cookie based on the address in bpf_get_attach_cookie helper. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
| | * bpf: Add support to inline bpf_get_func_ip helper on x86Jiri Olsa2022-03-182-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support to inline it on x86, because it's single load instruction. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-6-jolsa@kernel.org
| | * bpf: Add bpf_get_func_ip kprobe helper for multi kprobe linkJiri Olsa2022-03-181-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding support to call bpf_get_func_ip helper from kprobe programs attached by multi kprobe link. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-5-jolsa@kernel.org
| | * bpf: Add multi kprobe linkJiri Olsa2022-03-182-5/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe program through fprobe API. The fprobe API allows to attach probe on multiple functions at once very fast, because it works on top of ftrace. On the other hand this limits the probe point to the function entry or return. The kprobe program gets the same pt_regs input ctx as when it's attached through the perf API. Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment kprobe to multiple function with new link. User provides array of addresses or symbols with count to attach the kprobe program to. The new link_create uapi interface looks like: struct { __u32 flags; __u32 cnt; __aligned_u64 syms; __aligned_u64 addrs; } kprobe_multi; The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create return multi kprobe. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
| | * kallsyms: Skip the name search for empty stringJiri Olsa2022-03-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kallsyms_lookup_name is called with empty string, it will do futile search for it through all the symbols. Skipping the search for empty string. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-3-jolsa@kernel.org
| | * fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobeMasami Hiramatsu2022-03-181-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce FPROBE_FL_KPROBE_SHARED flag for sharing fprobe callback with kprobes safely from the viewpoint of recursion. Since the recursion safety of the fprobe (and ftrace) is a bit different from the kprobes, this may cause an issue if user wants to run the same code from the fprobe and the kprobes. The kprobes has per-cpu 'current_kprobe' variable which protects the kprobe handler from recursion in any case. On the other hand, the fprobe uses only ftrace_test_recursion_trylock(), which will allow interrupt context calls another (or same) fprobe during the fprobe user handler is running. This is not a matter in cases if the common callback shared among the kprobes and the fprobe has its own recursion detection, or it can handle the recursion in the different contexts (normal/interrupt/NMI.) But if it relies on the 'current_kprobe' recursion lock, it has to check kprobe_running() and use kprobe_busy_*() APIs. Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED *before* registering the fprobe, like; fprobe.flags = FPROBE_FL_KPROBE_SHARED; register_fprobe(&fprobe, "func*", NULL); This will protect your common callback from the nested call. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735293127.1084943.15687374237275817599.stgit@devnote2
| | * fprobe: Add exit_handler supportMasami Hiramatsu2022-03-182-9/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel function return. The rethook will be enabled only if the fprobe::exit_handler is set. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735290790.1084943.10601965782208052202.stgit@devnote2
| | * rethook: Add a generic return hookMasami Hiramatsu2022-03-185-0/+334
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
| | * fprobe: Add ftrace based probe APIsMasami Hiramatsu2022-03-183-0/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
| | * ftrace: Add ftrace_set_filter_ips functionJiri Olsa2022-03-181-9/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding ftrace_set_filter_ips function to be able to set filter on multiple ip addresses at once. With the kprobe multi attach interface we have cases where we need to initialize ftrace_ops object with thousands of functions, so having single function diving into ftrace_hash_move_and_update_ops with ftrace_lock is faster. The functions ips are passed as unsigned long array with count. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2
| | * bpf: Fix net.core.bpf_jit_harden raceHou Tao2022-03-162-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is the bpf_jit_harden counterpart to commit 60b58afc96c9 ("bpf: fix net.core.bpf_jit_enable race"). bpf_jit_harden will be tested twice for each subprog if there are subprogs in bpf program and constant blinding may increase the length of program, so when running "./test_progs -t subprogs" and toggling bpf_jit_harden between 0 and 2, jit_subprogs may fail because constant blinding increases the length of subprog instructions during extra passs. So cache the value of bpf_jit_blinding_enabled() during program allocation, and use the cached value during constant blinding, subprog JITing and args tracking of tail call. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220309123321.2400262-4-houtao1@huawei.com
| | * bpf-lsm: Make bpf_lsm_kernel_read_file() as sleepableRoberto Sassu2022-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make bpf_lsm_kernel_read_file() as sleepable, so that bpf_ima_inode_hash() or bpf_ima_file_hash() can be called inside the implementation of this hook. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-8-roberto.sassu@huawei.com
| | * bpf-lsm: Introduce new helper bpf_ima_file_hash()Roberto Sassu2022-03-111-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ima_file_hash() has been modified to calculate the measurement of a file on demand, if it has not been already performed by IMA or the measurement is not fresh. For compatibility reasons, ima_inode_hash() remains unchanged. Keep the same approach in eBPF and introduce the new helper bpf_ima_file_hash() to take advantage of the modified behavior of ima_file_hash(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com
| | * bpf: Use offsetofend() to simplify macro definitionYuntao Wang2022-03-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use offsetofend() instead of offsetof() + sizeof() to simplify MIN_BPF_LINEINFO_SIZE macro definition. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com
| | * bpf: Add "live packet" mode for XDP in BPF_PROG_RUNToke Høiland-Jørgensen2022-03-092-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
| | * bpf: Determine buf_info inside check_buffer_access()Shung-Hsi Yu2022-03-081-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of determining buf_info string in the caller of check_buffer_access(), we can determine whether the register type is read-only through type_is_rdonly_mem() helper inside check_buffer_access() and construct buf_info, making the code slightly cleaner. Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/YiWYLnAkEZXBP/gH@syu-laptop
| | * bpf: Remove redundant slashYuntao Wang2022-03-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The trailing slash of LIBBPF_SRCS is redundant, remove it. Also inline it as its only used in LIBBPF_INCLUDE. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220305161013.361646-1-ytcoode@gmail.com
| | * bpf: Replace strncpy() with strscpy()Yuntao Wang2022-03-081-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using strncpy() on NUL-terminated strings is considered deprecated[1]. Moreover, if the length of 'task->comm' is less than the destination buffer size, strncpy() will NUL-pad the destination buffer, which is a needless performance penalty. Replacing strncpy() with strscpy() fixes all these issues. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com
| | * bpf: Reject programs that try to load __percpu memory.Hao Luo2022-03-062-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of the btf_type_tag "percpu", we can add a MEM_PERCPU to identify those pointers that point to percpu memory. The ability of differetiating percpu pointers from regular memory pointers have two benefits: 1. It forbids unexpected use of percpu pointers, such as direct loads. In kernel, there are special functions used for accessing percpu memory. Directly loading percpu memory is meaningless. We already have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that wrap the kernel percpu functions. So we can now convert percpu pointers into regular pointers in a safe way. 2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only work on PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static percpu variables in kernel (we rely on pahole to encode them into vmlinux BTF). Now, since we can identify __percpu tagged pointers, we can also identify dynamically allocated percpu memory as well. It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory. This would be very convenient when accessing fields like "cgroup->rstat_cpu". Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com
| | * bpf: Fix checking PTR_TO_BTF_ID in check_mem_accessHao Luo2022-03-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of MEM_USER in commit c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way") PTR_TO_BTF_ID can be combined with a MEM_USER tag. Therefore, most likely, when we compare reg_type against PTR_TO_BTF_ID, we want to use the reg's base_type. Previously the check in check_mem_access() wants to say: if the reg is BTF_ID but not NULL, the execution flow falls into the 'then' branch. But now a reg of (BTF_ID | MEM_USER), which should go into the 'then' branch, goes into the 'else'. The end results before and after this patch are the same: regs tagged with MEM_USER get rejected, but not in a way we intended. So fix the condition, the error message now is correct. Before (log from commit 696c39011538): $ ./test_progs -v -n 22/3 ... libbpf: prog 'test_user1': BPF program load failed: Permission denied libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG -- R1 type=ctx expected=fp 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg) 0: (79) r1 = *(u64 *)(r1 +0) func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 136561 type STRUCT 'bpf_testmod_btf_type_tag_1' 1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0) ; g = arg->a; 1: (61) r1 = *(u32 *)(r1 +0) R1 invalid mem access 'user_ptr_' Now: libbpf: prog 'test_user1': BPF program load failed: Permission denied libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG -- R1 type=ctx expected=fp 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg) 0: (79) r1 = *(u64 *)(r1 +0) func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 104036 type STRUCT 'bpf_testmod_btf_type_tag_1' 1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,ref_obj_id=0,off=0,imm=0) ; g = arg->a; 1: (61) r1 = *(u32 *)(r1 +0) R1 is ptr_bpf_testmod_btf_type_tag_1 access user memory: off=0 Note the error message for the reason of rejection. Fixes: c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way") Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304191657.981240-2-haoluo@google.com
| | * bpf: Harden register offset checks for release helpers and kfuncsKumar Kartikeya Dwivedi2022-03-062-17/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF helpers and kfuncs always has its offset set to 0. While not a real problem now, there's a very real possibility this will become a problem when more and more kfuncs are exposed, and more BPF helpers are added which can release PTR_TO_BTF_ID. Previous commits already protected against non-zero var_off. One of the case we are concerned about now is when we have a type that can be returned by e.g. an acquire kfunc: struct foo { int a; int b; struct bar b; }; ... and struct bar is also a type that can be returned by another acquire kfunc. Then, doing the following sequence: struct foo *f = bpf_get_foo(); // acquire kfunc if (!f) return 0; bpf_put_bar(&f->b); // release kfunc ... would work with the current code, since the btf_struct_ids_match takes reg->off into account for matching pointer type with release kfunc argument type, but would obviously be incorrect, and most likely lead to a kernel crash. A test has been included later to prevent regressions in this area. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com
| | * bpf: Disallow negative offset in check_ptr_off_regKumar Kartikeya Dwivedi2022-03-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check_ptr_off_reg only allows fixed offset to be set for PTR_TO_BTF_ID, where reg->off < 0 doesn't make sense. This would shift the pointer backwards, and fails later in btf_struct_ids_match or btf_struct_walk due to out of bounds access (since offset is interpreted as unsigned). Improve the verifier by rejecting this case by using a better error message for BPF helpers and kfunc, by putting a check inside the check_func_arg_reg_off function. Also, update existing verifier selftests to work with new error string. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-4-memxor@gmail.com
| | * bpf: Fix PTR_TO_BTF_ID var_off checkKumar Kartikeya Dwivedi2022-03-061-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kfunc support was added, check_ctx_reg was called for PTR_TO_CTX register, but no offset checks were made for PTR_TO_BTF_ID. Only reg->off was taken into account by btf_struct_ids_match, which protected against type mismatch due to non-zero reg->off, but when reg->off was zero, a user could set the variable offset of the register and allow it to be passed to kfunc, leading to bad pointer being passed into the kernel. Fix this by reusing the extracted helper check_func_arg_reg_off from previous commit, and make one call before checking all supported register types. Since the list is maintained, any future changes will be taken into account by updating check_func_arg_reg_off. This function prevents non-zero var_off to be set for PTR_TO_BTF_ID, but still allows a fixed non-zero reg->off, which is needed for type matching to work correctly when using pointer arithmetic. ARG_DONTCARE is passed as arg_type, since kfunc doesn't support accepting a ARG_PTR_TO_ALLOC_MEM without relying on size of parameter type from BTF (in case of pointer), or using a mem, len pair. The forcing of offset check for ARG_PTR_TO_ALLOC_MEM is done because ringbuf helpers obtain the size from the header located at the beginning of the memory region, hence any changes to the original pointer shouldn't be allowed. In case of kfunc, size is always known, either at verification time, or using the length parameter, hence this forcing is not required. Since this check will happen once already for PTR_TO_CTX, remove the check_ptr_off_reg call inside its block. Fixes: e6ac2450d6de ("bpf: Support bpf program calling kernel function") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-3-memxor@gmail.com
| | * bpf: Add check_func_arg_reg_off functionKumar Kartikeya Dwivedi2022-03-061-28/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lift the list of register types allowed for having fixed and variable offsets when passed as helper function arguments into a common helper, so that they can be reused for kfunc checks in later commits. Keeping a common helper aids maintainability and allows us to follow the same consistent rules across helpers and kfuncs. Also, convert check_func_arg to use this function. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-2-memxor@gmail.com
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-03-174-13/+45
| |\ \ | | | | | | | | | | | | | | | | | | | | No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-03-119-32/+61
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/dsa/dsa2.c commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU") commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit 4bcc4249b4cf ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2022-03-0514-50/+82
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf-next 2022-03-04 We've added 32 non-merge commits during the last 14 day(s) which contain a total of 59 files changed, 1038 insertions(+), 473 deletions(-). The main changes are: 1) Optimize BPF stackmap's build_id retrieval by caching last valid build_id, as consecutive stack frames are likely to be in the same VMA and therefore have the same build id, from Hao Luo. 2) Several improvements to arm64 BPF JIT, that is, support for JITing the atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and lastly atomic[64]_{xchg|cmpxchg}. Also fix the BTF line info dump for JITed programs, from Hou Tao. 3) Optimize generic BPF map batch deletion by only enforcing synchronize_rcu() barrier once upon return to user space, from Eric Dumazet. 4) For kernel build parse DWARF and generate BTF through pahole with enabled multithreading, from Kui-Feng Lee. 5) BPF verifier usability improvements by making log info more concise and replacing inv with scalar type name, from Mykola Lysenko. 6) Two follow-up fixes for BPF prog JIT pack allocator, from Song Liu. 7) Add a new Kconfig to allow for loading kernel modules with non-matching BTF type info; their BTF info is then removed on load, from Connor O'Brien. 8) Remove reallocarray() usage from bpftool and switch to libbpf_reallocarray() in order to fix compilation errors for older glibc, from Mauricio Vásquez. 9) Fix libbpf to error on conflicting name in BTF when type declaration appears before the definition, from Xu Kuohai. 10) Fix issue in BPF preload for in-kernel light skeleton where loaded BPF program fds prevent init process from setting up fd 0-2, from Yucong Sun. 11) Fix libbpf reuse of pinned perf RB map when max_entries is auto-determined by libbpf, from Stijn Tintel. 12) Several cleanups for libbpf and a fix to enforce perf RB map #pages to be non-zero, from Yuntao Wang. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (32 commits) bpf: Small BPF verifier log improvements libbpf: Add a check to ensure that page_cnt is non-zero bpf, x86: Set header->size properly before freeing it x86: Disable HAVE_ARCH_HUGE_VMALLOC on 32-bit x86 bpf, test_run: Fix overflow in XDP frags bpf_test_finish selftests/bpf: Update btf_dump case for conflicting names libbpf: Skip forward declaration when counting duplicated type names bpf: Add some description about BPF_JIT_ALWAYS_ON in Kconfig bpf, docs: Add a missing colon in verifier.rst bpf: Cache the last valid build_id libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning bpf, selftests: Use raw_tp program for atomic test bpf, arm64: Support more atomic operations bpftool: Remove redundant slashes bpf: Add config to allow loading modules with BTF mismatches bpf, arm64: Feed byte-offset into bpf line info bpf, arm64: Call build_prologue() first in first JIT pass bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel. bpftool: Bpf skeletons assert type sizes bpf: Cleanup comments ... ==================== Link: https://lore.kernel.org/r/20220304164313.31675-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | bpf: Small BPF verifier log improvementsMykola Lysenko2022-03-031-29/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In particular these include: 1) Remove output of inv for scalars in print_verifier_state 2) Replace inv with scalar in verifier error messages 3) Remove _value suffixes for umin/umax/s32_min/etc (except map_value) 4) Remove output of id=0 5) Remove output of ref_obj_id=0 Signed-off-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220301222745.1667206-1-mykolal@fb.com
| | * | | bpf, x86: Set header->size properly before freeing itSong Liu2022-03-021-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On do_jit failure path, the header is freed by bpf_jit_binary_pack_free. While bpf_jit_binary_pack_free doesn't require proper ro_header->size, bpf_prog_pack_free still uses it. Set header->size in bpf_int_jit_compile before calling bpf_jit_binary_pack_free. Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220302175126.247459-3-song@kernel.org
| | * | | bpf: Add some description about BPF_JIT_ALWAYS_ON in KconfigTiezhu Yang2022-03-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_BPF_JIT_ALWAYS_ON is enabled, /proc/sys/net/core/bpf_jit_enable is permanently set to 1 and setting any other value than that will return failure. Add the above description in the help text of config BPF_JIT_ALWAYS_ON, and then we can distinguish between BPF_JIT_ALWAYS_ON and BPF_JIT_DEFAULT_ON. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/1645523826-18149-2-git-send-email-yangtiezhu@loongson.cn
| | * | | bpf: Cache the last valid build_idHao Luo2022-02-281-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For binaries that are statically linked, consecutive stack frames are likely to be in the same VMA and therefore have the same build id. On a real-world workload, we observed that 66% of CPU cycles in __bpf_get_stackid() were spent on build_id_parse() and find_vma(). As an optimization for this case, we can cache the previous frame's VMA, if the new frame has the same VMA as the previous one, reuse the previous one's build id. We are holding the MM locks as reader across the entire loop, so we don't need to worry about VMA going away. Tested through "stacktrace_build_id" and "stacktrace_build_id_nmi" in test_progs. Suggested-by: Greg Thelen <gthelen@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/bpf/20220224000531.1265030-1-haoluo@google.com
| | * | | bpf: Add config to allow loading modules with BTF mismatchesConnor O'Brien2022-02-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BTF mismatch can occur for a separately-built module even when the ABI is otherwise compatible and nothing else would prevent successfully loading. Add a new Kconfig to control how mismatches are handled. By default, preserve the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH is enabled, load the module but ignore its BTF information. Suggested-by: Yonghong Song <yhs@fb.com> Suggested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
| | * | | bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel.Yucong Sun2022-02-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") BPF preload was switched from user mode process to use in-kernel light skeleton instead. However, in the kernel context, early in the boot sequence, the first available FD can start from 0, instead of normally 3 for user mode process. So FDs 0 and 1 are then used for loaded BPF programs and prevent init process from setting up stdin/stdout/stderr on FD 0, 1, and 2 as expected. Before the fix: ls -lah /proc/1/fd/* lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console After the fix: ls -lah /proc/1/fd/* lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console Fix by closing prog FDs after initialization. struct bpf_prog's themselves are kept alive through direct kernel references taken with bpf_link_get_from_fd(). Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220225185923.2535519-1-fallentree@fb.com
| | * | | bpf: Cleanup commentsTom Rix2022-02-249-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add leading space to spdx tag Use // for spdx c file comment Replacements resereved to reserved inbetween to in between everytime to every time intutivie to intuitive currenct to current encontered to encountered referenceing to referencing upto to up to exectuted to executed Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
| | * | | bpf: Initialize ret to 0 inside btf_populate_kfunc_set()Souptick Joarder (HPE)2022-02-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel test robot reported below error -> kernel/bpf/btf.c:6718 btf_populate_kfunc_set() error: uninitialized symbol 'ret'. Initialize ret to 0. Fixes: dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20220219163915.125770-1-jrdr.linux@gmail.com
| | * | | bpf: Call maybe_wait_bpf_programs() only once from generic_map_delete_batch()Eric Dumazet2022-02-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As stated in the comment found in maybe_wait_bpf_programs(), the synchronize_rcu() barrier is only needed before returning to userspace, not after each deletion in the batch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20220218181801.2971275-1-eric.dumazet@gmail.com