summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | | | bpf, docs: Move Clang notes to a separate fileDave Thaler2022-09-302-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move Clang notes to a separate file. Signed-off-by: Dave Thaler <dthaler@microsoft.com> Link: https://lore.kernel.org/r/20220927185958.14995-3-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | bpf, docs: Linux byteswap noteDave Thaler2022-09-302-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Linux byteswap note. Signed-off-by: Dave Thaler <dthaler@microsoft.com> Link: https://lore.kernel.org/r/20220927185958.14995-2-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | bpf, docs: Move legacy packet instructions to a separate fileDave Thaler2022-09-302-35/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move legacy packet instructions to a separate file. Signed-off-by: Dave Thaler <dthaler@microsoft.com> Link: https://lore.kernel.org/r/20220927185958.14995-1-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | Merge branch 'bpf: Remove recursion check for struct_ops prog'Alexei Starovoitov2022-09-298-24/+112
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin KaFai Lau says: ==================== From: Martin KaFai Lau <martin.lau@kernel.org> The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. The kernel does not call the tcp-cc's ops in a recursive way, so this set is to remove the recursion check for struct_ops prog. v3: - Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock in tcp_create_openreq_child() because the listen sk can be cloned without lock being held. (Eric Dumazet) v2: - v1 [0] turned into a long discussion on a few cases and also whether it needs to follow the bpf_run_ctx chain if there is tracing bpf_run_ctx (kprobe/trace/trampoline) running in between. It is a good signal that it is not obvious enough to reason about it and needs a tradeoff for a more straight forward approach. This revision uses one bit out of an existing 1 byte hole in the tcp_sock. It is in Patch 4. [0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911 ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)Martin KaFai Lau2022-09-292-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the bpf_dctcp test to ensure the recurred bpf_setsockopt(TCP_CONGESTION) returns -EBUSY. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itselfMartin KaFai Lau2022-09-293-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another functionMartin KaFai Lau2022-09-291-17/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()Martin KaFai Lau2022-09-291-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks. However, it is not done for cg sock_ddr, cg sockops, and some of the bpf_lsm_cgroup hooks. The tcp-cc "cdg" should have very limited usage. This patch is to move the "cdg" check to the common sol_tcp_sockopt() so that all hooks have a consistent behavior. The motivation to make this check consistent now is because the latter patch will refactor the bpf_setsockopt(TCP_CONGESTION) into another function, so it is better to take this chance to refactor this piece also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampolineMartin KaFai Lau2022-09-293-0/+30
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | Merge branch 'bpf/selftests: convert some tests to ASSERT_* macros'Andrii Nakryiko2022-09-2911-202/+117
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wang Yufen says: ==================== Convert some tests to use the preferred ASSERT_* macros instead of the deprecated CHECK(). ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
| | | * | | | selftests/bpf: Convert udp_limit test to ASSERT_* macrosWang Yufen2022-09-291-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-12-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert tcpbpf_user test to ASSERT_* macrosWang Yufen2022-09-291-20/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-11-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert tcp_rtt test to ASSERT_* macrosWang Yufen2022-09-291-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-10-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macrosWang Yufen2022-09-291-52/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-9-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert tcp_estats test to ASSERT_* macrosWang Yufen2022-09-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-8-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockopt_sk test to ASSERT_* macrosWang Yufen2022-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-7-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockopt_multi test to ASSERT_* macrosWang Yufen2022-09-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-6-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockopt_inherit test to ASSERT_* macrosWang Yufen2022-09-291-17/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-5-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockopt test to ASSERT_* macrosWang Yufen2022-09-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-4-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockmap_ktls test to ASSERT_* macrosWang Yufen2022-09-291-29/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-3-git-send-email-wangyufen@huawei.com
| | | * | | | selftests/bpf: Convert sockmap_basic test to ASSERT_* macrosWang Yufen2022-09-291-54/+33
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the selftest to use the preferred ASSERT_* macros instead of the deprecated CHECK(). Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1664169131-32405-2-git-send-email-wangyufen@huawei.com
| | * | | | Merge branch 'Parameterize task iterators.'Andrii Nakryiko2022-09-2911-46/+588
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kui-Feng Lee says: ==================== Allow creating an iterator that loops through resources of one task/thread. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Major Changes: - Add new parameters in bpf_iter_link_info to indicate to go through all tasks or to go through a specific task. - Change the implementations of BPF iterators of vma, files, and tasks to allow going through only the resources of a specific task. - Provide the arguments of parameterized task iterators in bpf_link_info. Differences from v10: - Check pid_alive() to avoid potential errors. Differences from v9: - Fix the boundary check of computing page_shift. - Rewording the reason of checking and returning the same task. Differences from v8: - Fix uninitialized variable. - Avoid redundant work of getting task from pid. - Change format string to use %u instead of %d. - Use the value of page_shift to compute correct offset in bpf_iter_vm_offset.c. Differences from v7: - Travel the tasks of a process through task_group linked list instead of traveling through the whole namespace. Differences from v6: - Add part 5 to make bpftool show the value of parameters. - Change of wording of show_fdinfo() to show pid or tid instead of always pid. - Simplify error handling and naming of test cases. Differences from v5: - Use user-space tid/pid terminologies in bpf_iter_link_info and bpf_link_info. - Fix reference count - Merge all variants to one 'u32 pid' in internal structs. (bpf_iter_aux_info and bpf_iter_seq_task_common) - Compare the result of get_uprobe_offset() with the implementation with the vma iterators. - Implement show_fdinfo. Differences from v4: - Remove 'type' from bpf_iter_link_info and bpf_link_info. v10: https://lore.kernel.org/all/20220831181039.2680134-1-kuifeng@fb.com/ v9: https://lore.kernel.org/bpf/20220829192317.486946-1-kuifeng@fb.com/ v8: https://lore.kernel.org/bpf/20220829192317.486946-1-kuifeng@fb.com/ v7: https://lore.kernel.org/bpf/20220826003712.2810158-1-kuifeng@fb.com/ v6: https://lore.kernel.org/bpf/20220819220927.3409575-1-kuifeng@fb.com/ v5: https://lore.kernel.org/bpf/20220811001654.1316689-1-kuifeng@fb.com/ v4: https://lore.kernel.org/bpf/20220809195429.1043220-1-kuifeng@fb.com/ v3: https://lore.kernel.org/bpf/20220809063501.667610-1-kuifeng@fb.com/ v2: https://lore.kernel.org/bpf/20220801232649.2306614-1-kuifeng@fb.com/ v1: https://lore.kernel.org/bpf/20220726051713.840431-1-kuifeng@fb.com/ ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
| | | * | | | bpftool: Show parameters of BPF task iterators.Kui-Feng Lee2022-09-291-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Show tid or pid of iterators if giving an argument of tid or pid For example, the command `bpftool link list` may list following lines. 1: iter prog 2 target_name bpf_map 2: iter prog 3 target_name bpf_prog 33: iter prog 225 target_name task_file tid 1644 pids test_progs(1644) Link 33 is a task_file iterator with tid 1644. For now, only targets of task, task_file and task_vma may be with tid or pid to filter out tasks other than those belonging to a process (pid) or a thread (tid). Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-6-kuifeng@fb.com
| | | * | | | selftests/bpf: Test parameterized task BPF iterators.Kui-Feng Lee2022-09-296-24/+322
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test iterators of vma, files and tasks. Ensure the API works appropriately to visit all tasks, tasks in a process, or a particular task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-5-kuifeng@fb.com
| | | * | | | bpf: Handle show_fdinfo for the parameterized task BPF iteratorsKui-Feng Lee2022-09-291-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Show information of iterators in the respective files under /proc/<pid>/fdinfo/. For example, for a task file iterator with 1723 as the value of tid parameter, its fdinfo would look like the following lines. pos: 0 flags: 02000000 mnt_id: 14 ino: 38 link_type: iter link_id: 51 prog_tag: a590ac96db22b825 prog_id: 299 target_name: task_file task_type: TID tid: 1723 This patch add the last three fields. task_type is the type of the task parameter. TID means the iterator visit only the thread specified by tid. The value of tid in the above example is 1723. For the case of PID task_type, it means the iterator visits only threads of a process and will show the pid value of the process instead of a tid. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-4-kuifeng@fb.com
| | | * | | | bpf: Handle bpf_link_info for the parameterized task BPF iterators.Kui-Feng Lee2022-09-293-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new fields to bpf_link_info that users can query it through bpf_obj_get_info_by_fd(). Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
| | | * | | | bpf: Parameterize task iterators.Kui-Feng Lee2022-09-294-22/+203
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
| | * | | | libbpf: Don't require full struct enum64 in UAPI headersAndrii Nakryiko2022-09-271-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop the requirement for system-wide kernel UAPI headers to provide full struct btf_enum64 definition. This is an unexpected requirement that slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have a bleeding-edge kernel UAPI header from unreleased Linux 6.0. To achieve this, we forward declare struct btf_enum64. But that's not enough as there is btf_enum64_value() helper that expects to know the layout of struct btf_enum64. So we get a bit creative with reinterpreting memory layout as array of __u32 and accesing lo32/hi32 fields as array elements. Alternative way would be to have a local pointer variable for anonymous struct with exactly the same layout as struct btf_enum64, but that gets us into C++ compiler errors complaining about invalid type casts. So play it safe, if ugly. [0] Closes: https://github.com/libbpf/libbpf/issues/562 Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump") Reported-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
| | * | | | selftests/bpf: Fix passing arguments via function in test_kmod.shYauheni Kaliuta2022-09-271-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the tests are run in a function $@ there actually contains the function arguments, not the script ones. Pass "$@" to the function as well. Fixes: 272d1f4cfa3c ("selftests: bpf: test_kmod.sh: Pass parameters to the module") Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220926092320.564631-1-ykaliuta@redhat.com
| | * | | | libbpf: Fix the case of running as non-root with capabilitiesJon Doron2022-09-273-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running rootless with special capabilities like: FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH The "access" API will not make the proper check if there is really access to a file or not. >From the access man page: " The check is done using the calling process's real UID and GID, rather than the effective IDs as is done when actually attempting an operation (e.g., open(2)) on the file. Similarly, for the root user, the check uses the set of permitted capabilities rather than the set of effective capabilities; ***and for non-root users, the check uses an empty set of capabilities.*** " What that means is that for non-root user the access API will not do the proper validation if the process really has permission to a file or not. To resolve this this patch replaces all the access API calls with faccessat with AT_EACCESS flag. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
| | * | | | Merge branch 'enforce W^X for trampoline and dispatcher'Alexei Starovoitov2022-09-276-35/+48
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Song Liu says: ==================== Changes v1 => v2: 1. Update arch_prepare_bpf_dispatcher to use a RO image and a RW buffer. (Alexei) Note: I haven't found an existing test to cover this part, so this part was tested manually (comparing the generated dispatcher is the same). Jeff Layton reported CPA W^X warning linux-next [1]. It turns out to be W^X issue with bpf trampoline and bpf dispatcher. Fix these by: 1. Use bpf_prog_pack for bpf_dispatcher; 2. Set memory permission properly with bpf trampoline. [1] https://lore.kernel.org/lkml/c84cc27c1a5031a003039748c3c099732a718aec.camel@kernel.org/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Enforce W^X for bpf trampolineSong Liu2022-09-272-18/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark the trampoline as RO+X after arch_prepare_bpf_trampoline, so that the trampoine follows W^X rule strictly. This will turn off warnings like CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Also remove bpf_jit_alloc_exec_page(), since it is not used any more. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220926184739.3512547-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: use bpf_prog_pack for bpf_dispatcherSong Liu2022-09-275-17/+43
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate bpf_dispatcher with bpf_prog_pack_alloc so that bpf_dispatcher can share pages with bpf programs. arch_prepare_bpf_dispatcher() is updated to provide a RW buffer as working area for arch code to write to. This also fixes CPA W^X warnning like: CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220926184739.3512547-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | Merge branch 'bpf: Fixes for CONFIG_X86_KERNEL_IBT'Alexei Starovoitov2022-09-2710-38/+98
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Olsa says: ==================== Martynas reported bpf_get_func_ip returning +4 address when CONFIG_X86_KERNEL_IBT option is enabled and I found there are some failing bpf tests when this option is enabled. The CONFIG_X86_KERNEL_IBT option adds endbr instruction at the function entry, so the idea is to 'fix' entry ip for kprobe_multi and trampoline probes, because they are placed on the function entry. v5 changes: - updated uapi/linux/bpf.h headers with comment for bpf_get_func_ip returning 0 [Andrii] - added acks v4 changes: - used get_kernel_nofault to read previous instruction [Peter] - used movabs instruction in trampoline comment [Peter] - renamed fentry_ip argument in kprobe_multi_link_handler [Peter] v3 changes: - using 'unused' bpf function to get IBT config option into selftest skeleton - rebased to current bpf-next/master - added ack/review from Masami v2 changes: - change kprobes get_func_ip to return zero for kprobes attached within the function body [Andrii] - detect IBT config and properly test kprobe with offset [Andrii] v1 changes: - read previous instruction in kprobe_multi link handler and adjust entry_ip for CONFIG_X86_KERNEL_IBT option - split first patch into 2 separate changes - update changelogs ==================== Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: Fix get_func_ip offset test for CONFIG_X86_KERNEL_IBTJiri Olsa2022-09-272-22/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_X86_KERNEL_IBT enabled the test for kprobe with offset won't work because of the extra endbr instruction. As suggested by Andrii adding CONFIG_X86_KERNEL_IBT detection and using appropriate offset value based on that. Also removing test7 program, because it does the same as test6. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Return value in kprobe get_func_ip only for entry addressJiri Olsa2022-09-274-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changing return value of kprobe's version of bpf_get_func_ip to return zero if the attach address is not on the function's entry point. For kprobes attached in the middle of the function we can't easily get to the function address especially now with the CONFIG_X86_KERNEL_IBT support. If user cares about current IP for kprobes attached within the function body, they can get it with PT_REGS_IP(ctx). Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBTJiri Olsa2022-09-272-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martynas reported bpf_get_func_ip returning +4 address when CONFIG_X86_KERNEL_IBT option is enabled. When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction at the function entry, which screws return value of bpf_get_func_ip() helper that should return the function address. There's short term workaround for kprobe_multi bpf program made by Alexei [1], but we need this fixup also for bpf_get_attach_cookie, that returns cookie based on the entry_ip value. Moving the fixup in the fprobe handler, so both bpf_get_func_ip and bpf_get_attach_cookie get expected function address when CONFIG_X86_KERNEL_IBT option is enabled. Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip so it's clearer this is an ftrace __fentry__ ip. [1] commit 7f0059b58f02 ("selftests/bpf: Fix kprobe_multi test.") Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: Martynas Pumputis <m@lambda.lt> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | bpf: Use given function address for trampoline ip argJiri Olsa2022-09-271-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using function address given at the generation time as the trampoline ip argument. This way we get directly the function address that we need, so we don't need to: - read the ip from the stack - subtract X86_PATCH_SIZE - subtract ENDBR_INSN_SIZE if CONFIG_X86_KERNEL_IBT is enabled which is not even implemented yet ;-) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | ftrace: Keep the resolved addr in kallsyms_callbackJiri Olsa2022-09-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keeping the resolved 'addr' in kallsyms_callback, instead of taking ftrace_location value, because we depend on symbol address in the cookie related code. With CONFIG_X86_KERNEL_IBT option the ftrace_location value differs from symbol address, which screwes the symbol address cookies matching. There are 2 users of this function: - bpf_kprobe_multi_link_attach for which this fix is for - get_ftrace_locations which is used by register_fprobe_syms this function needs to get symbols resolved to addresses, but does not need 'ftrace location addresses' at this point there's another ftrace location translation in the path done by ftrace_set_filter_ips call: register_fprobe_syms addrs = get_ftrace_locations register_fprobe_ips(addrs) ... ftrace_set_filter_ips ... __ftrace_match_addr ip = ftrace_location(ip); ... Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | kprobes: Add new KPROBE_FLAG_ON_FUNC_ENTRY kprobe flagJiri Olsa2022-09-272-1/+6
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag to indicate that attach address is on function entry. This is used in following changes in get_func_ip helper to return correct function address. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | skmsg: Schedule psock work if the cached skb exists on the psockLiu Jian2022-09-261-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In sk_psock_backlog function, for ingress direction skb, if no new data packet arrives after the skb is cached, the cached skb does not have a chance to be added to the receive queue of psock. As a result, the cached skb cannot be received by the upper-layer application. Fix this by reschedule the psock work to dispose the cached skb in sk_msg_recvmsg function. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220907071311.60534-1-liujian56@huawei.com
| | * | | | selftests/bpf: Add wait send memory test for sockmap redirectLiu Jian2022-09-261-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add one test for wait redirect sock's send memory test for sockmap. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220823133755.314697-3-liujian56@huawei.com
| | * | | | net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memoryLiu Jian2022-09-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the below NULL pointer dereference: [...] [ 14.471200] Call Trace: [ 14.471562] <TASK> [ 14.471882] lock_acquire+0x245/0x2e0 [ 14.472416] ? remove_wait_queue+0x12/0x50 [ 14.473014] ? _raw_spin_lock_irqsave+0x17/0x50 [ 14.473681] _raw_spin_lock_irqsave+0x3d/0x50 [ 14.474318] ? remove_wait_queue+0x12/0x50 [ 14.474907] remove_wait_queue+0x12/0x50 [ 14.475480] sk_stream_wait_memory+0x20d/0x340 [ 14.476127] ? do_wait_intr_irq+0x80/0x80 [ 14.476704] do_tcp_sendpages+0x287/0x600 [ 14.477283] tcp_bpf_push+0xab/0x260 [ 14.477817] tcp_bpf_sendmsg_redir+0x297/0x500 [ 14.478461] ? __local_bh_enable_ip+0x77/0xe0 [ 14.479096] tcp_bpf_send_verdict+0x105/0x470 [ 14.479729] tcp_bpf_sendmsg+0x318/0x4f0 [ 14.480311] sock_sendmsg+0x2d/0x40 [ 14.480822] ____sys_sendmsg+0x1b4/0x1c0 [ 14.481390] ? copy_msghdr_from_user+0x62/0x80 [ 14.482048] ___sys_sendmsg+0x78/0xb0 [ 14.482580] ? vmf_insert_pfn_prot+0x91/0x150 [ 14.483215] ? __do_fault+0x2a/0x1a0 [ 14.483738] ? do_fault+0x15e/0x5d0 [ 14.484246] ? __handle_mm_fault+0x56b/0x1040 [ 14.484874] ? lock_is_held_type+0xdf/0x130 [ 14.485474] ? find_held_lock+0x2d/0x90 [ 14.486046] ? __sys_sendmsg+0x41/0x70 [ 14.486587] __sys_sendmsg+0x41/0x70 [ 14.487105] ? intel_pmu_drain_pebs_core+0x350/0x350 [ 14.487822] do_syscall_64+0x34/0x80 [ 14.488345] entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] The test scenario has the following flow: thread1 thread2 ----------- --------------- tcp_bpf_sendmsg tcp_bpf_send_verdict tcp_bpf_sendmsg_redir sock_close tcp_bpf_push_locked __sock_release tcp_bpf_push //inet_release do_tcp_sendpages sock->ops->release sk_stream_wait_memory // tcp_close sk_wait_event sk->sk_prot->close release_sock(__sk); *** lock_sock(sk); __tcp_close sock_orphan(sk) sk->sk_wq = NULL release_sock **** lock_sock(__sk); remove_wait_queue(sk_sleep(sk), &wait); sk_sleep(sk) //NULL pointer dereference &rcu_dereference_raw(sk->sk_wq)->wait While waiting for memory in thread1, the socket is released with its wait queue because thread2 has closed it. This caused by tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1. We should check if SOCK_DEAD flag is set on wakeup in sk_stream_wait_memory before accessing the wait queue. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.com
| | * | | | Merge branch 'veristat: further usability improvements'Alexei Starovoitov2022-09-242-19/+118
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrii Nakryiko says: ==================== A small patch set adding few usability improvements and features making veristat a more convenient tool to be used for work on BPF verifier: - patch #2 speeds up and makes stats parsing from BPF verifier log more robust; - patch #3 makes veristat less strict about input object files; veristat will ignore non-BPF ELF files; - patch #4 adds progress log, by default, so that user doing mass-verification is aware that veristat is not stuck; - patch #5 allows to tune requested BPF verifier log level, which makes veristat a simplest way to get BPF verifier log, especially successfully verified ones. v1->v2: - don't emit progress in non-table mode, as it breaks CSV output. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: allow to adjust BPF verifier log level in veristatAndrii Nakryiko2022-09-241-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add -l (--log-level) flag to override default BPF verifier log lever. This only matters in verbose mode, which is the mode in which veristat emits verifier log for each processed BPF program. This is important because for successfully verified BPF programs log_level 1 is empty, as BPF verifier truncates all the successfully verified paths. So -l2 is the only way to actually get BPF verifier log in practice. It looks sometihng like this: [vmuser@archvm bpf]$ sudo ./veristat xdp_tx.bpf.o -vl2 Processing 'xdp_tx.bpf.o'... PROCESSING xdp_tx.bpf.o/xdp_tx, DURATION US: 19, VERDICT: success, VERIFIER LOG: func#0 @0 0: R1=ctx(off=0,imm=0) R10=fp0 ; return XDP_TX; 0: (b4) w0 = 3 ; R0_w=3 1: (95) exit verification time 19 usec stack depth 0 processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 File Program Verdict Duration (us) Total insns Total states Peak states ------------ ------- ------- ------------- ----------- ------------ ----------- xdp_tx.bpf.o xdp_tx success 19 2 0 0 ------------ ------- ------- ------------- ----------- ------------ ----------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923175913.3272430-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: emit processing progress and add quiet mode to veristatAndrii Nakryiko2022-09-241-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emit "Processing <filepath>..." for each BPF object file to be processed, to show progress. But also add -q (--quiet) flag to silence such messages. Doing something more clever (like overwriting same output line) is to cumbersome and easily breakable if there is any other console output (e.g., errors from libbpf). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923175913.3272430-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objectsAndrii Nakryiko2022-09-241-8/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make veristat ignore non-BPF object files. This allows simpler mass-verification (e.g., `sudo ./veristat *.bpf.o` in selftests/bpf directory). Note that `sudo ./veristat *.o` would also work, but with selftests's multiple copies of BPF object files (.bpf.o and .bpf.linked{1,2,3}.o) it's 4x slower. Also, given some of BPF object files could be incomplete in the sense that they are meant to be statically linked into final BPF object file (like linked_maps, linked_funcs, linked_vars), note such instances in stderr, but proceed anyways. This seems like a better trade off between completely silently ignoring BPF object file and aborting mass-verification altogether. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923175913.3272430-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: make veristat's verifier log parsing faster and more robustAndrii Nakryiko2022-09-241-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure veristat doesn't spend ridiculous amount of time parsing verifier stats from verifier log, especially for very large logs or truncated logs (e.g., when verifier returns -ENOSPC due to too small buffer). For this, parse lines from the end of the log and make sure we parse only up to 100 last lines, where stats should be, if at all. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923175913.3272430-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | selftests/bpf: add sign-file to .gitignoreAndrii Nakryiko2022-09-241-0/+1
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add sign-file to .gitignore to avoid accidentally checking it in. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923175913.3272430-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | libbpf: restore memory layout of bpf_object_open_optsAndrii Nakryiko2022-09-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When attach_prog_fd field was removed in libbpf 1.0 and replaced with `long: 0` placeholder, it actually shifted all the subsequent fields by 8 byte. This is due to `long: 0` promising to adjust next field's offset to long-aligned offset. But in this case we were already long-aligned as pin_root_path is a pointer. So `long: 0` had no effect, and thus didn't feel the gap created by removed attach_prog_fd. Non-zero bitfield should have been used instead. I validated using pahole. Originally kconfig field was at offset 40. With `long: 0` it's at offset 32, which is wrong. With this change it's back at offset 40. While technically libbpf 1.0 is allowed to break backwards compatibility and applications should have been recompiled against libbpf 1.0 headers, but given how trivial it is to preserve memory layout, let's fix this. Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com> Fixes: 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220923230559.666608-1-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>