Commit message / Author / Age / Files / Lines
...
* wifi: rtw89: fw: refine download flow to support variant firmware suits (Ping-Ke Shih, 2023-09-07, 1 file, -19/+65)

    To support downloading more than one firmware, adjust the flow to download firmware by unit of firmware suit. The flow then becomes:

    1. initial setup
       - disable/enable_wcpu
    2. for all firmware suits
       2.1. download WiFi CPU, and check ready
       2.2. download BB MCU, and check ready
    3. check status code to make sure all ready

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-8-pkshih@realtek.com
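The per-suit flow above can be sketched roughly as follows. This is an illustrative model only, not the driver's code: `struct fw_suit`, `download_all_suits()`, the error codes, and the simulated `*_ok` flags are hypothetical names introduced here, and the `include_bb` parameter merely mirrors the argument described in the earlier commit of this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { FW_OK = 0, FW_ERR_WCPU = -1, FW_ERR_BB_MCU = -2 };

/* Hypothetical model of one firmware suit: a WiFi CPU image plus an
 * optional BB MCU image. The *_ok flags simulate the outcome of
 * "download, then check ready" for each image. */
struct fw_suit {
	bool has_bb_mcu;
	bool wcpu_ok;
	bool bb_mcu_ok;
};

int download_all_suits(const struct fw_suit *suits, size_t n, bool include_bb)
{
	size_t i;

	/* 1. initial setup: disable/enable_wcpu would happen here. */

	/* 2. for all firmware suits */
	for (i = 0; i < n; i++) {
		/* 2.1. download WiFi CPU, and check ready */
		if (!suits[i].wcpu_ok)
			return FW_ERR_WCPU;

		/* 2.2. download BB MCU, and check ready (only when requested) */
		if (include_bb && suits[i].has_bb_mcu && !suits[i].bb_mcu_ok)
			return FW_ERR_BB_MCU;
	}

	/* 3. a final status-code check would confirm that all parts are ready. */
	return FW_OK;
}
```

The point of the structure is that each suit is verified as soon as it is downloaded, so a failure can be attributed to a specific image rather than discovered only at the final status check.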
* wifi: rtw89: 8922a: add chip_ops::bb_preinit to enable BB before downloading firmware (Ping-Ke Shih, 2023-09-07, 6 files, -0/+20)

    Before downloading firmware for BB MCU, call this ops to enable baseband hardware.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-7-pkshih@realtek.com
* wifi: rtw89: fw: propagate an argument include_bb for BB MCU firmware (Ping-Ke Shih, 2023-09-07, 12 files, -14/+31)

    Though WiFi 7 chips need BB MCU firmware, we don't download it in probe stage. Instead, we download it only when bringing the interface up under normal operation or WoWLAN mode. So, add an argument that lets the download flow set up the download settings properly.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-6-pkshih@realtek.com
* wifi: rtw89: fw: add checking type for variant type of firmware (Ping-Ke Shih, 2023-09-07, 6 files, -15/+45)

    For WiFi 6 chips, there is only a single firmware, i.e. WiFi CPU firmware, so no argument is needed to discriminate between firmware types. For WiFi 7 chips, BB MCU firmware is introduced, and we need to check that it is ready after downloading. For each type of firmware, we need to check the corresponding hardware ready bit. After downloading all firmware, check the status code to determine whether everything is ready.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-5-pkshih@realtek.com
* wifi: rtw89: fw: implement supported functions of download firmware for WiFi 7 chips (Ping-Ke Shih, 2023-09-07, 2 files, -0/+238)

    To work with generalized flow of download firmware, implement WiFi 7 specific functions to support it. These functions include disable/enable WiFi CPU, status of path ready, and status of firmware.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-4-pkshih@realtek.com
* wifi: rtw89: fw: generalize download firmware flow by mac_gen pointers (Ping-Ke Shih, 2023-09-07, 3 files, -18/+29)

    In order to reuse the flow to download firmware, define some mac_gen::ops to implement them for WiFi 6 and 7 chips individually. This doesn't change logic at all.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-3-pkshih@realtek.com
* wifi: rtw89: fw: move polling function of firmware path ready to an individual function (Ping-Ke Shih, 2023-09-07, 4 files, -9/+17)

    To download firmware, we need to check that the path is ready. There are two kinds of path: one is to download the firmware header, and the other is to download the firmware body. Since the polling method is different for WiFi 7 chips, make it an individual function so that we can reuse the download flow.

    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230901073956.54203-2-pkshih@realtek.com
* wifi: rtw89: mcc: trigger FW to start/stop MCC (Zong-Zhe Yang, 2023-09-07, 1 file, -0/+173)

    According to Wi-Fi/BT roles' settings, we fill corresponding H2Cs (host to chip packets). Then, following MCC (multi-channel concurrency) pattern, we send these H2Cs as planned. Eventually, the trigger H2Cs will be sent to tell FW to really start/stop MCC.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-7-pkshih@realtek.com
* wifi: rtw89: fix typo of rtw89_fw_h2c_mcc_macid_bitmap() (Zong-Zhe Yang, 2023-09-07, 2 files, -2/+2)

    Fix a typo where `bitamp` should be `bitmap`. Don't change functionality at all.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-6-pkshih@realtek.com
* wifi: rtw89: mcc: decide pattern and calculate parameters (Zong-Zhe Yang, 2023-09-07, 2 files, -0/+235)

    After the previous works, we can now expand and display the MCC pattern in more detail, as shown below.

    |<                             MCC interval                             >|
    |<    duration ref   >| (if mid bt) |<    duration aux   >| (if tail bt) |
    |<tob ref >|< toa ref>|     ...     |<tob aux >|< toa aux>|      ...     |
               V                                   V
           tbtt ref                            tbtt aux
               |<          beacon offset          >|

    (where tob means `time offset behind` and toa means `time offset ahead`)

    There are two key points.
    1. decide position of BT slot if MCC pattern needs to handle BT duration.
    2. calculate all parameters related to tob and toa in MCC pattern.

    For point (1), when BT duration needs to be handled, BT position will rely on beacon offset, either middle or tail.

    For point (2), to ensure durations of the Wi-Fi roles cover their beacons, we have to calculate tob and toa for them according to their TBTT. And, there are two strategies to calculate parameters, strict and loose.

    In strict pattern, all parameters take HW time into account as limitation. But, the strict calculation is not always successful.

    In loose pattern, it only tries to give positive parameters to reference role and doesn't care much about auxiliary role. If unfortunately auxiliary role gets negative parameters in loose pattern, FW will be notified and then deal with it. So, the loose calculation won't fail.

    In general, we always try strict pattern cases before using a loose pattern.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-5-pkshih@realtek.com
* wifi: rtw89: mcc: consider and determine BT duration (Zong-Zhe Yang, 2023-09-07, 1 file, -0/+169)

    Before calculating MCC pattern, we have to determine whether to handle BT duration in it or not. The decision will depend on the channels that Wi-Fi roles use. And, we have three cases shown below.

    1. non-2GHz + non-2GHz
    2. non-2GHz + 2GHz (different band)
    3. 2GHz + 2GHz (dual 2GHz)

    For case (1), we don't care about BT duration in MCC pattern.

    For case (2), we still don't care about BT duration in MCC pattern. Instead, we try to satisfy it by modifying duration of Wi-Fi role on non-2GHz channel.

    For case (3), we need to modify Wi-Fi durations and also need to handle BT duration in MCC pattern.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-4-pkshih@realtek.com
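The three-way decision above can be written as a small classifier. The sketch below is only a reading of the commit message, not the driver's implementation: `struct bt_plan` and `plan_bt()` are hypothetical names, and the two flags simply record what each case says should happen.

```c
#include <stdbool.h>

/* Hypothetical outcome of the BT-duration decision. */
struct bt_plan {
	bool modify_wifi_durations; /* adjust Wi-Fi role duration(s) to leave room for BT */
	bool bt_slot_in_pattern;    /* the MCC pattern itself must carry a BT duration */
};

struct bt_plan plan_bt(bool ref_is_2ghz, bool aux_is_2ghz)
{
	struct bt_plan plan = { false, false };

	/* Case 1: non-2GHz + non-2GHz, BT duration is ignored. */
	if (!ref_is_2ghz && !aux_is_2ghz)
		return plan;

	/* Cases 2 and 3: some Wi-Fi duration has to give way to BT. */
	plan.modify_wifi_durations = true;

	/* Case 3: dual 2GHz, the pattern must also handle the BT duration. */
	if (ref_is_2ghz && aux_is_2ghz)
		plan.bt_slot_in_pattern = true;

	return plan;
}
```

Only the dual-2GHz case feeds a BT slot into the later pattern calculation; the mixed-band case is resolved purely by adjusting a Wi-Fi duration.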
* wifi: rtw89: mcc: fill fundamental configurations (Zong-Zhe Yang, 2023-09-07, 3 files, -0/+187)

    We determine the fundamental settings shown below.

    |<             MCC interval              >|
    |<   duration ref   >|<   duration aux   >|
          |                     |
          |                     |
          |<   beacon offset   >|
          |                     |
          V                     V
     (tbtt ref)            (tbtt aux)

    (where `ref` (reference) and `aux` (auxiliary) mean the two MCC roles)

    Based on MCC mode (GO+STA or GC+STA), we fill configurations of MCC interval and beacon offset. And, we make sure each MCC role has a basically required duration in the MCC interval.

    The beacon offset mentioned above is a parameter for further MCC pattern calculation. If MCC is in GC+STA mode, we will calculate the real beacon offset through TSFs shown in beacons of both MCC roles. Otherwise, we will use a default beacon offset, and make GO sync STA's TSF timer with this offset.

    MCC pattern calculation will break down each MCC role's duration in more detail. We will implement it in the following.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-3-pkshih@realtek.com
* wifi: rtw89: mcc: initialize start flow (Zong-Zhe Yang, 2023-09-07, 4 files, -0/+502)

    We prepare to support TDMA-based MCC (multi-channel concurrency) which allows two kinds of modes below.
     * P2P GO + normal STA
     * P2P GC + normal STA

    Each mode has two vif and two chanctx. Then, each vif binds one separate chanctx and becomes one MCC role. We name the two MCC roles as follows.
     * MCC role - reference (ref)
       We calculate the baseline of our TDMA things according to its info, e.g. TBTT. In normal case, it will be put at the first slot of TDMA.
     * MCC role - auxiliary (aux)

    MCC state machine will be running in FW eventually, but before that, we have to fill and calculate things that are needed by FW. We fill the information of MCC role according to its vif and its chanctx. Then, we calculate the start time for MCC.

    Note that the parameters used in the calculation are now assigned by default rules. The precise parameters for better MCC behavior will be derived in the following.

    Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230831053133.24015-2-pkshih@realtek.com
* wifi: rtw89: 8852c: Fix TSSI causes transmit power inaccuracy (Kuan-Chung Chen, 2023-09-04, 3 files, -23/+46)

    Making the TSSI ADC FIFO clock follow the RX ADC clock avoids transmit power inaccuracy.

    Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230830092849.153251-3-pkshih@realtek.com
* wifi: rtw89: 8852c: Update bandedge parameters for better performance (Kuan-Chung Chen, 2023-09-04, 3 files, -3/+15)

    TSSI configures bandedge to transmit a proper waveform. These new bandedge parameters improve the accuracy of transmit power compensation, which helps to avoid throughput degradation.

    Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230830092849.153251-2-pkshih@realtek.com
* wifi: plfxlc: fix clang-specific fortify warning (Dmitry Antipov, 2023-09-04, 1 file, -1/+1)

    When compiling with clang 16.0.6 and CONFIG_FORTIFY_SOURCE=y, I've noticed the following (somewhat confusing due to absence of an actual source code location):

    In file included from drivers/net/wireless/purelifi/plfxlc/mac.c:6:
    In file included from ./include/linux/netdevice.h:24:
    In file included from ./include/linux/timer.h:6:
    In file included from ./include/linux/ktime.h:24:
    In file included from ./include/linux/time.h:60:
    In file included from ./include/linux/time32.h:13:
    In file included from ./include/linux/timex.h:67:
    In file included from ./arch/x86/include/asm/timex.h:5:
    In file included from ./arch/x86/include/asm/processor.h:23:
    In file included from ./arch/x86/include/asm/msr.h:11:
    In file included from ./arch/x86/include/asm/cpumask.h:5:
    In file included from ./include/linux/cpumask.h:12:
    In file included from ./include/linux/bitmap.h:11:
    In file included from ./include/linux/string.h:254:
    ./include/linux/fortify-string.h:592:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
                            __read_overflow2_field(q_size_field, size);

    The compiler actually complains on 'plfxlc_get_et_strings()' where fortification logic interprets call to 'memcpy()' as an attempt to copy the whole 'et_strings' array from its first member and so issues an overread warning. This warning may be silenced by passing an address of the whole array and not the first member to 'memcpy()'.

    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230829094541.234751-1-dmantipov@yandex.ru
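As a userspace illustration of the distinction the commit draws, the sketch below copies a two-dimensional string table by naming the array as a whole. It is an assumption-laden stand-in, not the driver's code: the table contents, `ET_GSTRING_LEN`, and `get_et_strings()` are hypothetical, and plain userspace `memcpy()` does not perform the kernel's field-size check.

```c
#include <string.h>

#define ET_GSTRING_LEN 32 /* hypothetical row width for this sketch */

/* Hypothetical string table standing in for the driver's et_strings. */
static const char et_strings[][ET_GSTRING_LEN] = {
	"stat_a",
	"stat_b",
};

/* Copy the entire table into `out`, which must hold sizeof(et_strings) bytes.
 *
 * The source is `et_strings` (the array as a whole), not `*et_strings`
 * (its first row). Both point at the same byte, but a fortified memcpy()
 * that tracks the size of the source field would see only one row
 * (ET_GSTRING_LEN bytes) in the second form and flag the larger copy
 * length as a read beyond that field. */
void get_et_strings(char *out)
{
	memcpy(out, et_strings, sizeof(et_strings));
}
```

The copied bytes are identical either way; the change only gives the compiler a source expression whose known size matches the length being copied.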
* wifi: rtl8xxxu: mark TOTOLINK N150UA V5/N150UA-B as tested (Zenm Chen, 2023-09-04, 1 file, -0/+1)

    TOTOLINK N150UA V5/N150UA-B (VID=0x0bda, PID=0x2005) works fine with the rtl8xxxu driver, so mark as tested.

    Signed-off-by: Zenm Chen <zenmchen@gmail.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230829074358.14795-1-zenmchen@gmail.com
* wifi: rtw88: fix typo rtw8822cu_probe (Po-Hao Huang, 2023-09-04, 1 file, -2/+2)

    The probe function of 8822cu is misplaced to 8822bu, so we fix it. Just cosmetics, no changes in functionality.

    Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://lore.kernel.org/r/20230825062404.50813-1-pkshih@realtek.com
* Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue (Jakub Kicinski, 2023-08-26, 11 files, -64/+210)

    Tony Nguyen says:

    ====================
    Intel Wired LAN Driver Updates 2023-08-24 (igc, e1000e)

    This series contains updates to igc and e1000e drivers.

    Vinicius adds support for utilizing multiple PTP registers on igc.

    Sasha reduces interval time for PTM on igc and adds new device support on e1000e.

    * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
      e1000e: Add support for the next LOM generation
      igc: Decrease PTM short interval from 10 us to 1 us
      igc: Add support for multiple in-flight TX timestamps
    ====================

    Link: https://lore.kernel.org/r/20230824204418.1551093-1-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * e1000e: Add support for the next LOM generation (Sasha Neftin, 2023-08-24, 5 files, -0/+17)

      Add device IDs for the next LOM generations that will be available on the next Intel Client platforms. This patch provides the initial support for these devices.

      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * igc: Decrease PTM short interval from 10 us to 1 us (Sasha Neftin, 2023-08-24, 1 file, -1/+1)

      With the 10us interval, we were seeing PTM transactions take around 12us. Hardware team suggested this interval could be lowered to 1us which was confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took around 2us.

      Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * igc: Add support for multiple in-flight TX timestamps (Vinicius Costa Gomes, 2023-08-24, 6 files, -63/+192)

      Add support for using the four sets of timestamping registers that i225/i226 have available for TX.

      In some workloads, where multiple applications request hardware transmission timestamps, it was possible that some of those requests were denied because the only register in use was already occupied.

      This is also in preparation for future support for hardware timestamping with multiple PTP domains. With multiple domains, chances of multiple TX timestamps being requested at the same time increase.

      Before:

      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
                     |           responses            |   TX timestamp offset (ns)
        rate clients |   lost invalid   basic  xleave |    min   mean    max stddev
        1000     100    0.00%   0.00%   0.00% 100.00%      +1    +41    +73     13
        1500     150    0.00%   0.00%   0.00% 100.00%      +9    +49    +87     15
        2250     225    0.00%   0.00%   0.00% 100.00%      +9    +42    +79     13
        3375     337    0.00%   0.00%   0.00% 100.00%     +11    +46    +81     13
        5062     506    0.00%   0.00%   0.00% 100.00%      +7    +44    +80     13
        7593     759    0.00%   0.00%   0.00% 100.00%      +9    +44    +79     12
       11389    1138    0.00%   0.00%   0.00% 100.00%     +14    +51    +87     13
       17083    1708    0.00%   0.00%   0.00% 100.00%      +1    +41    +80     14
       25624    2562    0.00%   0.00%   0.00% 100.00%     +11    +50  +5107     51
       38436    3843    0.00%   0.00%   0.00% 100.00%      -2    +36  +7843     38
       57654    5765    0.00%   0.00%   0.00% 100.00%      +4    +42 +10503     69
       86481    8648    0.00%   0.00%   0.00% 100.00%     +11    +54  +5492     65
      129721   12972    0.00%   0.00%   0.00% 100.00%     +31  +2680  +6942   2606
      194581   16384   16.79%   0.00%   0.87%  82.34%     +73  +4444 +15879   3116
      291871   16384   35.05%   0.00%   1.53%  63.42%    +188  +5381 +17019   3035
      437806   16384   54.95%   0.00%   2.55%  42.50%    +233  +6302 +13885   2846

      After:

      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
                     |           responses            |   TX timestamp offset (ns)
        rate clients |   lost invalid   basic  xleave |    min   mean    max stddev
        1000     100    0.00%   0.00%   0.00% 100.00%     -20    +12    +43     13
        1500     150    0.00%   0.00%   0.00% 100.00%     -23    +18    +57     14
        2250     225    0.00%   0.00%   0.00% 100.00%      -2    +33    +67     13
        3375     337    0.00%   0.00%   0.00% 100.00%      +1    +38    +76     13
        5062     506    0.00%   0.00%   0.00% 100.00%      +9    +52    +93     14
        7593     759    0.00%   0.00%   0.00% 100.00%     +11    +47    +82     13
       11389    1138    0.00%   0.00%   0.00% 100.00%      -9    +27    +74     13
       17083    1708    0.00%   0.00%   0.00% 100.00%     -13    +25    +66     14
       25624    2562    0.00%   0.00%   0.00% 100.00%      -8    +28    +65     13
       38436    3843    0.00%   0.00%   0.00% 100.00%     -13    +28    +69     13
       57654    5765    0.00%   0.00%   0.00% 100.00%     -11    +32    +71     14
       86481    8648    0.00%   0.00%   0.00% 100.00%      +2    +44    +83     14
      129721   12972   15.36%   0.00%   0.35%  84.29%      -2  +2248 +22907   4252
      194581   16384   42.98%   0.00%   1.98%  55.04%      -4  +5278 +65039   5856
      291871   16384   54.33%   0.00%   2.21%  43.46%      -3  +6306 +22608   5665

      We can see that with 4 registers, as expected, we are able to handle an increasing number of requests more consistently, but as soon as all registers are in use, the decrease in quality of service happens in a sharp step.

      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
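The behaviour described above (requests succeed until all four register sets are in flight, then fail abruptly) can be modelled with a simple slot allocator. This is a conceptual sketch only: `struct tstamp_slots`, `tstamp_request()`, and `tstamp_complete()` are hypothetical names, and the real driver additionally deals with locking, skb ownership, and timeouts.

```c
#include <stdbool.h>

#define NUM_TSTAMP_REGS 4 /* the commit states i225/i226 have four TX sets */

/* Hypothetical bookkeeping: one in-use flag per timestamp register set. */
struct tstamp_slots {
	bool busy[NUM_TSTAMP_REGS];
};

/* Claim a free register set for a new TX timestamp request.
 * Returns the slot index, or -1 when every set is already in flight. */
int tstamp_request(struct tstamp_slots *s)
{
	int i;

	for (i = 0; i < NUM_TSTAMP_REGS; i++) {
		if (!s->busy[i]) {
			s->busy[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a register set once its timestamp has been read back. */
void tstamp_complete(struct tstamp_slots *s, int idx)
{
	if (idx >= 0 && idx < NUM_TSTAMP_REGS)
		s->busy[idx] = false;
}
```

With a single slot, the second concurrent request is refused immediately; with four, refusals begin only once four timestamps are outstanding, which is consistent with the sharp step visible in the "After" table.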
* | doc/netlink: Add delete operation to ovs_vport spec (Donald Hunter, 2023-08-26, 1 file, -1/+12)

    Add del operation to the spec to help with testing.

    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Link: https://lore.kernel.org/r/20230824142221.71339-1-donald.hunter@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | tools: ynl-gen: fix uAPI generation after tempfile changes (Jakub Kicinski, 2023-08-26, 1 file, -10/+20)

    We use a tempfile for code generation, to avoid wiping the target file out if the code generator crashes. File contents are copied from tempfile to actual destination at the end of main().

    uAPI generation is relatively simple so when generating the uAPI header we return from main() early, and never reach the "copy code over" stage. Since the commit under Fixes, uAPI headers are not updated by ynl-gen.

    Move the copy/commit of the code into CodeWriter, to make it easier to call at any point in time. Hook it into the destructor to make sure we don't miss calling it.

    Fixes: f65f305ae008 ("tools: ynl-gen: use temporary file for rendering")
    Link: https://lore.kernel.org/r/20230824212431.1683612-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch 'stmmac-cleanups' (Jakub Kicinski, 2023-08-26, 9 files, -39/+70)

    Russell King says:

    ====================
    stmmac cleanups

    One of the comments I had on Feiyang Chen's series was concerning the initialisation of phylink... and so I've decided to do something about it, cleaning it up a bit.

    This series:

    1) adds a new phylink function to limit the MAC capabilities according to a maximum speed. This allows us to greatly simplify stmmac's initialisation of phylink's mac capabilities.

    2) everywhere that uses priv->plat->phylink_node first converts this to a fwnode before doing anything with it. This is silly. Let's instead store it as a fwnode to eliminate these conversions in multiple places.

    3) clean up passing the fwnode to phylink - it might as well happen at the phylink_create() callsite, rather than being scattered throughout the entire function.

    4) same for mdio_bus_data

    5) use phylink_limit_mac_speed() to handle the priv->plat->max_speed restriction.

    6) add a method to get the MAC-specific capabilities from the code dealing with the MACs, and arrange to call it at an appropriate time.

    7) convert the gmac4 users to use the MAC specific method.

    8) same for xgmac.

    9) group all the simple phylink_config initialisations together.

    10) convert half-duplex logic to being positive logic.

    While looking into all of this, this raised eyebrows:

            if (priv->plat->tx_queues_to_use > 1)
                    priv->phylink_config.mac_capabilities &=
                            ~(MAC_10HD | MAC_100HD | MAC_1000HD);

    priv->plat->tx_queues_to_use is initialised by platforms to either 1, 4 or 8, and can be controlled from userspace via the --set-channels ethtool op. The implementation of this op in this driver limits the number of channels to priv->dma_cap.number_tx_queues, which is derived from the DMA hwcap.

    So, the obvious questions are:

    1) what guarantees that the static initialisation of tx_queues_to_use will always be less than or equal to number_tx_queues from the DMA hw cap?

    2) tx_queues_to_use starts off as 1, but number_tx_queues is larger, we will leave the half-duplex capabilities in place, but userspace can increase tx_queues_to_use above 1. Does that mean half-duplex is then not supported?

    3) Should we be basing the decision whether half-duplex is supported off the DMA capabilities?

    4) What about priv->dma_cap.half_duplex? Doesn't that get a say in whether half-duplex is supported or not? Why isn't this used? Why is it only reported via debugfs? If it's not being used by the driver, what's the point of reporting it via debugfs?
    ====================

    Link: https://lore.kernel.org/r/ZOddFH22PWmOmbT5@shell.armlinux.org.uk
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: convert half-duplex support to positive logic (Russell King (Oracle), 2023-08-26, 1 file, -6/+7)

      Rather than detecting when half-duplex is not supported, and clearing the MAC capabilities, reverse the if() condition and use it to set the capabilities instead.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXn-005pUb-SP@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
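A minimal sketch of what "positive logic" means here, assuming the reversed condition is the single-queue test implied by the snippet quoted in the series cover letter. The capability bit values and the `mac_caps()` helper are hypothetical and are not taken from the kernel headers.

```c
/* Hypothetical capability bits for this sketch only. */
#define MAC_10HD   (1u << 0)
#define MAC_100HD  (1u << 1)
#define MAC_1000HD (1u << 2)
#define MAC_1000FD (1u << 3)

/* Positive logic: start from the full-duplex capabilities and add the
 * half-duplex ones only when the configuration allows them, instead of
 * setting everything first and clearing the half-duplex bits afterwards. */
unsigned int mac_caps(unsigned int tx_queues_to_use)
{
	unsigned int caps = MAC_1000FD;

	if (tx_queues_to_use <= 1)
		caps |= MAC_10HD | MAC_100HD | MAC_1000HD;

	return caps;
}
```

Both forms yield the same capability mask; the positive form keeps the rule ("half-duplex only with a single TX queue") next to the bits it enables.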
| * | net: stmmac: move priv->phylink_config.mac_managed_pm (Russell King (Oracle), 2023-08-26, 1 file, -1/+1)

      Move priv->phylink_config.mac_managed_pm to be alongside the other phylink initialisations.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXi-005pUV-Nq@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: move xgmac specific phylink caps to dwxgmac2 core (Russell King (Oracle), 2023-08-26, 2 files, -10/+10)

      Move the xgmac specific phylink capabilities to the dwxgmac2 support core.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXd-005pUP-JL@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: move gmac4 specific phylink capabilities to gmac4 (Russell King (Oracle), 2023-08-26, 2 files, -3/+9)

      Move the setup of gmac4 specific phylink capabilities into gmac4 code.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXY-005pUJ-Ez@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: provide stmmac_mac_phylink_get_caps() (Russell King (Oracle), 2023-08-26, 2 files, -0/+7)

      Allow MACs to provide their own capabilities via the MAC operations struct.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXT-005pUD-Aj@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: use phylink_limit_mac_speed() (Russell King (Oracle), 2023-08-26, 1 file, -21/+14)

      Use phylink_limit_mac_speed() to limit the MAC capabilities rather than coding this for each speed.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXO-005pU7-61@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: use "mdio_bus_data" local variable (Russell King (Oracle), 2023-08-26, 1 file, -2/+4)

      We have a local variable for priv->plat->mdio_bus_data, which we use later in the conditional if() block, but we evaluate the above within the conditional expression. Use mdio_bus_data instead. Since these will be the only two users of this local variable, move its assignment just before the if().

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXJ-005pU1-1z@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: clean up passing fwnode to phylink (Russell King (Oracle), 2023-08-26, 1 file, -4/+5)

      Move the initialisation of the fwnode variable closer to its use site, rather than scattered throughout stmmac_phy_setup().

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXD-005pTv-TN@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: stmmac: convert plat->phylink_node to fwnode (Russell King (Oracle), 2023-08-26, 4 files, -5/+6)

      All users of plat->phylink_node first convert it to a fwnode. Rather than repeatedly convert to a fwnode, store it as a fwnode. To reflect this change, call it plat->port_node instead - it is used for more than just phylink.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAX8-005pTo-OT@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: phylink: add phylink_limit_mac_speed() (Russell King (Oracle), 2023-08-26, 2 files, -0/+20)

      Add a function which can be used to limit the phylink MAC capabilities to an upper speed limit.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAX3-005pTi-K1@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
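The idea of capping a capability mask at a maximum speed can be sketched as below. This models the concept only and is not the phylink implementation: `struct cap_speed`, `cap_table`, `limit_mac_speed()`, and the bit assignments are hypothetical.

```c
#include <stddef.h>

/* Hypothetical pairing of a capability bit with the speed (in Mbps)
 * that the bit represents. */
struct cap_speed {
	unsigned long mask;
	unsigned int speed;
};

static const struct cap_speed cap_table[] = {
	{ 1ul << 0,    10 },
	{ 1ul << 1,   100 },
	{ 1ul << 2,  1000 },
	{ 1ul << 3,  2500 },
	{ 1ul << 4, 10000 },
};

/* Return `caps` with every capability faster than max_speed cleared. */
unsigned long limit_mac_speed(unsigned long caps, unsigned int max_speed)
{
	size_t i;

	for (i = 0; i < sizeof(cap_table) / sizeof(cap_table[0]); i++) {
		if (cap_table[i].speed > max_speed)
			caps &= ~cap_table[i].mask;
	}
	return caps;
}
```

A table-driven helper like this replaces a chain of per-speed `if` statements in each caller, which is the simplification the stmmac patch in this series relies on.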
* | veth: Avoid NAPI scheduling on failed SKB forwarding (Liang Chen, 2023-08-26, 1 file, -3/+2)

    When an skb fails to be forwarded to the peer (e.g., skb data buffer length exceeds MTU), it will not be added to the peer's receive queue. Therefore, we should schedule the peer's NAPI poll function only when skb forwarding is successful to avoid unnecessary overhead.

    Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
    Link: https://lore.kernel.org/r/20230824123131.7673-1-liangchen.linux@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
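The control-flow change can be illustrated with a toy model in which "scheduling NAPI" is just a counter. Everything here (`struct peer`, `forward_to_peer()`, the MTU check as the only failure mode) is a hypothetical simplification, not the veth code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the receiving side of a veth pair. */
struct peer {
	unsigned int mtu;
	unsigned int queued;     /* packets placed on the receive queue */
	unsigned int napi_kicks; /* times the poll function was scheduled */
};

/* Try to forward a packet of `len` bytes; returns true if it was queued. */
bool forward_to_peer(struct peer *p, unsigned int len)
{
	if (len > p->mtu)
		return false; /* dropped: nothing was queued, so no poll is needed */

	p->queued++;
	p->napi_kicks++; /* schedule the poll only after a successful forward */
	return true;
}
```

In this model a dropped packet costs no poll at all, which is the overhead the commit removes.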
* | Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (Jakub Kicinski, 2023-08-26, 104 files, -4212/+3719)

    Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2023-08-25

    We've added 87 non-merge commits during the last 8 day(s) which contain a total of 104 files changed, 3719 insertions(+), 4212 deletions(-).

    The main changes are:

    1) Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds, from Jiri Olsa.

    2) Add support BPF cpu v4 instructions for arm64 JIT compiler, from Xu Kuohai.

    3) Add support BPF cpu v4 instructions for riscv64 JIT compiler, from Pu Lehui.

    4) Fix LWT BPF xmit hooks wrt their return values where propagating the result from skb_do_redirect() would trigger a use-after-free, from Yan Zhai.

    5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch, from Yonghong Song.

    6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement, from Kumar Kartikeya Dwivedi.

    7) Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak, from Yafang Shao.

    8) Remove unmaintained XDP BPF samples as they are maintained in xdp-tools repository out of tree, from Toke Høiland-Jørgensen.

    9) Batch of fixes for the tracing programs from BPF samples in order to make them more libbpf-aware, from Daniel T. Lee.

    10) Fix a libbpf signedness determination bug in the CO-RE relocation handling logic, from Andrii Nakryiko.

    11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up fixes for bpf_refcount shared ownership implementation, both from Dave Marchevsky.

    12) Add a new bpf_object__unpin() API function to libbpf, from Daniel Xu.

    13) Fix a memory leak in libbpf to also free btf_vmlinux when the bpf_object gets closed, from Hao Luo.

    14) Small error output improvements to test_bpf module, from Helge Deller.

    * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits)
      selftests/bpf: Add tests for rbtree API interaction in sleepable progs
      bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
      bpf: Consider non-owning refs to refcounted nodes RCU protected
      bpf: Reenable bpf_refcount_acquire
      bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
      bpf: Consider non-owning refs trusted
      bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
      selftests/bpf: Enable cpu v4 tests for RV64
      riscv, bpf: Support unconditional bswap insn
      riscv, bpf: Support signed div/mod insns
      riscv, bpf: Support 32-bit offset jmp insn
      riscv, bpf: Support sign-extension mov insns
      riscv, bpf: Support sign-extension load insns
      riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W
      samples/bpf: Add note to README about the XDP utilities moved to xdp-tools
      samples/bpf: Cleanup .gitignore
      samples/bpf: Remove the xdp_sample_pkts utility
      samples/bpf: Remove the xdp1 and xdp2 utilities
      samples/bpf: Remove the xdp_rxq_info utility
      samples/bpf: Remove the xdp_redirect* utilities
      ...
    ====================

    Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * \ Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes'  Alexei Starovoitov  2023-08-25  7  -14/+165
| |\ \
Dave Marchevsky says:

====================
BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes

This series is the third of three (or more) followups to address issues in the bpf_refcount shared ownership implementation discovered by Kumar. This series addresses the use-after-free scenario described in [0]. The first followup series ([1]) also attempted to address the same use-after-free, but only got rid of the splat without addressing the underlying issue. After this series the underlying issue is fixed and bpf_refcount_acquire can be re-enabled.

The main fix here is migration of bpf_obj_drop to use bpf_mem_free_rcu.
To understand why this fixes the issue, let us consider the example interleaving provided by Kumar in [0]:

  CPU 0                                   CPU 1
  n = bpf_obj_new
  lock(lock1)
  bpf_rbtree_add(rbtree1, n)
  m = bpf_rbtree_acquire(n)
  unlock(lock1)

  kptr_xchg(map, m) // move to map
  // at this point, refcount = 2
                                          m = kptr_xchg(map, NULL)
                                          lock(lock2)
  lock(lock1)                             bpf_rbtree_add(rbtree2, m)
  p = bpf_rbtree_first(rbtree1)           if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
  bpf_rbtree_remove(rbtree1, p)
  unlock(lock1)
  bpf_obj_drop(p) // B
                                          bpf_refcount_acquire(m) // use-after-free
                                          ...

Before this series, bpf_obj_drop returns memory to the allocator using bpf_mem_free. At this point (B in the example) there might be some non-owning references to that memory which the verifier believes are valid, but where the underlying memory was reused for some other allocation. Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for non-owning refs") attempted to fix this by doing refcount_inc_not_zero in refcount_acquire instead of refcount_inc, under the assumption that preventing erroneous incr-on-0 would be sufficient. This isn't true, though: refcount_inc_not_zero must *check* if the refcount is zero, and the memory it's checking could have been reused, so the check may look at and incr random reused bytes.

If we wait to reuse this memory until all non-owning refs that could point to it are gone, there is no possibility of this scenario happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted nodes accomplishes this.

For such nodes, the validity of their underlying memory is now tied to RCU critical section. This matches MEM_RCU trustedness expectations, so the series takes the opportunity to more explicitly mark this trustedness state.

The functional effects of trustedness changes here are rather small. This is largely due to local kptrs having separate verifier handling - with implicit trustedness assumptions - from arbitrary kptrs.
Regardless, let's take the opportunity to move towards a world where trustedness is more explicitly handled.

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/

Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
  * Spent some time experimenting with a better approach as per convo w/ Yonghong on v1's patch. It started getting too complex, so left unchanged for now. Yonghong was fine with this approach being shipped.

Patch 2 ("bpf: Consider non-owning refs trusted")
  * Add Yonghong ack

Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
  * Add Yonghong ack

Patch 4 ("bpf: Reenable bpf_refcount_acquire")
  * Add Yonghong ack

Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
  * Undo a nonfunctional whitespace change that shouldn't have been included (Yonghong)
  * Better logging message when complaining about rcu_read_{lock,unlock} in rbtree cb (Alexei)
  * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock (Yonghong, Alexei)

Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
  * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
    * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
  * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock} (Alexei)
  * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in sleepable progs"

Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
  * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success test (Alexei)

Summary of patch contents, with sub-bullets being leading questions and comments I think are worth reviewer attention:

  * Patches 1 and 2 are more so documentation - and enforcement, in patch 1's case - of existing semantics / expectations

  * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that their underlying memory is not reused until RCU grace period elapses
    * Perhaps it makes sense to move to mem_free_rcu for _all_ non-owning refs in the future, not just refcounted. This might allow custom non-owning ref lifetime + invalidation logic to be entirely subsumed by MEM_RCU handling. IMO this needs a bit more thought and should be tackled outside of a fix series, so it's not attempted here.

  * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix the remaining use-after-free
    * One might expect this patch to be last in the series, or last before selftest changes. Patches 5 and 6 don't change verification or runtime behavior for existing BPF progs, though.

  * Patch 5 brings the verifier's understanding of refcounted node trustedness in line with Patch 4's changes

  * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable progs. Marked RFC for a few reasons:
    * bpf_spin_{lock,unlock} haven't been usable in sleepable progs since before the introduction of bpf linked list and rbtree. As such this feels more like a new feature that may not belong in this fixes series.
  * Patch 7 adds tests

[0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
[1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
====================

Link: https://lore.kernel.org/r/20230821193311.3290257-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
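The argument in this cover letter (memory must not be handed out again while a reader could still hold a non-owning reference) can be pictured with a small userspace model. This is only a sketch under stated assumptions: `toy_pool`, `toy_read_lock`, `toy_free_now`, `toy_free_deferred` and the other names are hypothetical, a single reader counter stands in for RCU read-side critical sections, and the model waits for all readers rather than only pre-existing ones. It is not the kernel's bpf_mem_alloc code.

```c
#include <stddef.h>

#define TOY_SLOTS 4

/* State of one fixed-size object slot in the toy allocator. */
enum toy_state { TOY_FREE, TOY_LIVE, TOY_PENDING };

struct toy_pool {
	enum toy_state state[TOY_SLOTS];
	int readers;	/* stand-in for "a reader is inside an RCU CS" */
};

static void toy_read_lock(struct toy_pool *p)
{
	p->readers++;
}

/* When the last reader leaves, the toy "grace period" has elapsed and
 * every pending slot becomes reusable. */
static void toy_read_unlock(struct toy_pool *p)
{
	if (--p->readers > 0)
		return;
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (p->state[i] == TOY_PENDING)
			p->state[i] = TOY_FREE;
}

/* Returns the first reusable slot index, or -1 if none is free. */
static int toy_alloc(struct toy_pool *p)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (p->state[i] == TOY_FREE) {
			p->state[i] = TOY_LIVE;
			return (int)i;
		}
	}
	return -1;
}

/* Immediate free: the slot can be reused while readers still point at it. */
static void toy_free_now(struct toy_pool *p, int slot)
{
	p->state[slot] = TOY_FREE;
}

/* Deferred free: the slot stays unavailable until readers are gone. */
static void toy_free_deferred(struct toy_pool *p, int slot)
{
	p->state[slot] = p->readers > 0 ? TOY_PENDING : TOY_FREE;
}
```

With `toy_free_now`, an allocation made while a reader is active can return the slot the reader is still looking at; with `toy_free_deferred`, the same allocation is served from a different slot until `toy_read_unlock` runs.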
| | * | selftests/bpf: Add tests for rbtree API interaction in sleepable progs  Dave Marchevsky  2023-08-25  2  -0/+99
| | | |
Confirm that the following sleepable prog states fail verification:
  * bpf_rcu_read_unlock before bpf_spin_unlock
    * RCU CS will last at least as long as spin_lock CS

Also confirm that correct usage passes verification, specifically:
  * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
  * Implied RCU CS due to spin_lock CS

None of the selftest progs actually attach to bpf_testmod's bpf_testmod_test_read.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | bpf: Allow bpf_spin_{lock,unlock} in sleepable progs  Dave Marchevsky  2023-08-25  2  -6/+5
| | | |
Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks") disabled bpf_spin_lock usage in sleepable progs, stating:

  Sleepable LSM programs can be preempted which means that allowing spin locks will need more work (disabling preemption and the verifier ensuring that no sleepable helpers are called when a spin lock is held).

This patch disables preemption before grabbing bpf_spin_lock. The second requirement above "no sleepable helpers are called when a spin lock is held" is implicitly enforced by current verifier logic due to helper calls in spin_lock CS being disabled except for a few exceptions, none of which sleep.

Due to above preemption changes, bpf_spin_lock CS can also be considered an RCU CS, so verifier's in_rcu_cs check is modified to account for this.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
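The modified in_rcu_cs check described above can be pictured as a three-way OR. The following is a minimal sketch, not the verifier's code: `toy_verifier_state` and its field names are hypothetical simplifications, and it assumes that non-sleepable programs always run inside an RCU read-side section.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the verifier state this check reads. */
struct toy_verifier_state {
	bool prog_sleepable;	/* program is allowed to sleep */
	bool active_rcu_lock;	/* between bpf_rcu_read_lock() and unlock */
	bool active_spin_lock;	/* currently holding a bpf_spin_lock */
};

/* True when the program point can be treated as inside an RCU CS:
 * an explicit RCU read lock, a held spin lock (preemption disabled),
 * or a program type that never sleeps. */
static bool toy_in_rcu_cs(const struct toy_verifier_state *st)
{
	return st->active_rcu_lock ||
	       st->active_spin_lock ||
	       !st->prog_sleepable;
}
```

In this model a sleepable program holding neither lock is the only case that falls outside an RCU CS.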
| | * | bpf: Consider non-owning refs to refcounted nodes RCU protected  Dave Marchevsky  2023-08-25  2  -2/+14
| | | |
An earlier patch in the series ensures that the underlying memory of nodes with bpf_refcount - which can have multiple owners - is not reused until RCU grace period has elapsed. This prevents use-after-free with non-owning references that may point to recently-freed memory. While RCU read lock is held, it's safe to dereference such a non-owning ref, as by definition RCU GP couldn't have elapsed and therefore underlying memory couldn't have been reused.

From the perspective of verifier "trustedness" non-owning refs to refcounted nodes are now trusted only in RCU CS and therefore should no longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them MEM_RCU in order to reflect this new state.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | bpf: Reenable bpf_refcount_acquire  Dave Marchevsky  2023-08-25  2  -4/+27
| | | |
Now that all reported issues are fixed, bpf_refcount_acquire can be turned back on. Also reenable all bpf_refcount-related tests which were disabled.

This is a revert of:
  * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
  * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes  Dave Marchevsky  2023-08-25  1  -1/+5
| | | |
This is the final fix for the use-after-free scenario described in commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for non-owning refs"). That commit, by virtue of changing bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed the "refcount incr on 0" splat. The not_zero check in refcount_inc_not_zero, though, still occurs on memory that could have been free'd and reused, so the commit didn't properly fix the root cause.

This patch actually fixes the issue by free'ing using the recently-added bpf_mem_free_rcu, which ensures that the memory is not reused until RCU grace period has elapsed. If that has happened then there are no non-owning references alive that point to the recently-free'd memory, so it can be safely reused.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
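To see why the not_zero check alone cannot help, it may be useful to look at what an increment-unless-zero operation has to do. The sketch below uses C11 atomics in userspace; `toy_inc_not_zero` is a hypothetical name and this is not the kernel's refcount_t implementation (which also handles saturation).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *ref unless it is zero. Returns true if the count was bumped.
 * Note that the very first step is a load from *ref: if that memory has
 * already been freed and reused, the value read is arbitrary and the
 * "is it zero?" test says nothing about the original object. */
static bool toy_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

The check is only meaningful while the counter's storage still belongs to the object, which is the property the deferred free described in this commit is meant to provide.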
| | * | bpf: Consider non-owning refs trusted  Dave Marchevsky  2023-08-25  1  -1/+1
| | | |
Recent discussions around default kptr "trustedness" led to changes such as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier."). One of the conclusions of those discussions, as expressed in code and comments in that patch, is that we'd like to move away from 'raw' PTR_TO_BTF_ID without some type flag or other register state indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark this state explicitly, the verifier currently considers trustedness implied by other register state. For example, owning refs to graph collection nodes must have a nonzero ref_obj_id, so they pass the is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag. This patch makes trustedness of non-owning refs to graph collection nodes explicit as well.

By definition, non-owning refs are currently trusted. Although the ref has no control over pointee lifetime, due to non-owning ref clobbering rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is safe in the critical section controlled by bpf_spin_lock associated with its owning collection.

Note that the previous statement does not hold true for nodes with shared ownership due to the use-after-free issue that this series is addressing. True shared ownership was disabled by commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"), though, so the statement holds for now. Further patches in the series will change the trustedness state of non-owning refs before re-enabling bpf_refcount_acquire.

Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a non-owning ref reg state would pass is_trusted_reg check. Somewhat surprisingly, this doesn't result in any change to user-visible functionality elsewhere in the verifier: graph collection nodes are all marked MEM_ALLOC, which tends to be handled in separate codepaths from "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the current state of things before changing it elsewhere in the series.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
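The effect of adding a flag to the set of trusted modifiers can be sketched with a plain bitmask check. Everything below is a simplified model under assumptions: the `TOY_*` flag values and `toy_is_trusted` are hypothetical, the real flag bit positions differ, and the real is_trusted_reg considers more register state than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; not the kernel's bit layout. */
enum toy_type_flag {
	TOY_MEM_ALLOC     = 1u << 0,
	TOY_PTR_TRUSTED   = 1u << 1,
	TOY_NON_OWN_REF   = 1u << 2,
	TOY_PTR_UNTRUSTED = 1u << 3,
};

/* Modifiers that are compatible with a trusted register in this model. */
#define TOY_TRUSTED_MODIFIERS \
	(TOY_MEM_ALLOC | TOY_PTR_TRUSTED | TOY_NON_OWN_REF)

static bool toy_is_trusted(uint32_t ref_obj_id, uint32_t flags)
{
	/* Owning (referenced) registers are trusted via ref_obj_id. */
	if (ref_obj_id)
		return true;
	/* Otherwise: at least one trusted modifier, and no flag outside
	 * the allowed set. */
	return (flags & TOY_TRUSTED_MODIFIERS) &&
	       !(flags & ~(uint32_t)TOY_TRUSTED_MODIFIERS);
}
```

In this model, a register flagged `TOY_MEM_ALLOC | TOY_NON_OWN_REF` passes only because `TOY_NON_OWN_REF` is part of the allowed set; removing it from the mask would make the second test fail.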
| | * | bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire  Dave Marchevsky  2023-08-25  1  -0/+14
| |/ /
It's straightforward to prove that kptr_struct_meta must be non-NULL for any valid call to these kfuncs:

  * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any struct in user BTF with a special field (e.g. bpf_refcount, {rb,list}_node). These are stored in that BTF's struct_meta_tab.

  * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes have {rb,list}_node field and that it's at the correct offset. Similarly, check_kfunc_args ensures bpf_refcount field existence for node param to bpf_refcount_acquire.

  * So a btf_struct_meta must have been created for the struct type of node param to these kfuncs

  * That BTF and its struct_meta_tab are guaranteed to still be around. Any arbitrary {rb,list} node the BPF program interacts with either: came from bpf_obj_new or a collection removal kfunc in the same program, in which case the BTF is associated with the program and still around; or came from bpf_kptr_xchg, in which case the BTF was associated with the map and is still around

Instead of silently continuing with NULL struct_meta, which caused confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"), let's error out. Then, at runtime, we can confidently say that the implementations of these kfuncs were given a non-NULL kptr_struct_meta, meaning that special-field-specific functionality like bpf_obj_free_fields and the bpf_obj_drop change introduced later in this series are guaranteed to execute.

This patch doesn't change functionality, just makes it easier to reason about existing functionality.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | Merge branch 'add-support-cpu-v4-insns-for-rv64'  Alexei Starovoitov  2023-08-24  8  -28/+122
| |\ \
Pu Lehui says:

====================
Add support cpu v4 insns for RV64

Add support for cpu v4 instructions on RV64. The relevant tests have passed, as shown below:

  Summary: 6/166 PASSED, 0 SKIPPED, 0 FAILED

NOTE: ldsx_insn testcase uses fentry and needs to rely on ftrace direct call [0].

[0] https://lore.kernel.org/all/20230627111612.761164-1-suagrfillet@gmail.com/

v2:
- Use temporary reg to avoid clobbering the source reg in movs_8/16 insns. (Björn)
- Add Acked-by

v1: https://lore.kernel.org/bpf/20230823231059.3363698-1-pulehui@huaweicloud.com
====================

Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230824095001.3408573-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | selftests/bpf: Enable cpu v4 tests for RV64  Pu Lehui  2023-08-24  6  -6/+12
| | | |
Enable cpu v4 tests for RV64, and the relevant tests have passed.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-8-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | riscv, bpf: Support unconditional bswap insn  Pu Lehui  2023-08-24  1  -0/+1
| | | |
Add support for the unconditional bswap instruction. Since riscv is always little-endian, just treat the unconditional scenario the same as big-endian conversion.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-7-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
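The equivalence this commit relies on (on a little-endian machine, converting to big-endian is the same operation as an unconditional byte swap) can be checked with a short host-independent sketch. The function names are hypothetical and this is plain C, not JIT code.

```c
#include <stdint.h>

/* Unconditional byte swaps. */
static uint16_t toy_bswap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t toy_bswap32(uint32_t v)
{
	return ((v & 0x000000FFu) << 24) |
	       ((v & 0x0000FF00u) << 8)  |
	       ((v >> 8) & 0x0000FF00u)  |
	       (v >> 24);
}

/* Model of "convert to big-endian" as seen by a little-endian CPU:
 * lay the value out most-significant byte first, then read those
 * bytes back the way a little-endian load would. */
static uint32_t toy_cpu_to_be32_on_le(uint32_t v)
{
	uint8_t be[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8),  (uint8_t)v,
	};

	return (uint32_t)be[0] |
	       ((uint32_t)be[1] << 8) |
	       ((uint32_t)be[2] << 16) |
	       ((uint32_t)be[3] << 24);
}
```

For any input, `toy_cpu_to_be32_on_le(v)` and `toy_bswap32(v)` produce the same value, which is why a little-endian-only target can reuse one code path for both.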
| | * | riscv, bpf: Support signed div/mod insns  Pu Lehui  2023-08-24  2  -6/+40
| | | |
Add support for signed div/mod instructions for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-6-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
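As a rough reference for what signed div/mod are expected to compute, here is a plain C sketch of the 64-bit semantics. The names are hypothetical and this is not the JIT. The zero-divisor results (0 for division, dividend unchanged for modulo) follow the BPF instruction set rules; the result chosen for INT64_MIN divided by -1 is an assumption made here mainly to avoid undefined behaviour in C.

```c
#include <stdint.h>

/* Signed 64-bit division as a BPF program would observe it. */
static int64_t toy_sdiv64(int64_t dst, int64_t src)
{
	if (src == 0)
		return 0;		/* division by zero yields 0 */
	if (dst == INT64_MIN && src == -1)
		return INT64_MIN;	/* assumed result; avoids C overflow UB */
	return dst / src;		/* truncates toward zero */
}

/* Signed 64-bit modulo as a BPF program would observe it. */
static int64_t toy_smod64(int64_t dst, int64_t src)
{
	if (src == 0)
		return dst;		/* modulo by zero leaves dst unchanged */
	if (dst == INT64_MIN && src == -1)
		return 0;		/* assumed result; avoids C overflow UB */
	return dst % src;		/* sign follows the dividend */
}
```

The difference from the unsigned forms shows up with negative operands: -7 divided by 2 is -3 with remainder -1 here, whereas the unsigned instructions would treat -7 as a very large positive number.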
| | * | riscv, bpf: Support 32-bit offset jmp insn  Pu Lehui  2023-08-24  1  -1/+5
| | | |
Add support for the 32-bit offset jmp instruction for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-5-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
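The difference between the classic jump and the 32-bit offset form comes down to which instruction field carries the displacement. The sketch below models that with a hypothetical `toy_insn` struct (not the kernel's struct bpf_insn layout) and assumes, as in the BPF instruction set, that the displacement is counted in instructions relative to the following instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an unconditional jump instruction. */
struct toy_insn {
	bool    long_jump;	/* true for the cpu v4 32-bit offset form */
	int16_t off;		/* 16-bit displacement (classic form) */
	int32_t imm;		/* 32-bit displacement (cpu v4 form) */
};

/* Index of the instruction executed after the jump at index pc. */
static int64_t toy_jump_target(int64_t pc, const struct toy_insn *insn)
{
	int64_t rel = insn->long_jump ? insn->imm : insn->off;

	return pc + 1 + rel;
}
```

A displacement such as 100000 instructions does not fit in the 16-bit field, so it can only be expressed with the 32-bit form.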