author | Tejun Heo <tj@kernel.org> | 2024-11-05 22:49:04 +0100 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2024-11-08 21:42:22 +0100 |
commit | e32c260195e6ff72940ab7826e38e0a0066fc58f (patch) | |
tree | 4397fbaba6a85afa05b90fb9a1c5d31c10e6e16b /tools/sched_ext/scx_show_state.py | |
parent | sched_ext: Avoid live-locking bypass mode switching (diff) | |
sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly
behaving BPF scheduler can live-lock the system by making multiple CPUs bang
on the same DSQ to the point where soft-lockup detection triggers before
SCX's own watchdog can take action. It also seems possible that the machine
can be live-locked enough to prevent scx_ops_helper, which is an RT task,
from running in a timely manner.
Implement scx_softlockup(), which is called once three quarters of the
soft-lockup threshold has elapsed. The function immediately enables the ops
breather and triggers an ops error to initiate ejection of the BPF
scheduler.
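The early-trigger logic described above can be sketched as a small user-space model. This is only an illustration, not the kernel implementation: `ToyScx`, `check_cpu`, the error string, and the once-per-episode guard are hypothetical, and the 20-second threshold assumes the soft-lockup threshold is twice a `watchdog_thresh` of 10 seconds.

```python
from dataclasses import dataclass, field

# Assumption: soft-lockup threshold = 2 * watchdog_thresh, with watchdog_thresh = 10 s.
WATCHDOG_THRESH_SECS = 10
SOFTLOCKUP_THRESH_SECS = 2 * WATCHDOG_THRESH_SECS


@dataclass
class ToyScx:
    """Hypothetical stand-in for sched_ext state; not kernel code."""
    in_softlockup: bool = False
    breather_depth: int = 0
    errors: list = field(default_factory=list)

    def softlockup(self, stalled_secs):
        # Assumed guard: react only once per lockup episode.
        if self.in_softlockup:
            return
        self.in_softlockup = True
        # Enable the breather first, then raise the error that starts ejection.
        self.breather_depth += 1
        self.errors.append(f"soft lockup: CPU stuck for {stalled_secs}s")


def check_cpu(scx, stalled_secs, thresh=SOFTLOCKUP_THRESH_SECS):
    """Return True when the full soft-lockup threshold has been exceeded."""
    # Integer comparison equivalent to stalled_secs >= 3/4 * thresh.
    if stalled_secs * 4 >= thresh * 3:
        scx.softlockup(stalled_secs)
    return stalled_secs > thresh
```

In this model a CPU stalled for 15 seconds already triggers the scheduler ejection path, leaving a 5-second margin before the regular soft-lockup report would fire at 20 seconds.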
Together with the previous patch, this enables the kernel to reliably recover
the system from live-lock conditions that can be triggered by a poorly
behaving BPF scheduler on dual-socket Intel systems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'tools/sched_ext/scx_show_state.py')
-rw-r--r-- | tools/sched_ext/scx_show_state.py | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/tools/sched_ext/scx_show_state.py b/tools/sched_ext/scx_show_state.py
index c4b3fdda9a0b..b800d4f5f2e9 100644
--- a/tools/sched_ext/scx_show_state.py
+++ b/tools/sched_ext/scx_show_state.py
@@ -35,6 +35,8 @@ print(f'enabled : {read_static_key("__scx_ops_enabled")}')
 print(f'switching_all : {read_int("scx_switching_all")}')
 print(f'switched_all : {read_static_key("__scx_switched_all")}')
 print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
+print(f'in_softlockup : {prog["scx_in_softlockup"].value_()}')
+print(f'breather_depth: {read_atomic("scx_ops_breather_depth")}')
 print(f'bypass_depth : {prog["scx_ops_bypass_depth"].value_()}')
 print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
 print(f'enable_seq : {read_atomic("scx_enable_seq")}')
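The two added lines read their variables differently: `scx_in_softlockup` is read directly with `value_()`, while `scx_ops_breather_depth` goes through `read_atomic()`. The sketch below imitates that with a mocked `prog` so it runs without drgn or a live kernel. `FakeObject`, the snapshot values, and `render()` are hypothetical, and the `read_atomic()` body is an assumption about the script's helper (an atomic counter exposing its integer through a `counter` member).

```python
class FakeObject:
    """Hypothetical stand-in for a drgn object wrapping a kernel variable."""

    def __init__(self, value=None, **members):
        self._value = value
        # Extra keyword arguments become struct-like members, e.g. `.counter`.
        self.__dict__.update(members)

    def value_(self):
        return self._value


# Hypothetical snapshot; a real run would obtain `prog` from drgn instead.
prog = {
    "scx_in_softlockup": FakeObject(1),
    "scx_ops_breather_depth": FakeObject(counter=FakeObject(1)),
    "scx_ops_bypass_depth": FakeObject(0),
}


def read_atomic(name):
    # Assumed helper shape: the integer of an atomic lives in its `counter` member.
    return prog[name].counter.value_()


def render():
    """Format the same three fields that the patched script prints."""
    return [
        f'in_softlockup : {prog["scx_in_softlockup"].value_()}',
        f'breather_depth: {read_atomic("scx_ops_breather_depth")}',
        f'bypass_depth : {prog["scx_ops_bypass_depth"].value_()}',
    ]


for line in render():
    print(line)
```

With this mocked snapshot, a non-zero `in_softlockup` together with a non-zero `breather_depth` would correspond to the state right after scx_softlockup() has fired.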