diff options
author | Tejun Heo <tj@kernel.org> | 2024-09-04 22:24:59 +0200 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2024-09-04 22:24:59 +0200 |
commit | a4103eacc2ab408bb65e9902f0857b219fb489de (patch) | |
tree | dc6455907805ca7379d0d58d155a41e843d478f1 /tools/sched_ext/README.md | |
parent | sched_ext: Add cgroup support (diff) | |
download | linux-a4103eacc2ab408bb65e9902f0857b219fb489de.tar.xz linux-a4103eacc2ab408bb65e9902f0857b219fb489de.zip |
sched_ext: Add a cgroup scheduler which uses flattened hierarchy
This patch adds scx_flatcg example scheduler which implements hierarchical
weight-based cgroup CPU control by flattening the cgroup hierarchy into a
single layer by compounding the active weight share at each level.
This flattening of hierarchy can bring a substantial performance gain when
the cgroup hierarchy is nested multiple levels. in a simple benchmark using
wrk[8] on apache serving a CGI script calculating sha1sum of a small file,
it outperforms CFS by ~3% with CPU controller disabled and by ~10% with two
apache instances competing with 2:1 weight ratio nested four level deep.
However, the gain comes at the cost of not being able to properly handle
thundering herd of cgroups. For example, if many cgroups which are nested
behind a low priority parent cgroup wake up around the same time, they may
be able to consume more CPU cycles than they are entitled to. In many use
cases, this isn't a real concern especially given the performance gain.
Also, there are ways to mitigate the problem further by e.g. introducing an
extra scheduling layer on cgroup delegation boundaries.
v5: - Updated to specify SCX_OPS_HAS_CGROUP_WEIGHT instead of
SCX_OPS_KNOB_CGROUP_WEIGHT.
v4: - Revert reference counted kptr for cgv_node as the change caused easily
reproducible stalls.
v3: - Updated to reflect the core API changes including ops.init/exit_task()
and direct dispatch from ops.select_cpu(). Fixes and improvements
including additional statistics.
- Use reference counted kptr for cgv_node instead of xchg'ing against
stash location.
- Dropped '-p' option.
v2: - Use SCX_BUG[_ON]() to simplify error handling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Diffstat (limited to 'tools/sched_ext/README.md')
-rw-r--r-- | tools/sched_ext/README.md | 12 |
1 files changed, 12 insertions, 0 deletions
diff --git a/tools/sched_ext/README.md b/tools/sched_ext/README.md index 8efe70cc4363..16a42e4060f6 100644 --- a/tools/sched_ext/README.md +++ b/tools/sched_ext/README.md @@ -192,6 +192,18 @@ where this could be particularly useful is running VMs, where running with infinite slices and no timer ticks allows the VM to avoid unnecessary expensive vmexits. +## scx_flatcg + +A flattened cgroup hierarchy scheduler. This scheduler implements hierarchical +weight-based cgroup CPU control by flattening the cgroup hierarchy into a single +layer, by compounding the active weight share at each level. The effect of this +is a much more performant CPU controller, which does not need to descend down +cgroup trees in order to properly compute a cgroup's share. + +Similar to scx_simple, in limited scenarios, this scheduler can perform +reasonably well on single socket-socket systems with a unified L3 cache and show +significantly lowered hierarchical scheduling overhead. + # Troubleshooting |