summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--Documentation/power/index.rst1
-rw-r--r--Documentation/power/powercap/dtpm.rst212
-rw-r--r--drivers/powercap/Kconfig13
-rw-r--r--drivers/powercap/Makefile2
-rw-r--r--drivers/powercap/dtpm.c480
-rw-r--r--drivers/powercap/dtpm_cpu.c257
-rw-r--r--drivers/powercap/intel_rapl_common.c9
-rw-r--r--include/asm-generic/vmlinux.lds.h11
-rw-r--r--include/linux/cpuhotplug.h1
-rw-r--r--include/linux/dtpm.h77
-rw-r--r--include/linux/units.h4
-rw-r--r--kernel/power/Kconfig12
12 files changed, 1067 insertions, 12 deletions
diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst
index ced8a8007434..a0f5244fb427 100644
--- a/Documentation/power/index.rst
+++ b/Documentation/power/index.rst
@@ -30,6 +30,7 @@ Power Management
userland-swsusp
powercap/powercap
+ powercap/dtpm
regulator/consumer
regulator/design
diff --git a/Documentation/power/powercap/dtpm.rst b/Documentation/power/powercap/dtpm.rst
new file mode 100644
index 000000000000..a38dee3d815b
--- /dev/null
+++ b/Documentation/power/powercap/dtpm.rst
@@ -0,0 +1,212 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+Dynamic Thermal Power Management framework
+==========================================
+
+On the embedded world, the complexity of the SoC leads to an
+increasing number of hotspots which need to be monitored and mitigated
+as a whole in order to prevent the temperature to go above the
+normative and legally stated 'skin temperature'.
+
+Another aspect is to sustain the performance for a given power budget,
+for example virtual reality where the user can feel dizziness if the
+performance is capped while a big CPU is processing something else. Or
+reduce the battery charging because the dissipated power is too high
+compared with the power consumed by other devices.
+
+The user space is the most adequate place to dynamically act on the
+different devices by limiting their power given an application
+profile: it has the knowledge of the platform.
+
+The Dynamic Thermal Power Management (DTPM) is a technique acting on
+the device power by limiting and/or balancing a power budget among
+different devices.
+
+The DTPM framework provides an unified interface to act on the
+device power.
+
+Overview
+========
+
+The DTPM framework relies on the powercap framework to create the
+powercap entries in the sysfs directory and implement the backend
+driver to do the connection with the power manageable device.
+
+The DTPM is a tree representation describing the power constraints
+shared between devices, not their physical positions.
+
+The nodes of the tree are a virtual description aggregating the power
+characteristics of the children nodes and their power limitations.
+
+The leaves of the tree are the real power manageable devices.
+
+For instance::
+
+ SoC
+ |
+ `-- pkg
+ |
+ |-- pd0 (cpu0-3)
+ |
+ `-- pd1 (cpu4-5)
+
+The pkg power will be the sum of pd0 and pd1 power numbers::
+
+ SoC (400mW - 3100mW)
+ |
+ `-- pkg (400mW - 3100mW)
+ |
+ |-- pd0 (100mW - 700mW)
+ |
+ `-- pd1 (300mW - 2400mW)
+
+When the nodes are inserted in the tree, their power characteristics are propagated to the parents::
+
+ SoC (600mW - 5900mW)
+ |
+ |-- pkg (400mW - 3100mW)
+ | |
+ | |-- pd0 (100mW - 700mW)
+ | |
+ | `-- pd1 (300mW - 2400mW)
+ |
+ `-- pd2 (200mW - 2800mW)
+
+Each node have a weight on a 2^10 basis reflecting the percentage of power consumption along the siblings::
+
+ SoC (w=1024)
+ |
+ |-- pkg (w=538)
+ | |
+ | |-- pd0 (w=231)
+ | |
+ | `-- pd1 (w=794)
+ |
+ `-- pd2 (w=486)
+
+ Note the sum of weights at the same level are equal to 1024.
+
+When a power limitation is applied to a node, then it is distributed along the children given their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be::
+
+ SoC (w=1024) <--- power_limit = 3200mW
+ |
+ |-- pkg (w=538) --> power_limit = 1681mW
+ | |
+ | |-- pd0 (w=231) --> power_limit = 378mW
+ | |
+ | `-- pd1 (w=794) --> power_limit = 1303mW
+ |
+ `-- pd2 (w=486) --> power_limit = 1519mW
+
+
+Flat description
+----------------
+
+A root node is created and it is the parent of all the nodes. This
+description is the simplest one and it is supposed to give to user
+space a flat representation of all the devices supporting the power
+limitation without any power limitation distribution.
+
+Hierarchical description
+------------------------
+
+The different devices supporting the power limitation are represented
+hierarchically. There is one root node, all intermediate nodes are
+grouping the child nodes which can be intermediate nodes also or real
+devices.
+
+The intermediate nodes aggregate the power information and allows to
+set the power limit given the weight of the nodes.
+
+User space API
+==============
+
+As stated in the overview, the DTPM framework is built on top of the
+powercap framework. Thus the sysfs interface is the same, please refer
+to the powercap documentation for further details.
+
+ * power_uw: Instantaneous power consumption. If the node is an
+ intermediate node, then the power consumption will be the sum of all
+ children power consumption.
+
+ * max_power_range_uw: The power range resulting of the maximum power
+ minus the minimum power.
+
+ * name: The name of the node. This is implementation dependent. Even
+ if it is not recommended for the user space, several nodes can have
+ the same name.
+
+ * constraint_X_name: The name of the constraint.
+
+ * constraint_X_max_power_uw: The maximum power limit to be applicable
+ to the node.
+
+ * constraint_X_power_limit_uw: The power limit to be applied to the
+ node. If the value contained in constraint_X_max_power_uw is set,
+ the constraint will be removed.
+
+ * constraint_X_time_window_us: The meaning of this file will depend
+ on the constraint number.
+
+Constraints
+-----------
+
+ * Constraint 0: The power limitation is immediately applied, without
+ limitation in time.
+
+Kernel API
+==========
+
+Overview
+--------
+
+The DTPM framework has no power limiting backend support. It is
+generic and provides a set of API to let the different drivers to
+implement the backend part for the power limitation and create the
+power constraints tree.
+
+It is up to the platform to provide the initialization function to
+allocate and link the different nodes of the tree.
+
+A special macro has the role of declaring a node and the corresponding
+initialization function via a description structure. This one contains
+an optional parent field allowing to hook different devices to an
+already existing tree at boot time.
+
+For instance::
+
+ struct dtpm_descr my_descr = {
+ .name = "my_name",
+ .init = my_init_func,
+ };
+
+ DTPM_DECLARE(my_descr);
+
+The nodes of the DTPM tree are described with dtpm structure. The
+steps to add a new power limitable device is done in three steps:
+
+ * Allocate the dtpm node
+ * Set the power number of the dtpm node
+ * Register the dtpm node
+
+The registration of the dtpm node is done with the powercap
+ops. Basically, it must implements the callbacks to get and set the
+power and the limit.
+
+Alternatively, if the node to be inserted is an intermediate one, then
+a simple function to insert it as a future parent is available.
+
+If a device has its power characteristics changing, then the tree must
+be updated with the new power numbers and weights.
+
+Nomenclature
+------------
+
+ * dtpm_alloc() : Allocate and initialize a dtpm structure
+
+ * dtpm_register() : Add the dtpm node to the tree
+
+ * dtpm_unregister() : Remove the dtpm node from the tree
+
+ * dtpm_update_power() : Update the power characteristics of the dtpm node
diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index bc228725346b..20b4325c6161 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -43,4 +43,17 @@ config IDLE_INJECT
CPUs for power capping. Idle period can be injected
synchronously on a set of specified CPUs or alternatively
on a per CPU basis.
+
+config DTPM
+ bool "Power capping for Dynamic Thermal Power Management"
+ help
+ This enables support for the power capping for the dynamic
+ thermal power management userspace engine.
+
+config DTPM_CPU
+ bool "Add CPU power capping based on the energy model"
+ depends on DTPM && ENERGY_MODEL
+ help
+ This enables support for CPU power limitation based on
+ energy model.
endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 7255c94ec61c..fabcf388a8d3 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,4 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
obj-$(CONFIG_POWERCAP) += powercap_sys.o
obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c
new file mode 100644
index 000000000000..5a51cd34a7e8
--- /dev/null
+++ b/drivers/powercap/dtpm.c
@@ -0,0 +1,480 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ * The powercap based Dynamic Thermal Power Management framework
+ * provides to the userspace a consistent API to set the power limit
+ * on some devices.
+ *
+ * DTPM defines the functions to create a tree of constraints. Each
+ * parent node is a virtual description of the aggregation of the
+ * children. It propagates the constraints set at its level to its
+ * children and collect the children power information. The leaves of
+ * the tree are the real devices which have the ability to get their
+ * current power consumption and set their power limit.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/dtpm.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/powercap.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+
+#define DTPM_POWER_LIMIT_FLAG 0
+
+static const char *constraint_name[] = {
+ "Instantaneous",
+};
+
+static DEFINE_MUTEX(dtpm_lock);
+static struct powercap_control_type *pct;
+static struct dtpm *root;
+
+static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window)
+{
+ return -ENOSYS;
+}
+
+static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window)
+{
+ return -ENOSYS;
+}
+
+static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+
+ mutex_lock(&dtpm_lock);
+ *max_power_uw = dtpm->power_max - dtpm->power_min;
+ mutex_unlock(&dtpm_lock);
+
+ return 0;
+}
+
+static int __get_power_uw(struct dtpm *dtpm, u64 *power_uw)
+{
+ struct dtpm *child;
+ u64 power;
+ int ret = 0;
+
+ if (dtpm->ops) {
+ *power_uw = dtpm->ops->get_power_uw(dtpm);
+ return 0;
+ }
+
+ *power_uw = 0;
+
+ list_for_each_entry(child, &dtpm->children, sibling) {
+ ret = __get_power_uw(child, &power);
+ if (ret)
+ break;
+ *power_uw += power;
+ }
+
+ return ret;
+}
+
+static int get_power_uw(struct powercap_zone *pcz, u64 *power_uw)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+ int ret;
+
+ mutex_lock(&dtpm_lock);
+ ret = __get_power_uw(dtpm, power_uw);
+ mutex_unlock(&dtpm_lock);
+
+ return ret;
+}
+
+static void __dtpm_rebalance_weight(struct dtpm *dtpm)
+{
+ struct dtpm *child;
+
+ list_for_each_entry(child, &dtpm->children, sibling) {
+
+ pr_debug("Setting weight '%d' for '%s'\n",
+ child->weight, child->zone.name);
+
+ child->weight = DIV64_U64_ROUND_CLOSEST(
+ child->power_max * 1024, dtpm->power_max);
+
+ __dtpm_rebalance_weight(child);
+ }
+}
+
+static void __dtpm_sub_power(struct dtpm *dtpm)
+{
+ struct dtpm *parent = dtpm->parent;
+
+ while (parent) {
+ parent->power_min -= dtpm->power_min;
+ parent->power_max -= dtpm->power_max;
+ parent->power_limit -= dtpm->power_limit;
+ parent = parent->parent;
+ }
+
+ __dtpm_rebalance_weight(root);
+}
+
+static void __dtpm_add_power(struct dtpm *dtpm)
+{
+ struct dtpm *parent = dtpm->parent;
+
+ while (parent) {
+ parent->power_min += dtpm->power_min;
+ parent->power_max += dtpm->power_max;
+ parent->power_limit += dtpm->power_limit;
+ parent = parent->parent;
+ }
+
+ __dtpm_rebalance_weight(root);
+}
+
+/**
+ * dtpm_update_power - Update the power on the dtpm
+ * @dtpm: a pointer to a dtpm structure to update
+ * @power_min: a u64 representing the new power_min value
+ * @power_max: a u64 representing the new power_max value
+ *
+ * Function to update the power values of the dtpm node specified in
+ * parameter. These new values will be propagated to the tree.
+ *
+ * Return: zero on success, -EINVAL if the values are inconsistent
+ */
+int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max)
+{
+ int ret = 0;
+
+ mutex_lock(&dtpm_lock);
+
+ if (power_min == dtpm->power_min && power_max == dtpm->power_max)
+ goto unlock;
+
+ if (power_max < power_min) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ __dtpm_sub_power(dtpm);
+
+ dtpm->power_min = power_min;
+ dtpm->power_max = power_max;
+ if (!test_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags))
+ dtpm->power_limit = power_max;
+
+ __dtpm_add_power(dtpm);
+
+unlock:
+ mutex_unlock(&dtpm_lock);
+
+ return ret;
+}
+
+/**
+ * dtpm_release_zone - Cleanup when the node is released
+ * @pcz: a pointer to a powercap_zone structure
+ *
+ * Do some housecleaning and update the weight on the tree. The
+ * release will be denied if the node has children. This function must
+ * be called by the specific release callback of the different
+ * backends.
+ *
+ * Return: 0 on success, -EBUSY if there are children
+ */
+int dtpm_release_zone(struct powercap_zone *pcz)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+ struct dtpm *parent = dtpm->parent;
+
+ mutex_lock(&dtpm_lock);
+
+ if (!list_empty(&dtpm->children)) {
+ mutex_unlock(&dtpm_lock);
+ return -EBUSY;
+ }
+
+ if (parent)
+ list_del(&dtpm->sibling);
+
+ __dtpm_sub_power(dtpm);
+
+ mutex_unlock(&dtpm_lock);
+
+ if (dtpm->ops)
+ dtpm->ops->release(dtpm);
+
+ kfree(dtpm);
+
+ return 0;
+}
+
+static int __get_power_limit_uw(struct dtpm *dtpm, int cid, u64 *power_limit)
+{
+ *power_limit = dtpm->power_limit;
+ return 0;
+}
+
+static int get_power_limit_uw(struct powercap_zone *pcz,
+ int cid, u64 *power_limit)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+ int ret;
+
+ mutex_lock(&dtpm_lock);
+ ret = __get_power_limit_uw(dtpm, cid, power_limit);
+ mutex_unlock(&dtpm_lock);
+
+ return ret;
+}
+
+/*
+ * Set the power limit on the nodes, the power limit is distributed
+ * given the weight of the children.
+ *
+ * The dtpm node lock must be held when calling this function.
+ */
+static int __set_power_limit_uw(struct dtpm *dtpm, int cid, u64 power_limit)
+{
+ struct dtpm *child;
+ int ret = 0;
+ u64 power;
+
+ /*
+ * A max power limitation means we remove the power limit,
+ * otherwise we set a constraint and flag the dtpm node.
+ */
+ if (power_limit == dtpm->power_max) {
+ clear_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
+ } else {
+ set_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
+ }
+
+ pr_debug("Setting power limit for '%s': %llu uW\n",
+ dtpm->zone.name, power_limit);
+
+ /*
+ * Only leaves of the dtpm tree has ops to get/set the power
+ */
+ if (dtpm->ops) {
+ dtpm->power_limit = dtpm->ops->set_power_uw(dtpm, power_limit);
+ } else {
+ dtpm->power_limit = 0;
+
+ list_for_each_entry(child, &dtpm->children, sibling) {
+
+ /*
+ * Integer division rounding will inevitably
+ * lead to a different min or max value when
+ * set several times. In order to restore the
+ * initial value, we force the child's min or
+ * max power every time if the constraint is
+ * at the boundaries.
+ */
+ if (power_limit == dtpm->power_max) {
+ power = child->power_max;
+ } else if (power_limit == dtpm->power_min) {
+ power = child->power_min;
+ } else {
+ power = DIV_ROUND_CLOSEST_ULL(
+ power_limit * child->weight, 1024);
+ }
+
+ pr_debug("Setting power limit for '%s': %llu uW\n",
+ child->zone.name, power);
+
+ ret = __set_power_limit_uw(child, cid, power);
+ if (!ret)
+ ret = __get_power_limit_uw(child, cid, &power);
+
+ if (ret)
+ break;
+
+ dtpm->power_limit += power;
+ }
+ }
+
+ return ret;
+}
+
+static int set_power_limit_uw(struct powercap_zone *pcz,
+ int cid, u64 power_limit)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+ int ret;
+
+ mutex_lock(&dtpm_lock);
+
+ /*
+ * Don't allow values outside of the power range previously
+ * set when initializing the power numbers.
+ */
+ power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max);
+
+ ret = __set_power_limit_uw(dtpm, cid, power_limit);
+
+ pr_debug("%s: power limit: %llu uW, power max: %llu uW\n",
+ dtpm->zone.name, dtpm->power_limit, dtpm->power_max);
+
+ mutex_unlock(&dtpm_lock);
+
+ return ret;
+}
+
+static const char *get_constraint_name(struct powercap_zone *pcz, int cid)
+{
+ return constraint_name[cid];
+}
+
+static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power)
+{
+ struct dtpm *dtpm = to_dtpm(pcz);
+
+ mutex_lock(&dtpm_lock);
+ *max_power = dtpm->power_max;
+ mutex_unlock(&dtpm_lock);
+
+ return 0;
+}
+
+static struct powercap_zone_constraint_ops constraint_ops = {
+ .set_power_limit_uw = set_power_limit_uw,
+ .get_power_limit_uw = get_power_limit_uw,
+ .set_time_window_us = set_time_window_us,
+ .get_time_window_us = get_time_window_us,
+ .get_max_power_uw = get_max_power_uw,
+ .get_name = get_constraint_name,
+};
+
+static struct powercap_zone_ops zone_ops = {
+ .get_max_power_range_uw = get_max_power_range_uw,
+ .get_power_uw = get_power_uw,
+ .release = dtpm_release_zone,
+};
+
+/**
+ * dtpm_alloc - Allocate and initialize a dtpm struct
+ * @name: a string specifying the name of the node
+ *
+ * Return: a struct dtpm pointer, NULL in case of error
+ */
+struct dtpm *dtpm_alloc(struct dtpm_ops *ops)
+{
+ struct dtpm *dtpm;
+
+ dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL);
+ if (dtpm) {
+ INIT_LIST_HEAD(&dtpm->children);
+ INIT_LIST_HEAD(&dtpm->sibling);
+ dtpm->weight = 1024;
+ dtpm->ops = ops;
+ }
+
+ return dtpm;
+}
+
+/**
+ * dtpm_unregister - Unregister a dtpm node from the hierarchy tree
+ * @dtpm: a pointer to a dtpm structure corresponding to the node to be removed
+ *
+ * Call the underlying powercap unregister function. That will call
+ * the release callback of the powercap zone.
+ */
+void dtpm_unregister(struct dtpm *dtpm)
+{
+ powercap_unregister_zone(pct, &dtpm->zone);
+
+ pr_info("Unregistered dtpm node '%s'\n", dtpm->zone.name);
+}
+
+/**
+ * dtpm_register - Register a dtpm node in the hierarchy tree
+ * @name: a string specifying the name of the node
+ * @dtpm: a pointer to a dtpm structure corresponding to the new node
+ * @parent: a pointer to a dtpm structure corresponding to the parent node
+ *
+ * Create a dtpm node in the tree. If no parent is specified, the node
+ * is the root node of the hierarchy. If the root node already exists,
+ * then the registration will fail. The powercap controller must be
+ * initialized before calling this function.
+ *
+ * The dtpm structure must be initialized with the power numbers
+ * before calling this function.
+ *
+ * Return: zero on success, a negative value in case of error:
+ * -EAGAIN: the function is called before the framework is initialized.
+ * -EBUSY: the root node is already inserted
+ * -EINVAL: * there is no root node yet and @parent is specified
+ * * no all ops are defined
+ * * parent have ops which are reserved for leaves
+ * Other negative values are reported back from the powercap framework
+ */
+int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent)
+{
+ struct powercap_zone *pcz;
+
+ if (!pct)
+ return -EAGAIN;
+
+ if (root && !parent)
+ return -EBUSY;
+
+ if (!root && parent)
+ return -EINVAL;
+
+ if (parent && parent->ops)
+ return -EINVAL;
+
+ if (!dtpm)
+ return -EINVAL;
+
+ if (dtpm->ops && !(dtpm->ops->set_power_uw &&
+ dtpm->ops->get_power_uw &&
+ dtpm->ops->release))
+ return -EINVAL;
+
+ pcz = powercap_register_zone(&dtpm->zone, pct, name,
+ parent ? &parent->zone : NULL,
+ &zone_ops, MAX_DTPM_CONSTRAINTS,
+ &constraint_ops);
+ if (IS_ERR(pcz))
+ return PTR_ERR(pcz);
+
+ mutex_lock(&dtpm_lock);
+
+ if (parent) {
+ list_add_tail(&dtpm->sibling, &parent->children);
+ dtpm->parent = parent;
+ } else {
+ root = dtpm;
+ }
+
+ __dtpm_add_power(dtpm);
+
+ pr_info("Registered dtpm node '%s' / %llu-%llu uW, \n",
+ dtpm->zone.name, dtpm->power_min, dtpm->power_max);
+
+ mutex_unlock(&dtpm_lock);
+
+ return 0;
+}
+
+static int __init dtpm_init(void)
+{
+ struct dtpm_descr **dtpm_descr;
+
+ pct = powercap_register_control_type(NULL, "dtpm", NULL);
+ if (IS_ERR(pct)) {
+ pr_err("Failed to register control type\n");
+ return PTR_ERR(pct);
+ }
+
+ for_each_dtpm_table(dtpm_descr)
+ (*dtpm_descr)->init(*dtpm_descr);
+
+ return 0;
+}
+late_initcall(dtpm_init);
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
new file mode 100644
index 000000000000..51c366938acd
--- /dev/null
+++ b/drivers/powercap/dtpm_cpu.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ * The DTPM CPU is based on the energy model. It hooks the CPU in the
+ * DTPM tree which in turns update the power number by propagating the
+ * power number from the CPU energy model information to the parents.
+ *
+ * The association between the power and the performance state, allows
+ * to set the power of the CPU at the OPP granularity.
+ *
+ * The CPU hotplug is supported and the power numbers will be updated
+ * if a CPU is hot plugged / unplugged.
+ */
+#include <linux/cpumask.h>
+#include <linux/cpufreq.h>
+#include <linux/cpuhotplug.h>
+#include <linux/dtpm.h>
+#include <linux/energy_model.h>
+#include <linux/pm_qos.h>
+#include <linux/slab.h>
+#include <linux/units.h>
+
+static struct dtpm *__parent;
+
+static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
+
+struct dtpm_cpu {
+ struct freq_qos_request qos_req;
+ int cpu;
+};
+
+/*
+ * When a new CPU is inserted at hotplug or boot time, add the power
+ * contribution and update the dtpm tree.
+ */
+static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+ u64 power_min, power_max;
+
+ power_min = em->table[0].power;
+ power_min *= MICROWATT_PER_MILLIWATT;
+ power_min += dtpm->power_min;
+
+ power_max = em->table[em->nr_perf_states - 1].power;
+ power_max *= MICROWATT_PER_MILLIWATT;
+ power_max += dtpm->power_max;
+
+ return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+/*
+ * When a CPU is unplugged, remove its power contribution from the
+ * dtpm tree.
+ */
+static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+ u64 power_min, power_max;
+
+ power_min = em->table[0].power;
+ power_min *= MICROWATT_PER_MILLIWATT;
+ power_min = dtpm->power_min - power_min;
+
+ power_max = em->table[em->nr_perf_states - 1].power;
+ power_max *= MICROWATT_PER_MILLIWATT;
+ power_max = dtpm->power_max - power_max;
+
+ return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
+{
+ struct dtpm_cpu *dtpm_cpu = dtpm->private;
+ struct em_perf_domain *pd;
+ struct cpumask cpus;
+ unsigned long freq;
+ u64 power;
+ int i, nr_cpus;
+
+ pd = em_cpu_get(dtpm_cpu->cpu);
+
+ cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
+
+ nr_cpus = cpumask_weight(&cpus);
+
+ for (i = 0; i < pd->nr_perf_states; i++) {
+
+ power = pd->table[i].power * MICROWATT_PER_MILLIWATT * nr_cpus;
+
+ if (power > power_limit)
+ break;
+ }
+
+ freq = pd->table[i - 1].frequency;
+
+ freq_qos_update_request(&dtpm_cpu->qos_req, freq);
+
+ power_limit = pd->table[i - 1].power *
+ MICROWATT_PER_MILLIWATT * nr_cpus;
+
+ return power_limit;
+}
+
+static u64 get_pd_power_uw(struct dtpm *dtpm)
+{
+ struct dtpm_cpu *dtpm_cpu = dtpm->private;
+ struct em_perf_domain *pd;
+ struct cpumask cpus;
+ unsigned long freq;
+ int i, nr_cpus;
+
+ pd = em_cpu_get(dtpm_cpu->cpu);
+ freq = cpufreq_quick_get(dtpm_cpu->cpu);
+ cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
+ nr_cpus = cpumask_weight(&cpus);
+
+ for (i = 0; i < pd->nr_perf_states; i++) {
+
+ if (pd->table[i].frequency < freq)
+ continue;
+
+ return pd->table[i].power *
+ MICROWATT_PER_MILLIWATT * nr_cpus;
+ }
+
+ return 0;
+}
+
+static void pd_release(struct dtpm *dtpm)
+{
+ struct dtpm_cpu *dtpm_cpu = dtpm->private;
+
+ if (freq_qos_request_active(&dtpm_cpu->qos_req))
+ freq_qos_remove_request(&dtpm_cpu->qos_req);
+
+ kfree(dtpm_cpu);
+}
+
+static struct dtpm_ops dtpm_ops = {
+ .set_power_uw = set_pd_power_limit,
+ .get_power_uw = get_pd_power_uw,
+ .release = pd_release,
+};
+
+static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
+{
+ struct cpufreq_policy *policy;
+ struct em_perf_domain *pd;
+ struct dtpm *dtpm;
+
+ policy = cpufreq_cpu_get(cpu);
+
+ if (!policy)
+ return 0;
+
+ pd = em_cpu_get(cpu);
+ if (!pd)
+ return -EINVAL;
+
+ dtpm = per_cpu(dtpm_per_cpu, cpu);
+
+ power_sub(dtpm, pd);
+
+ if (cpumask_weight(policy->cpus) != 1)
+ return 0;
+
+ for_each_cpu(cpu, policy->related_cpus)
+ per_cpu(dtpm_per_cpu, cpu) = NULL;
+
+ dtpm_unregister(dtpm);
+
+ return 0;
+}
+
+static int cpuhp_dtpm_cpu_online(unsigned int cpu)
+{
+ struct dtpm *dtpm;
+ struct dtpm_cpu *dtpm_cpu;
+ struct cpufreq_policy *policy;
+ struct em_perf_domain *pd;
+ char name[CPUFREQ_NAME_LEN];
+ int ret = -ENOMEM;
+
+ policy = cpufreq_cpu_get(cpu);
+
+ if (!policy)
+ return 0;
+
+ pd = em_cpu_get(cpu);
+ if (!pd)
+ return -EINVAL;
+
+ dtpm = per_cpu(dtpm_per_cpu, cpu);
+ if (dtpm)
+ return power_add(dtpm, pd);
+
+ dtpm = dtpm_alloc(&dtpm_ops);
+ if (!dtpm)
+ return -EINVAL;
+
+ dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
+ if (!dtpm_cpu)
+ goto out_kfree_dtpm;
+
+ dtpm->private = dtpm_cpu;
+ dtpm_cpu->cpu = cpu;
+
+ for_each_cpu(cpu, policy->related_cpus)
+ per_cpu(dtpm_per_cpu, cpu) = dtpm;
+
+ sprintf(name, "cpu%d", dtpm_cpu->cpu);
+
+ ret = dtpm_register(name, dtpm, __parent);
+ if (ret)
+ goto out_kfree_dtpm_cpu;
+
+ ret = power_add(dtpm, pd);
+ if (ret)
+ goto out_dtpm_unregister;
+
+ ret = freq_qos_add_request(&policy->constraints,
+ &dtpm_cpu->qos_req, FREQ_QOS_MAX,
+ pd->table[pd->nr_perf_states - 1].frequency);
+ if (ret)
+ goto out_power_sub;
+
+ return 0;
+
+out_power_sub:
+ power_sub(dtpm, pd);
+
+out_dtpm_unregister:
+ dtpm_unregister(dtpm);
+ dtpm_cpu = NULL;
+ dtpm = NULL;
+
+out_kfree_dtpm_cpu:
+ for_each_cpu(cpu, policy->related_cpus)
+ per_cpu(dtpm_per_cpu, cpu) = NULL;
+ kfree(dtpm_cpu);
+
+out_kfree_dtpm:
+ kfree(dtpm);
+ return ret;
+}
+
+int dtpm_register_cpu(struct dtpm *parent)
+{
+ __parent = parent;
+
+ return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE,
+ "dtpm_cpu:online",
+ cpuhp_dtpm_cpu_online,
+ cpuhp_dtpm_cpu_offline);
+}
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index c9e57237d778..fdda2a737186 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -547,7 +547,7 @@ static void rapl_init_domains(struct rapl_package *rp)
if (i == RAPL_DOMAIN_PLATFORM && rp->id > 0) {
snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "psys-%d",
- cpu_data(rp->lead_cpu).phys_proc_id);
+ topology_physical_package_id(rp->lead_cpu));
} else
snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "%s",
rapl_domain_names[i]);
@@ -1049,6 +1049,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &rapl_defaults_core),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server),
X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core),
@@ -1309,7 +1310,6 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
{
int id = topology_logical_die_id(cpu);
struct rapl_package *rp;
- struct cpuinfo_x86 *c = &cpu_data(cpu);
int ret;
if (!rapl_defaults)
@@ -1326,10 +1326,11 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
if (topology_max_die_per_package() > 1)
snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
- "package-%d-die-%d", c->phys_proc_id, c->cpu_die_id);
+ "package-%d-die-%d",
+ topology_physical_package_id(cpu), topology_die_id(cpu));
else
snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
- c->phys_proc_id);
+ topology_physical_package_id(cpu));
/* check if the package contains valid domains */
if (rapl_detect_domains(rp, cpu) || rapl_defaults->check_unit(rp, cpu)) {
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..b3e4e0740089 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,16 @@
#define THERMAL_TABLE(name)
#endif
+#ifdef CONFIG_DTPM
+#define DTPM_TABLE() \
+ . = ALIGN(8); \
+ __dtpm_table = .; \
+ KEEP(*(__dtpm_table)) \
+ __dtpm_table_end = .;
+#else
+#define DTPM_TABLE()
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -733,6 +743,7 @@
ACPI_PROBE_TABLE(irqchip) \
ACPI_PROBE_TABLE(timer) \
THERMAL_TABLE(governor) \
+ DTPM_TABLE() \
EARLYCON_TABLE() \
LSM_TABLE() \
EARLY_LSM_TABLE() \
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 0042ef362511..ee09a39627d6 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -193,6 +193,7 @@ enum cpuhp_state {
CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30,
CPUHP_AP_X86_HPET_ONLINE,
CPUHP_AP_X86_KVM_CLK_ONLINE,
+ CPUHP_AP_DTPM_CPU_ONLINE,
CPUHP_AP_ACTIVE,
CPUHP_ONLINE,
};
diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h
new file mode 100644
index 000000000000..e80a332e3d8a
--- /dev/null
+++ b/include/linux/dtpm.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Linaro Ltd
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ */
+#ifndef ___DTPM_H__
+#define ___DTPM_H__
+
+#include <linux/powercap.h>
+
+#define MAX_DTPM_DESCR 8
+#define MAX_DTPM_CONSTRAINTS 1
+
+struct dtpm {
+ struct powercap_zone zone;
+ struct dtpm *parent;
+ struct list_head sibling;
+ struct list_head children;
+ struct dtpm_ops *ops;
+ unsigned long flags;
+ u64 power_limit;
+ u64 power_max;
+ u64 power_min;
+ int weight;
+ void *private;
+};
+
+struct dtpm_ops {
+ u64 (*set_power_uw)(struct dtpm *, u64);
+ u64 (*get_power_uw)(struct dtpm *);
+ void (*release)(struct dtpm *);
+};
+
+struct dtpm_descr;
+
+typedef int (*dtpm_init_t)(struct dtpm_descr *);
+
+struct dtpm_descr {
+ struct dtpm *parent;
+ const char *name;
+ dtpm_init_t init;
+};
+
+/* Init section thermal table */
+extern struct dtpm_descr *__dtpm_table[];
+extern struct dtpm_descr *__dtpm_table_end[];
+
+#define DTPM_TABLE_ENTRY(name) \
+ static typeof(name) *__dtpm_table_entry_##name \
+ __used __section("__dtpm_table") = &name
+
+#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name)
+
+#define for_each_dtpm_table(__dtpm) \
+ for (__dtpm = __dtpm_table; \
+ __dtpm < __dtpm_table_end; \
+ __dtpm++)
+
+static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
+{
+ return container_of(zone, struct dtpm, zone);
+}
+
+int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max);
+
+int dtpm_release_zone(struct powercap_zone *pcz);
+
+struct dtpm *dtpm_alloc(struct dtpm_ops *ops);
+
+void dtpm_unregister(struct dtpm *dtpm);
+
+int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
+
+int dtpm_register_cpu(struct dtpm *parent);
+
+#endif
diff --git a/include/linux/units.h b/include/linux/units.h
index 5c115c809507..dcc30a53fa93 100644
--- a/include/linux/units.h
+++ b/include/linux/units.h
@@ -4,6 +4,10 @@
#include <linux/math.h>
+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT 1000L
+#define MICROWATT_PER_WATT 1000000L
+
#define ABSOLUTE_ZERO_MILLICELSIUS -273150
static inline long milli_kelvin_to_millicelsius(long t)
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index a7320f07689d..6bfe3ead10ad 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -139,7 +139,6 @@ config PM_SLEEP_SMP_NONZERO_CPU
config PM_AUTOSLEEP
bool "Opportunistic sleep"
depends on PM_SLEEP
- default n
help
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.
@@ -147,7 +146,6 @@ config PM_AUTOSLEEP
config PM_WAKELOCKS
bool "User space wakeup sources interface"
depends on PM_SLEEP
- default n
help
Allow user space to create, activate and deactivate wakeup source
objects with the help of a sysfs-based interface.
@@ -293,7 +291,6 @@ config PM_GENERIC_DOMAINS
config WQ_POWER_EFFICIENT_DEFAULT
bool "Enable workqueue power-efficient mode by default"
depends on PM
- default n
help
Per-cpu workqueues are generally preferred because they show
better performance thanks to cache locality; unfortunately,
@@ -322,15 +319,14 @@ config CPU_PM
bool
config ENERGY_MODEL
- bool "Energy Model for CPUs"
+ bool "Energy Model for devices with DVFS (CPUs, GPUs, etc)"
depends on SMP
depends on CPU_FREQ
- default n
help
Several subsystems (thermal and/or the task scheduler for example)
- can leverage information about the energy consumed by CPUs to make
- smarter decisions. This config option enables the framework from
- which subsystems can access the energy models.
+ can leverage information about the energy consumed by devices to
+ make smarter decisions. This config option enables the framework
+ from which subsystems can access the energy models.
The exact usage of the energy model is subsystem-dependent.