Thermal zone support for arm64

Mark Kettenis
Many of the cheap arm64 (and armv7) boards will overheat if you run
the CPU cores at full throttle for a while.  Adding a heatsink may
help a little bit, but not enough.  Some boards have a microcontroller
that monitors the temperature and throttles the CPUs if necessary.
Other boards don't and will eventually hit a critical temperature
where they either perform an emergency powerdown or start to become
unreliable.

In order to prevent this, the OS is supposed to monitor the
temperature and cool the device (either actively or passively) when
the temperature gets too high.  There are device tree bindings for
so-called thermal zones that link together temperature sensors and
cooling devices and specify trip points, the temperatures at which we
have to start cooling.  Most boards use passive cooling, reducing the
CPU clock speed and voltage.
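
As an illustration only (this is not part of the diff), the trip point
logic can be sketched in a few lines of standalone C.  The struct and
function names below are hypothetical, temperatures are assumed to be
in millidegrees Celsius, and the trip points are assumed to be sorted
by rising temperature.  The selection loop and the rising/falling rule
are modeled on thermal_zone_poll() in the diff, reduced to their core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified trip point (millidegrees Celsius). */
struct trip {
	int32_t temperature;	/* temperature at which the trip engages */
	int32_t hysteresis;	/* how far below it we must fall to leave */
};

/*
 * Return the index of the highest trip point in effect, or -1 if none.
 * "cur" is the index that was in effect at the previous poll (-1 if
 * none).  A new trip point engages at its temperature; the current
 * one is only left once we drop below temperature - hysteresis.
 */
int
select_trip(const struct trip *trips, int ntrips, int cur, int32_t temp)
{
	int i, sel = -1;

	assert(trips != NULL || ntrips == 0);
	for (i = 0; i < ntrips; i++) {
		if (temp < trips[i].temperature && i != cur)
			break;
		if (temp < trips[i].temperature - trips[i].hysteresis)
			break;
		sel = i;
	}
	return sel;
}

/*
 * Change in cooling level for the selected trip point:
 * above the trip temperature and rising  -> +1
 * below the trip temperature and falling -> -1
 * otherwise                              ->  0
 */
int
cooling_delta(int32_t trip_temp, int32_t prev, int32_t temp)
{
	if (temp >= trip_temp)
		return (temp > prev) ? 1 : 0;
	return (temp < prev) ? -1 : 0;
}
```

With a hypothetical trip point at 70000 and a hysteresis of 2000, a
zone that has engaged the trip point stays in it at 69000 and only
leaves it once the reading drops below 68000.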

The diff below implements support for these thermal zones.  Most of
the code is implemented in the generic FDT support code in dev/ofw.
Sensors and cooling devices make themselves known to this layer by
registering themselves just like we do for clocks and regulators.
This means the code is available on armv7 and octeon as well.  On
arm64, the CPUs are registered as cooling devices and implement
passive cooling by simply clipping the available DVFS states instead
of modifying perflevel.  I also added registration code to rktemp(4).
With these changes my RockPro64 can do a make build without reaching
the critical temperature while building clang.  The CPU temperature
now hovers around 70 degC, which is the temperature associated with
the lowest trip point that throttles only the "big" cores.
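
To make the "clipping" concrete, here is a minimal standalone sketch of
the mapping between cooling level and the highest permitted DVFS
state.  It mirrors the arithmetic of cpu_opp_set_cooling_level() and
cpu_opp_dotask() in the diff, but struct cpu_state and the function
names are hypothetical stand-ins, and it assumes the table of
operating points is ordered from slowest to fastest (which is what the
diff's use of ot_nopp - 1 as the uncooled maximum implies).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU state used by the diff. */
struct cpu_state {
	uint32_t nopp;		/* number of operating points */
	int	 opp_idx;	/* index requested through perflevel */
	int	 opp_max;	/* highest index the thermal code allows */
};

/* Cooling level 0 means no clipping; each step removes the fastest state. */
void
set_cooling_level(struct cpu_state *cs, uint32_t level)
{
	assert(cs->nopp > 0);
	if (level > cs->nopp - 1)
		level = cs->nopp - 1;
	cs->opp_max = (int)(cs->nopp - level - 1);
}

/* Inverse mapping: recover the cooling level from the clip index. */
uint32_t
get_cooling_level(const struct cpu_state *cs)
{
	return cs->nopp - (uint32_t)cs->opp_max - 1;
}

/* The state actually programmed: the requested one, clipped to opp_max. */
int
effective_opp(const struct cpu_state *cs)
{
	return cs->opp_idx < cs->opp_max ? cs->opp_idx : cs->opp_max;
}
```

The requested perflevel index is left alone, so once the cooling level
drops back to 0 the CPU returns to whatever state was asked for
without any further bookkeeping.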

ok?


Index: arch/arm64/arm64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
retrieving revision 1.32
diff -u -p -r1.32 cpu.c
--- arch/arm64/arm64/cpu.c 23 Jun 2019 17:14:49 -0000 1.32
+++ arch/arm64/arm64/cpu.c 29 Jun 2019 09:38:31 -0000
@@ -32,6 +32,7 @@
 #include <dev/ofw/openfirm.h>
 #include <dev/ofw/ofw_clock.h>
 #include <dev/ofw/ofw_regulator.h>
+#include <dev/ofw/ofw_thermal.h>
 #include <dev/ofw/fdt.h>
 
 #include <machine/cpufunc.h>
@@ -600,10 +601,14 @@ void cpu_opp_mountroot(struct device *);
 void cpu_opp_dotask(void *);
 void cpu_opp_setperf(int);
 
+uint32_t cpu_opp_get_cooling_level(void *, uint32_t *);
+void cpu_opp_set_cooling_level(void *, uint32_t *, uint32_t);
+
 void
 cpu_opp_init(struct cpu_info *ci, uint32_t phandle)
 {
  struct opp_table *ot;
+ struct cooling_device *cd;
  int count, node, child;
  uint32_t opp_hz, opp_microvolt;
  uint32_t values[3];
@@ -670,8 +675,16 @@ cpu_opp_init(struct cpu_info *ci, uint32
  LIST_INSERT_HEAD(&opp_tables, ot, ot_list);
 
  ci->ci_opp_table = ot;
+ ci->ci_opp_max = ot->ot_nopp - 1;
  ci->ci_cpu_supply = OF_getpropint(ci->ci_node, "cpu-supply", 0);
 
+ cd = malloc(sizeof(struct cooling_device), M_DEVBUF, M_ZERO | M_WAITOK);
+ cd->cd_node = ci->ci_node;
+ cd->cd_cookie = ci;
+ cd->cd_get_level = cpu_opp_get_cooling_level;
+ cd->cd_set_level = cpu_opp_set_cooling_level;
+ cooling_device_register(cd);
+
  /*
  * Do addional checks at mountroot when all the clocks and
  * regulators are available.
@@ -775,7 +788,7 @@ cpu_opp_dotask(void *arg)
  if (ot->ot_master && ot->ot_master != ci)
  continue;
 
- opp_idx = ci->ci_opp_idx;
+ opp_idx = MIN(ci->ci_opp_idx, ci->ci_opp_max);
  opp_hz = ot->ot_opp[opp_idx].opp_hz;
  opp_microvolt = ot->ot_opp[opp_idx].opp_microvolt;
 
@@ -844,4 +857,30 @@ cpu_opp_setperf(int level)
  * regulators might need process context.
  */
  task_add(systq, &cpu_opp_task);
+}
+
+uint32_t
+cpu_opp_get_cooling_level(void *cookie, uint32_t *cells)
+{
+ struct cpu_info *ci = cookie;
+ struct opp_table *ot = ci->ci_opp_table;
+
+ return ot->ot_nopp - ci->ci_opp_max - 1;
+}
+
+void
+cpu_opp_set_cooling_level(void *cookie, uint32_t *cells, uint32_t level)
+{
+ struct cpu_info *ci = cookie;
+ struct opp_table *ot = ci->ci_opp_table;
+ int opp_max;
+
+ if (level > (ot->ot_nopp - 1))
+ level = ot->ot_nopp - 1;
+
+ opp_max = (ot->ot_nopp - level - 1);
+ if (ci->ci_opp_max != opp_max) {
+ ci->ci_opp_max = opp_max;
+ task_add(systq, &cpu_opp_task);
+ }
 }
Index: arch/arm64/dev/mainbus.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/dev/mainbus.c,v
retrieving revision 1.13
diff -u -p -r1.13 mainbus.c
--- arch/arm64/dev/mainbus.c 23 May 2019 13:41:53 -0000 1.13
+++ arch/arm64/dev/mainbus.c 29 Jun 2019 09:38:32 -0000
@@ -25,6 +25,7 @@
 #include <machine/fdt.h>
 #include <dev/ofw/openfirm.h>
 #include <dev/ofw/fdt.h>
+#include <dev/ofw/ofw_thermal.h>
 
 #include <arm64/arm64/arm64var.h>
 #include <arm64/dev/mainbus.h>
@@ -147,6 +148,8 @@ mainbus_attach(struct device *parent, st
 
  /* Attach secondary CPUs. */
  mainbus_attach_cpus(self, mainbus_match_secondary);
+
+ thermal_init();
 }
 
 int
Index: arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.13
diff -u -p -r1.13 cpu.h
--- arch/arm64/include/cpu.h 4 Jun 2019 14:03:21 -0000 1.13
+++ arch/arm64/include/cpu.h 29 Jun 2019 09:38:32 -0000
@@ -110,6 +110,7 @@ struct cpu_info {
 
  struct opp_table *ci_opp_table;
  volatile int ci_opp_idx;
+ volatile int ci_opp_max;
  uint32_t ci_cpu_supply;
 
 #ifdef MULTIPROCESSOR
Index: dev/fdt/rktemp.c
===================================================================
RCS file: /cvs/src/sys/dev/fdt/rktemp.c,v
retrieving revision 1.4
diff -u -p -r1.4 rktemp.c
--- dev/fdt/rktemp.c 1 Jan 2019 15:56:19 -0000 1.4
+++ dev/fdt/rktemp.c 29 Jun 2019 09:38:37 -0000
@@ -28,6 +28,7 @@
 #include <dev/ofw/ofw_clock.h>
 #include <dev/ofw/ofw_misc.h>
 #include <dev/ofw/ofw_pinctrl.h>
+#include <dev/ofw/ofw_thermal.h>
 #include <dev/ofw/fdt.h>
 
 /* Registers */
@@ -205,6 +206,8 @@ struct rktemp_softc {
  struct ksensor sc_sensors[3];
  int sc_nsensors;
  struct ksensordev sc_sensordev;
+
+ struct thermal_sensor sc_ts;
 };
 
 int rktemp_match(struct device *, void *, void *);
@@ -222,6 +225,7 @@ int32_t rktemp_calc_code(struct rktemp_s
 int32_t rktemp_calc_temp(struct rktemp_softc *, int32_t);
 int rktemp_valid(struct rktemp_softc *, int32_t);
 void rktemp_refresh_sensors(void *);
+int32_t rktemp_get_temperature(void *, uint32_t *);
 
 int
 rktemp_match(struct device *parent, void *match, void *aux)
@@ -332,6 +336,11 @@ rktemp_attach(struct device *parent, str
  }
  sensordev_install(&sc->sc_sensordev);
  sensor_task_register(sc, rktemp_refresh_sensors, 5);
+
+ sc->sc_ts.ts_node = node;
+ sc->sc_ts.ts_cookie = sc;
+ sc->sc_ts.ts_get_temperature = rktemp_get_temperature;
+ thermal_sensor_register(&sc->sc_ts);
 }
 
 int32_t
@@ -434,4 +443,21 @@ rktemp_refresh_sensors(void *arg)
  else
  sc->sc_sensors[i].flags |= SENSOR_FINVALID;
  }
+}
+
+int32_t
+rktemp_get_temperature(void *cookie, uint32_t *cells)
+{
+ struct rktemp_softc *sc = cookie;
+ uint32_t idx = cells[0];
+ int32_t code;
+
+ if (idx >= sc->sc_nsensors)
+ return THERMAL_SENSOR_MAX;
+
+ code = HREAD4(sc, TSADC_DATA0 + idx * 4);
+ if (rktemp_valid(sc, code))
+ return rktemp_calc_temp(sc, code);
+ else
+ return THERMAL_SENSOR_MAX;
 }
Index: dev/ofw/files.ofw
===================================================================
RCS file: /cvs/src/sys/dev/ofw/files.ofw,v
retrieving revision 1.6
diff -u -p -r1.6 files.ofw
--- dev/ofw/files.ofw 4 May 2018 16:12:12 -0000 1.6
+++ dev/ofw/files.ofw 29 Jun 2019 09:38:38 -0000
@@ -7,3 +7,4 @@ file dev/ofw/ofw_misc.c fdt
 file dev/ofw/ofw_pinctrl.c fdt
 file dev/ofw/ofw_power.c fdt
 file dev/ofw/ofw_regulator.c fdt
+file dev/ofw/ofw_thermal.c fdt
Index: dev/ofw/ofw_thermal.c
===================================================================
RCS file: dev/ofw/ofw_thermal.c
diff -N dev/ofw/ofw_thermal.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ dev/ofw/ofw_thermal.c 29 Jun 2019 09:38:38 -0000
@@ -0,0 +1,444 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (c) 2019 Mark Kettenis
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <sys/types.h>
+#include <sys/systm.h>
+#include <sys/malloc.h>
+#include <sys/stdint.h>
+#include <sys/task.h>
+#include <sys/timeout.h>
+
+#include <machine/bus.h>
+
+#include <dev/ofw/openfirm.h>
+#include <dev/ofw/ofw_thermal.h>
+
+LIST_HEAD(, thermal_sensor) thermal_sensors =
+        LIST_HEAD_INITIALIZER(thermal_sensors);
+
+LIST_HEAD(, cooling_device) cooling_devices =
+        LIST_HEAD_INITIALIZER(cooling_devices);
+
+struct taskq *tztq;
+
+struct trippoint {
+ int32_t tp_temperature;
+ uint32_t tp_hysteresis;
+ int tp_type;
+ uint32_t tp_phandle;
+};
+
+#define THERMAL_NONE 0
+#define THERMAL_ACTIVE 1
+#define THERMAL_PASSIVE 2
+#define THERMAL_HOT 3
+#define THERMAL_CRITICAL 4
+
+struct cmap {
+ uint32_t *cm_cdev;
+ uint32_t *cm_cdevend;
+ uint32_t cm_trip;
+};
+
+struct cdev {
+ uint32_t cd_phandle;
+ int32_t cd_level;
+ int cd_active;
+ LIST_ENTRY(cdev) cd_list;
+};
+
+struct thermal_zone {
+ int tz_node;
+ char tz_name[64];
+ struct task tz_poll_task;
+ struct timeout tz_poll_to;
+ uint32_t *tz_sensors;
+ uint32_t tz_polling_delay;
+ uint32_t tz_polling_delay_passive;
+
+ struct trippoint *tz_trips;
+ int tz_ntrips;
+ struct trippoint *tz_tp;
+
+ struct cmap *tz_cmaps;
+ int tz_ncmaps;
+ struct cmap *tz_cm;
+
+ LIST_HEAD(, cdev) tz_cdevs;
+
+ int32_t tz_temperature;
+};
+
+void
+thermal_sensor_register(struct thermal_sensor *ts)
+{
+ ts->ts_cells = OF_getpropint(ts->ts_node, "#thermal-sensor-cells", 0);
+ ts->ts_phandle = OF_getpropint(ts->ts_node, "phandle", 0);
+ if (ts->ts_phandle == 0)
+ return;
+
+ LIST_INSERT_HEAD(&thermal_sensors, ts, ts_list);
+}
+
+void
+cooling_device_register(struct cooling_device *cd)
+{
+ cd->cd_cells = OF_getpropint(cd->cd_node, "#cooling-cells", 0);
+ cd->cd_phandle = OF_getpropint(cd->cd_node, "phandle", 0);
+ if (cd->cd_phandle == 0)
+ return;
+
+ LIST_INSERT_HEAD(&cooling_devices, cd, cd_list);
+}
+
+int32_t
+thermal_get_temperature_cells(uint32_t *cells)
+{
+ struct thermal_sensor *ts;
+ uint32_t phandle = cells[0];
+
+ LIST_FOREACH(ts, &thermal_sensors, ts_list) {
+ if (ts->ts_phandle == phandle)
+ break;
+ }
+
+ if (ts && ts->ts_get_temperature)
+ return ts->ts_get_temperature(ts->ts_cookie, &cells[1]);
+
+ return THERMAL_SENSOR_MAX;
+}
+
+void
+thermal_zone_poll_timeout(void *arg)
+{
+ struct thermal_zone *tz = arg;
+
+ task_add(tztq, &tz->tz_poll_task);
+}
+
+uint32_t *
+cdev_next_cdev(uint32_t *cells)
+{
+ uint32_t phandle = cells[0];
+ int node, ncells;
+
+ node = OF_getnodebyphandle(phandle);
+ if (node == 0)
+ return NULL;
+
+ ncells = OF_getpropint(node, "#cooling-cells", 2);
+ return cells + ncells + 1;
+}
+
+uint32_t
+cdev_get_level(uint32_t *cells)
+{
+ struct cooling_device *cd;
+ uint32_t phandle = cells[0];
+
+ LIST_FOREACH(cd, &cooling_devices, cd_list) {
+ if (cd->cd_phandle == phandle)
+ break;
+ }
+
+ if (cd && cd->cd_get_level)
+ return cd->cd_get_level(cd->cd_cookie, &cells[1]);
+
+ return 0;
+}
+
+void
+cdev_set_level(uint32_t *cells, uint32_t level)
+{
+ struct cooling_device *cd;
+ uint32_t phandle = cells[0];
+
+ LIST_FOREACH(cd, &cooling_devices, cd_list) {
+ if (cd->cd_phandle == phandle)
+ break;
+ }
+
+ if (cd && cd->cd_set_level)
+ cd->cd_set_level(cd->cd_cookie, &cells[1], level);
+}
+
+
+void
+cmap_deactivate(struct thermal_zone *tz, struct cmap *cm)
+{
+ struct cdev *cd;
+ uint32_t *cdev;
+
+ if (cm == NULL)
+ return;
+
+ cdev = cm->cm_cdev;
+ while (cdev && cdev < cm->cm_cdevend) {
+ LIST_FOREACH(cd, &tz->tz_cdevs, cd_list) {
+ if (cd->cd_phandle == cdev[0])
+ break;
+ }
+ KASSERT(cd != NULL);
+ cd->cd_active = 0;
+ cdev = cdev_next_cdev(cdev);
+ }
+}
+
+void
+cmap_activate(struct thermal_zone *tz, struct cmap *cm, int32_t delta)
+{
+ struct cdev *cd;
+ uint32_t *cdev;
+ int32_t min, max;
+
+ if (cm == NULL)
+ return;
+
+ cdev = cm->cm_cdev;
+ while (cdev && cdev < cm->cm_cdevend) {
+ LIST_FOREACH(cd, &tz->tz_cdevs, cd_list) {
+ if (cd->cd_phandle == cdev[0])
+ break;
+ }
+ KASSERT(cd != NULL);
+
+ min = (cdev[1] == THERMAL_NO_LIMIT) ? 0 : cdev[1];
+ max = (cdev[2] == THERMAL_NO_LIMIT) ? INT32_MAX : cdev[2];
+
+ cd->cd_active = 1;
+ cd->cd_level = cdev_get_level(cdev) + delta;
+ cd->cd_level = MAX(cd->cd_level, min);
+ cd->cd_level = MIN(cd->cd_level, max);
+ cdev_set_level(cdev, cd->cd_level);
+ cdev = cdev_next_cdev(cdev);
+ }
+}
+
+void
+cmap_finish(struct thermal_zone *tz)
+{
+ struct cdev *cd;
+
+ LIST_FOREACH(cd, &tz->tz_cdevs, cd_list) {
+ if (cd->cd_active == 0 && cd->cd_level != 0) {
+ cdev_set_level(&cd->cd_phandle, 0);
+ cd->cd_level = 0;
+ }
+ }
+}
+
+void
+thermal_zone_poll(void *arg)
+{
+ struct thermal_zone *tz = arg;
+ struct trippoint *tp, *newtp;
+ struct cmap *cm, *newcm;
+ uint32_t polling_delay;
+ int32_t temp, delta;
+ int i;
+
+ temp = thermal_get_temperature_cells(tz->tz_sensors);
+ if (temp == THERMAL_SENSOR_MAX)
+ return;
+
+ newtp = NULL;
+ tp = tz->tz_trips;
+ for (i = 0; i < tz->tz_ntrips; i++) {
+ if (temp < tp->tp_temperature && tp != tz->tz_tp)
+ break;
+ if (temp < tp->tp_temperature - tp->tp_hysteresis)
+ break;
+ newtp = tp++;
+ }
+
+ /* Short circuit if we didn't hit a trip point. */
+ if (newtp == NULL && tz->tz_tp == NULL)
+ goto out;
+
+ /*
+ * If the current tenperature is above the trip temperature:
+ *  - increase the cooling level if the temperature is rising
+ *  - do nothing if the temperature is falling
+ * If the current temperature is below the trip tenmperature:
+ *  - do nothing if the temperature is rising
+ *  - decreate the cooling level if the temperature is falling
+ */
+ delta = 0;
+ if (newtp) {
+ if (temp >= newtp->tp_temperature) {
+ if (temp > tz->tz_temperature)
+ delta = 1;
+ } else {
+ if (temp < tz->tz_temperature)
+ delta = -1;
+ }
+ }
+
+ newcm = NULL;
+ cm = tz->tz_cmaps;
+ for (i = 0; i < tz->tz_ncmaps; i++) {
+ if (newtp && cm->cm_trip == newtp->tp_phandle) {
+ newcm = cm;
+ break;
+ }
+ cm++;
+ }
+
+ cmap_deactivate(tz, tz->tz_cm);
+ cmap_activate(tz, newcm, delta);
+ cmap_finish(tz);
+
+ tz->tz_tp = newtp;
+ tz->tz_cm = newcm;
+
+out:
+ tz->tz_temperature = temp;
+ if (tz->tz_tp && tz->tz_tp->tp_type == THERMAL_PASSIVE)
+ polling_delay = tz->tz_polling_delay_passive;
+ else
+ polling_delay = tz->tz_polling_delay;
+ timeout_add_msec(&tz->tz_poll_to, polling_delay);
+}
+
+void
+thermal_zone_init(int node)
+{
+ struct thermal_zone *tz;
+ struct trippoint *tp;
+ struct cmap *cm;
+ struct cdev *cd;
+ int len, i;
+
+ len = OF_getproplen(node, "thermal-sensors");
+ if (len <= 0)
+ return;
+
+ tz = malloc(sizeof(struct thermal_zone), M_DEVBUF, M_ZERO | M_WAITOK);
+ tz->tz_node = node;
+
+ OF_getprop(node, "name", &tz->tz_name, sizeof(tz->tz_name));
+ tz->tz_name[sizeof(tz->tz_name) - 1] = 0;
+ tz->tz_sensors = malloc(len, M_DEVBUF, M_WAITOK);
+ OF_getpropintarray(node, "thermal-sensors", tz->tz_sensors, len);
+ tz->tz_polling_delay = OF_getpropint(node, "polling-delay", 0);
+ tz->tz_polling_delay_passive =
+    OF_getpropint(node, "polling-delay-passive", tz->tz_polling_delay);
+
+ task_set(&tz->tz_poll_task, thermal_zone_poll, tz);
+ timeout_set(&tz->tz_poll_to, thermal_zone_poll_timeout, tz);
+
+ /*
+ * Trip points for this thermal zone.
+ */
+ node = OF_getnodebyname(tz->tz_node, "trips");
+ for (node = OF_child(node); node != 0; node = OF_peer(node))
+ tz->tz_ntrips++;
+
+ tz->tz_trips = mallocarray(tz->tz_ntrips, sizeof(struct trippoint),
+    M_DEVBUF, M_ZERO | M_WAITOK);
+ tp = tz->tz_trips;
+
+ node = OF_getnodebyname(tz->tz_node, "trips");
+ for (node = OF_child(node); node != 0; node = OF_peer(node)) {
+ char type[32] = "none";
+
+ tp->tp_temperature =
+    OF_getpropint(node, "temperature", THERMAL_SENSOR_MAX);
+ tp->tp_hysteresis = OF_getpropint(node, "hysteresis", 0);
+ OF_getprop(node, "type", type, sizeof(type));
+ if (strcmp(type, "active") == 0)
+ tp->tp_type = THERMAL_ACTIVE;
+ else if (strcmp(type, "passive") == 0)
+ tp->tp_type = THERMAL_PASSIVE;
+ else if (strcmp(type, "hot") == 0)
+ tp->tp_type = THERMAL_HOT;
+ else if (strcmp(type, "critical") == 0)
+ tp->tp_type = THERMAL_CRITICAL;
+ tp->tp_phandle = OF_getpropint(node, "phandle", 0);
+ tp++;
+ }
+
+ /*
+ * Cooling maps for this thermal zone.
+ */
+ node = OF_getnodebyname(tz->tz_node, "cooling-maps");
+ for (node = OF_child(node); node != 0; node = OF_peer(node))
+ tz->tz_ncmaps++;
+
+ tz->tz_cmaps = mallocarray(tz->tz_ncmaps, sizeof(struct cmap),
+    M_DEVBUF, M_ZERO | M_WAITOK);
+ cm = tz->tz_cmaps;
+
+ node = OF_getnodebyname(tz->tz_node, "cooling-maps");
+ for (node = OF_child(node); node != 0; node = OF_peer(node)) {
+ len = OF_getproplen(node, "cooling-device");
+ if (len <= 0)
+ continue;
+ cm->cm_cdev = malloc(len, M_DEVBUF, M_ZERO | M_WAITOK);
+ OF_getpropintarray(node, "cooling-device", cm->cm_cdev, len);
+ cm->cm_cdevend = cm->cm_cdev + len / sizeof(uint32_t);
+ cm->cm_trip = OF_getpropint(node, "trip", 0);
+ cm++;
+ }
+
+ /*
+ * Create a list of all the possible cooling devices from the
+ * cooling maps for this thermal zone, and initialize their
+ * state.
+ */
+ LIST_INIT(&tz->tz_cdevs);
+ cm = tz->tz_cmaps;
+ for (i = 0; i < tz->tz_ncmaps; i++) {
+ uint32_t *cdev;
+
+ cdev = cm->cm_cdev;
+ while (cdev && cdev < cm->cm_cdevend) {
+ LIST_FOREACH(cd, &tz->tz_cdevs, cd_list) {
+ if (cd->cd_phandle == cdev[0])
+ break;
+ }
+ if (cd == NULL) {
+ cd = malloc(sizeof(struct cdev), M_DEVBUF,
+    M_ZERO | M_WAITOK);
+ cd->cd_phandle = cdev[0];
+ cd->cd_level = 0;
+ cd->cd_active = 0;
+ LIST_INSERT_HEAD(&tz->tz_cdevs, cd, cd_list);
+ }
+ cdev = cdev_next_cdev(cdev);
+ }
+ cm++;
+ }
+
+ /* Start polling if we are requested to do so. */
+ if (tz->tz_polling_delay > 0)
+ timeout_add_msec(&tz->tz_poll_to, tz->tz_polling_delay);
+}
+
+void
+thermal_init(void)
+{
+ int node = OF_finddevice("/thermal-zones");
+
+ if (node == 0)
+ return;
+
+ tztq = taskq_create("tztq", 1, IPL_NONE, 0);
+
+ for (node = OF_child(node); node != 0; node = OF_peer(node))
+ thermal_zone_init(node);
+}
Index: dev/ofw/ofw_thermal.h
===================================================================
RCS file: dev/ofw/ofw_thermal.h
diff -N dev/ofw/ofw_thermal.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ dev/ofw/ofw_thermal.h 29 Jun 2019 09:38:38 -0000
@@ -0,0 +1,53 @@
+/* $OpenBSD$ */
+/*
+ * Copyright (c) 2019 Mark Kettenis
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _DEV_OFW_THERMAL_H_
+#define _DEV_OFW_THERMAL_H_
+
+struct thermal_sensor {
+ int ts_node;
+ void *ts_cookie;
+
+ int32_t (*ts_get_temperature)(void *, uint32_t *);
+
+ LIST_ENTRY(thermal_sensor) ts_list;
+ uint32_t ts_phandle;
+ uint32_t ts_cells;
+};
+
+#define THERMAL_SENSOR_MAX 0xffffffffU
+
+struct cooling_device {
+ int cd_node;
+ void *cd_cookie;
+
+ uint32_t (*cd_get_level)(void *, uint32_t *);
+ void (*cd_set_level)(void *, uint32_t *, uint32_t);
+
+ LIST_ENTRY(cooling_device) cd_list;
+ uint32_t cd_phandle;
+ uint32_t cd_cells;
+};
+
+#define THERMAL_NO_LIMIT 0xffffffffU
+
+void thermal_sensor_register(struct thermal_sensor *);
+void cooling_device_register(struct cooling_device *);
+
+void thermal_init(void);
+
+#endif /* _DEV_OFW_THERMAL_H_ */

Re: Thermal zone support for arm64

Steffen Nurpmeso-2
Mark Kettenis wrote in <[hidden email]>:
 |Many of the cheap arm64 (and armv7) boards will overheat if you run
 |the CPU cores at full throttle for a while.  Adding a heatsink may
 |help a little bit, but not enough.  Some boards have a microcontroller
 |that monitors the temperature and throttles the CPUs if necessary.
 |Other boards don't and will eventually hit a critical temperature
 |where it will either do an emergency powerdown or will start to become
 |unreliable.
 |
 |In order to prevent this, the OS is supposed to monitor the
 |temperature and cool the device (either actively or passively) when
 |the temperature gets too high.  There are device tree bindings for
 |so-called thermal zones that link together temperature sensors and
 |cooling devices and define trip points that define the temperatures at
 |which we have to start cooling.  Most boards use passive cooling
 |through reducing the CPU clock speed and voltage.

This is very interesting.  Dense x86-64 packages also heat up
immensely and very fast, even with hyperthreading turned off
(although with it off they no longer overheat).  A few months back
I joined a Linux bug report about this: my new box (I jumped
a decade of hardware development in March/April) just could not
keep up, and the fans ended up kicking in at full power.  I asked
why no adaptive strategy is used instead, and mentioned a simple
fan control shell script of mine.  It keeps track of the last and
the current level, which gives a difference, and of the heat-up /
cool-down trend, which gives a distance; with those two a simple
adaptation is possible.  In the meantime LWN has reported on
scheduler work that gives hot CPUs less work than others, which is
of course a far more sophisticated approach.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Re: Thermal zone support for arm64

Joseph Mayer
In reply to this post by Mark Kettenis
On Saturday, 29 June 2019 18:08, Mark Kettenis <[hidden email]> wrote:
> Many of the cheap arm64 (and armv7) boards will overheat if you run
> the CPU cores at full throttle for a while. Adding a heatsink may
> help a little bit, but not enough. Some boards have a microcontroller
> that monitors the temperature and throttles the CPUs if necessary.
> Other boards don't and will eventually hit a critical temperature
> where it will either do an emergency powerdown or will start to become
> unreliable.

Hi Mark,

Great.

With this diff, SoC performance and temperature follow the logic that
the highest priority is to stay below 70C, and the second priority,
subject to the first being satisfied, is to run at the fullest
possible performance, right?

> the temperature gets too high. There are device tree bindings for
> so-called thermal zones that link together temperature sensors and
> cooling devices and define trip points that define the temperatures at
> which we have to start cooling. Most boards use passive cooling

Are the trip points default configuration data stored in the hardware?

> + * If the current tenperature is above the trip temperature:
> + * If the current temperature is below the trip tenmperature:
> + *  - decreate the cooling level if the temperature is falling

Small typos: tenperature/tenmperature -> temperature, and
decreate -> decrease.

Thanks!
Joseph