idle threads say 100%

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

idle threads say 100%

Ted Unangst-6
I know there have been some changes in this area, but this seems like a
remaining bug.

Run top. Press S to show system processes. Observe idle1 and idle3 (the
hyperthreaded idles) stuck at 100%. With 0:00 time.

Now if you switch hw.smt=1 the 100% slowly ticks down, but the time
immediately jumps up to many hours.

Just a peculiarity I noticed. Perhaps there's some way to consolidate special
cases?

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Theo de Raadt-2
There was work before release which had to be pulled since it changed
API which some ports could not handle.

A new approach (adding a different API) is in the works from Scott.

However, it is possible there are other bugs in the way the cpus are
'parked'.

Or it could simply be bugs in top, which Scott is also addressing.

> I know there have been some changes in this area, but this seems like a
> remaining bug.
>
> Run top. Press S to show system processes. Observe idle1 and idle3 (the
> hyperthreaded idles) stuck at 100%. With 0:00 time.
>
> Now if you switch hw.smt=1 the 100% slowly ticks down, but the time
> immediately jumps up to many hours.
>
> Just a peculiarity I noticed. Perhaps there's some way to consolidate special
> cases?
>

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Mark Kettenis
> From: "Theo de Raadt" <[hidden email]>
> Date: Thu, 01 Nov 2018 14:21:43 -0600
>
> There was work before release which had to be pulled since it changed
> API which some ports could not handle.
>
> A new approach (adding a different API) is in the works from Scott.
>
> However, it is possible there are other bugs in the way the cpus are
> 'parked'.
>
> Or it could simply be bugs in top, which Scott is also addressing.

I think Ted is pointing out a different issue here that doesn't really
have anything to do with SMT.  The issue is that in some cases we end
up with CPUs reporting being 100% busy running the idle thread instead
of reporting being 100% idle.  This happens quite a lot on machines
with lots of CPUs immediately after they are booted.  Usually this
funny state disappears after some time.

An idle CPU is of course running the idle thread, so in that sense
this isn't super-strange.  But it does indicate there is some kind of
accounting issue.  I have a feeling this happens before any processes
have been scheduled on these CPUs.  But I've never found the problem...

> > I know there have been some changes in this area, but this seems like a
> > remaining bug.
> >
> > Run top. Press S to show system processes. Observe idle1 and idle3 (the
> > hyperthreaded idles) stuck at 100%. With 0:00 time.
> >
> > Now if you switch hw.smt=1 the 100% slowly ticks down, but the time
> > immediately jumps up to many hours.
> >
> > Just a peculiarity I noticed. Perhaps there's some way to consolidate special
> > cases?
> >
>
>

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Alexander Bluhm
On Fri, Nov 02, 2018 at 10:19:15PM +0100, Mark Kettenis wrote:

> I think Ted is pointing out a different issue here that doesn't really
> have anything to do with SMT.  The issue is that in some cases we end
> up with CPUs reporting being 100% busy running the idle thread instead
> of reporting being 100% idle.  This happens quite a lot on machines
> with lots of CPUs immediately after they are booted.  Usually this
> funny state disappears after some time.
>
> An idle CPU is of course running the idle thread, so in that sense
> this isn't super-strange.  But it does indicate there is some kind of
> accounting issue.  I have a feeling this happens before any processes
> have been scheduled on these CPUs.  But I've never found the problem...

I can easily reproduce this on a 2 socket machine with 4 cores each.
Hyper threads are turned off in BIOS.

schedcpu() does not account the run time if p->p_slptime > 1.  So
fresh idle threads have a CPU percentage of 99%.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     11006 99.0  0.0     0     0 ??  RK/6  10:19PM    0:15.51 (idle6)
root     26529 99.0  0.0     0     0 ??  RK/7  10:19PM    2:55.45 (idle7)
root     55382 99.0  0.0     0     0 ??  RK/3  10:19PM    2:55.27 (idle3)
root     84574 99.0  0.0     0     0 ??  RK/4  10:19PM    2:54.90 (idle4)
root     53490 99.0  0.0     0     0 ??  RK/5  10:19PM    2:49.72 (idle5)
root     59318 73.2  0.0     0     0 ??  DK    10:19PM    2:53.36 (idle0)
root     24358  0.0  0.0     0     0 ??  RK/1  10:19PM    2:54.66 (idle1)
root     15902  0.0  0.0     0     0 ??  RK/2  10:19PM    2:55.04 (idle2)

This effect goes away after runnning multiple processes on all cores.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     55382 11.7  0.0     0     0 ??  RK/3  10:19PM    4:06.52 (idle3)
root     84574 11.6  0.0     0     0 ??  DK    10:19PM    4:12.00 (idle4)
root     53490 11.7  0.0     0     0 ??  RK/5  10:19PM    4:11.94 (idle5)
root     11006 11.7  0.0     0     0 ??  RK/6  10:19PM    3:48.90 (idle6)
root     26529  1.3  0.0     0     0 ??  RK/7  10:19PM    4:11.81 (idle7)
root     59318  0.4  0.0     0     0 ??  DK    10:19PM    4:09.78 (idle0)
root     24358  0.0  0.0     0     0 ??  RK/1  10:19PM    4:11.37 (idle1)
root     15902  0.0  0.0     0     0 ??  RK/2  10:19PM    4:11.87 (idle2)

If I initialize p_slptime with 127 the 99% effect does not happen
from the beginning.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     86374  0.0  0.0     0     0 ??  RK/1  10:18PM    3:10.15 (idle1)
root     78654  0.0  0.0     0     0 ??  DK    10:18PM    3:09.52 (idle2)
root     85088  0.0  0.0     0     0 ??  DK    10:18PM    3:10.82 (idle3)
root     34080  0.0  0.0     0     0 ??  RK/4  10:18PM    2:06.89 (idle4)
root     65391  0.0  0.0     0     0 ??  RK/5  10:18PM    0:20.39 (idle5)
root      6812  0.0  0.0     0     0 ??  RK/6  10:18PM    0:36.33 (idle6)
root      6433  0.0  0.0     0     0 ??  RK/7  10:18PM    0:53.09 (idle7)
root     87244  0.0  0.0     0     0 ??  RK/0  10:18PM    3:07.58 (idle0)

There are still things I don't understand.  After a while the CPU
time for idle5, idle6, idle7 does not increase anymore.  I am doing
iperf3 performance tests on this machine.  My patch makes the results
more unsteady and throughput lower.  It seems that iperf3 processes
get scheduled on CPUs with less memory affinity.

bluhm

Index: kern/kern_sched.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.54
diff -u -p -r1.54 kern_sched.c
--- kern/kern_sched.c 17 Nov 2018 23:10:08 -0000 1.54
+++ kern/kern_sched.c 17 Dec 2018 19:35:33 -0000
@@ -145,6 +145,7 @@ sched_idle(void *v)
  */
  SCHED_LOCK(s);
  cpuset_add(&sched_idle_cpus, ci);
+ p->p_slptime = 127;
  p->p_stat = SSLEEP;
  p->p_cpu = ci;
  atomic_setbits_int(&p->p_flag, P_CPUPEG);

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Martin Pieuchot
This is fun.

On 17/12/18(Mon) 22:30, Alexander Bluhm wrote:

> On Fri, Nov 02, 2018 at 10:19:15PM +0100, Mark Kettenis wrote:
> > I think Ted is pointing out a different issue here that doesn't really
> > have anything to do with SMT.  The issue is that in some cases we end
> > up with CPUs reporting being 100% busy running the idle thread instead
> > of reporting being 100% idle.  This happens quite a lot on machines
> > with lots of CPUs immediately after they are booted.  Usually this
> > funny state disappears after some time.
> >
> > An idle CPU is of course running the idle thread, so in that sense
> > this isn't super-strange.  But it does indicate there is some kind of
> > accounting issue.  I have a feeling this happens before any processes
> > have been scheduled on these CPUs.  But I've never found the problem...
>
> I can easily reproduce this on a 2 socket machine with 4 cores each.
> Hyper threads are turned off in BIOS.
>
> schedcpu() does not account the run time if p->p_slptime > 1.  So
> fresh idle threads have a CPU percentage of 99%.

First note that `p_slptime' is supposed to be reset before/after calling
mi_switch().  However the idle thread isn't doing that 8)

Until a thread is scheduled on a secondary CPU, mi_switch() will always
select the Idle thread and set `p_stat' to SONPROC.  This will cause
schedcpu() to calculate `p_pctcpu` for Idle threads until real threads
get scheduled.  This will happen until schedcpu() increases `p_slptime'
twice for every Idle thread.  For that to happen schedcpu() needs to
look at `p_stat' when the Idle thread isn't running.  Since the timer
is set to run every second, it make take some time for this to happen :)

That's why bluhm@'s diff works.

> There are still things I don't understand.  After a while the CPU
> time for idle5, idle6, idle7 does not increase anymore.  I am doing
> iperf3 performance tests on this machine.  My patch makes the results
> more unsteady and throughput lower.  It seems that iperf3 processes
> get scheduled on CPUs with less memory affinity.

What do you mean with CPU time?

Since you're doing tests, could you try the diff below?  It stops
recalculating the priority of Idle threads.  The first effect is fewer
contention on the SCHED_LOCK() and it also implicitly document the
current fun :)

I left an "#if 0" below.  You could try enabling it.  Does it change
anything?  I hope not.

Index: kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.97
diff -u -p -r1.97 kern_clock.c
--- kern/kern_clock.c 17 Oct 2018 12:25:38 -0000 1.97
+++ kern/kern_clock.c 22 Jan 2019 13:07:25 -0000
@@ -400,8 +400,7 @@ statclock(struct clockframe *frame)
  * ~~16 Hz is best
  */
  if (schedhz == 0) {
- if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-    0)
+ if ((++spc->spc_schedticks & 3) == 0)
  schedclock(p);
  }
  }
Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.48
diff -u -p -r1.48 sched_bsd.c
--- kern/sched_bsd.c 6 Jan 2019 12:59:45 -0000 1.48
+++ kern/sched_bsd.c 22 Jan 2019 14:15:28 -0000
@@ -218,6 +218,13 @@ schedcpu(void *arg)
 
  LIST_FOREACH(p, &allproc, p_list) {
  /*
+ * Idle threads are never placed in runqueue, therefore
+ * computing their priority is pointless.
+ */
+ if (p->p_cpu != NULL &&
+    p->p_cpu->ci_schedstate.spc_idleproc == p)
+ continue;
+ /*
  * Increment sleep time (if sleeping). We ignore overflow.
  */
  if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
@@ -528,7 +535,17 @@ resetpriority(struct proc *p)
 void
 schedclock(struct proc *p)
 {
+ struct cpu_info *ci = curcpu();
+ struct schedstate_percpu *spc = &ci->ci_schedstate;
  int s;
+
+ if (p == spc->spc_idleproc) {
+#if 0
+ if (spc->spc_nrun)
+ need_resched(ci);
+#endif
+ return;
+ }
 
  SCHED_LOCK(s);
  p->p_estcpu = ESTCPULIM(p->p_estcpu + 1);

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Alexander Bluhm
On Tue, Jan 22, 2019 at 12:19:11PM -0200, Martin Pieuchot wrote:
> > There are still things I don't understand.  After a while the CPU
> > time for idle5, idle6, idle7 does not increase anymore.  I am doing
> > iperf3 performance tests on this machine.  My patch makes the results
> > more unsteady and throughput lower.  It seems that iperf3 processes
> > get scheduled on CPUs with less memory affinity.
>
> What do you mean with CPU time?

The TIME reported in ps or top for a process.  For idle threads on
idle CPUs it is not reported corrrectly.  There needs to be a real
process scheduled on this core for updating the time.

> Since you're doing tests, could you try the diff below?  It stops
> recalculating the priority of Idle threads.  The first effect is fewer
> contention on the SCHED_LOCK() and it also implicitly document the
> current fun :)

This diff works as expected, CPU percentage of idle thread is now
0%.

> I left an "#if 0" below.  You could try enabling it.  Does it change
> anything?  I hope not.

Same behavior with #if 1.

bluhm

> Index: kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.97
> diff -u -p -r1.97 kern_clock.c
> --- kern/kern_clock.c 17 Oct 2018 12:25:38 -0000 1.97
> +++ kern/kern_clock.c 22 Jan 2019 13:07:25 -0000
> @@ -400,8 +400,7 @@ statclock(struct clockframe *frame)
>   * ~~16 Hz is best
>   */
>   if (schedhz == 0) {
> - if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> -    0)
> + if ((++spc->spc_schedticks & 3) == 0)
>   schedclock(p);
>   }
>   }
> Index: kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 sched_bsd.c
> --- kern/sched_bsd.c 6 Jan 2019 12:59:45 -0000 1.48
> +++ kern/sched_bsd.c 22 Jan 2019 14:15:28 -0000
> @@ -218,6 +218,13 @@ schedcpu(void *arg)
>  
>   LIST_FOREACH(p, &allproc, p_list) {
>   /*
> + * Idle threads are never placed in runqueue, therefore
> + * computing their priority is pointless.
> + */
> + if (p->p_cpu != NULL &&
> +    p->p_cpu->ci_schedstate.spc_idleproc == p)
> + continue;
> + /*
>   * Increment sleep time (if sleeping). We ignore overflow.
>   */
>   if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
> @@ -528,7 +535,17 @@ resetpriority(struct proc *p)
>  void
>  schedclock(struct proc *p)
>  {
> + struct cpu_info *ci = curcpu();
> + struct schedstate_percpu *spc = &ci->ci_schedstate;
>   int s;
> +
> + if (p == spc->spc_idleproc) {
> +#if 0
> + if (spc->spc_nrun)
> + need_resched(ci);
> +#endif
> + return;
> + }
>  
>   SCHED_LOCK(s);
>   p->p_estcpu = ESTCPULIM(p->p_estcpu + 1);

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Martin Pieuchot
On 25/01/19(Fri) 02:24, Alexander Bluhm wrote:

> On Tue, Jan 22, 2019 at 12:19:11PM -0200, Martin Pieuchot wrote:
> > > There are still things I don't understand.  After a while the CPU
> > > time for idle5, idle6, idle7 does not increase anymore.  I am doing
> > > iperf3 performance tests on this machine.  My patch makes the results
> > > more unsteady and throughput lower.  It seems that iperf3 processes
> > > get scheduled on CPUs with less memory affinity.
> >
> > What do you mean with CPU time?
>
> The TIME reported in ps or top for a process.  For idle threads on
> idle CPUs it is not reported corrrectly.  There needs to be a real
> process scheduled on this core for updating the time.
>
> > Since you're doing tests, could you try the diff below?  It stops
> > recalculating the priority of Idle threads.  The first effect is fewer
> > contention on the SCHED_LOCK() and it also implicitly document the
> > current fun :)
>
> This diff works as expected, CPU percentage of idle thread is now
> 0%.
>
> > I left an "#if 0" below.  You could try enabling it.  Does it change
> > anything?  I hope not.
>
> Same behavior with #if 1.

In that case I'd like to commit it without the "#ifdef chunk".  Stop
accounting/updating priorities of Idle threads.  Ok?

Index: kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.97
diff -u -p -r1.97 kern_clock.c
--- kern/kern_clock.c 17 Oct 2018 12:25:38 -0000 1.97
+++ kern/kern_clock.c 26 Jan 2019 16:26:08 -0000
@@ -400,8 +400,7 @@ statclock(struct clockframe *frame)
  * ~~16 Hz is best
  */
  if (schedhz == 0) {
- if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-    0)
+ if ((++spc->spc_schedticks & 3) == 0)
  schedclock(p);
  }
  }
Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.48
diff -u -p -r1.48 sched_bsd.c
--- kern/sched_bsd.c 6 Jan 2019 12:59:45 -0000 1.48
+++ kern/sched_bsd.c 26 Jan 2019 16:26:22 -0000
@@ -218,6 +218,13 @@ schedcpu(void *arg)
 
  LIST_FOREACH(p, &allproc, p_list) {
  /*
+ * Idle threads are never placed in runqueue, therefore
+ * computing their priority is pointless.
+ */
+ if (p->p_cpu != NULL &&
+    p->p_cpu->ci_schedstate.spc_idleproc == p)
+ continue;
+ /*
  * Increment sleep time (if sleeping). We ignore overflow.
  */
  if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
@@ -528,7 +535,12 @@ resetpriority(struct proc *p)
 void
 schedclock(struct proc *p)
 {
+ struct cpu_info *ci = curcpu();
+ struct schedstate_percpu *spc = &ci->ci_schedstate;
  int s;
+
+ if (p == spc->spc_idleproc)
+ return;
 
  SCHED_LOCK(s);
  p->p_estcpu = ESTCPULIM(p->p_estcpu + 1);

Reply | Threaded
Open this post in threaded view
|

Re: idle threads say 100%

Mark Kettenis
> Date: Sat, 26 Jan 2019 14:28:19 -0200
> From: Martin Pieuchot <[hidden email]>
>
> On 25/01/19(Fri) 02:24, Alexander Bluhm wrote:
> > On Tue, Jan 22, 2019 at 12:19:11PM -0200, Martin Pieuchot wrote:
> > > > There are still things I don't understand.  After a while the CPU
> > > > time for idle5, idle6, idle7 does not increase anymore.  I am doing
> > > > iperf3 performance tests on this machine.  My patch makes the results
> > > > more unsteady and throughput lower.  It seems that iperf3 processes
> > > > get scheduled on CPUs with less memory affinity.
> > >
> > > What do you mean with CPU time?
> >
> > The TIME reported in ps or top for a process.  For idle threads on
> > idle CPUs it is not reported corrrectly.  There needs to be a real
> > process scheduled on this core for updating the time.
> >
> > > Since you're doing tests, could you try the diff below?  It stops
> > > recalculating the priority of Idle threads.  The first effect is fewer
> > > contention on the SCHED_LOCK() and it also implicitly document the
> > > current fun :)
> >
> > This diff works as expected, CPU percentage of idle thread is now
> > 0%.
> >
> > > I left an "#if 0" below.  You could try enabling it.  Does it change
> > > anything?  I hope not.
> >
> > Same behavior with #if 1.
>
> In that case I'd like to commit it without the "#ifdef chunk".  Stop
> accounting/updating priorities of Idle threads.  Ok?

Skight improvement of the comment text below.  Otherwise ok kettenis@

Good to see this bug finally fixed!


> Index: kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.97
> diff -u -p -r1.97 kern_clock.c
> --- kern/kern_clock.c 17 Oct 2018 12:25:38 -0000 1.97
> +++ kern/kern_clock.c 26 Jan 2019 16:26:08 -0000
> @@ -400,8 +400,7 @@ statclock(struct clockframe *frame)
>   * ~~16 Hz is best
>   */
>   if (schedhz == 0) {
> - if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> -    0)
> + if ((++spc->spc_schedticks & 3) == 0)
>   schedclock(p);
>   }
>   }
> Index: kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 sched_bsd.c
> --- kern/sched_bsd.c 6 Jan 2019 12:59:45 -0000 1.48
> +++ kern/sched_bsd.c 26 Jan 2019 16:26:22 -0000
> @@ -218,6 +218,13 @@ schedcpu(void *arg)
>  
>   LIST_FOREACH(p, &allproc, p_list) {
>   /*
> + * Idle threads are never placed in runqueue, therefore
> + * computing their priority is pointless.

"never placed in runqueue" -> "never placed on the run queue"

> + */
> + if (p->p_cpu != NULL &&
> +    p->p_cpu->ci_schedstate.spc_idleproc == p)
> + continue;
> + /*
>   * Increment sleep time (if sleeping). We ignore overflow.
>   */
>   if (p->p_stat == SSLEEP || p->p_stat == SSTOP)
> @@ -528,7 +535,12 @@ resetpriority(struct proc *p)
>  void
>  schedclock(struct proc *p)
>  {
> + struct cpu_info *ci = curcpu();
> + struct schedstate_percpu *spc = &ci->ci_schedstate;
>   int s;
> +
> + if (p == spc->spc_idleproc)
> + return;
>  
>   SCHED_LOCK(s);
>   p->p_estcpu = ESTCPULIM(p->p_estcpu + 1);
>