I don't get where the load comes from


Joel Carnat
Hi,

I am running a personal Mail+Web system on a Core2Duo 2GHz using Speedstep.
It is mostly doing nothing but still has a high load average.

I've checked various stat tools but didn't find the reason for the load.

Anyone have ideas?

TIA,
        Jo

PS: here are some of the results I checked.

# uname -a
OpenBSD bagheera.tumfatig.net 4.9 GENERIC.MP#819 amd64

# sysctl hw
hw.machine=amd64
hw.model=Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
hw.ncpu=2
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=cd0:,sd0:01d3664288919ae7
hw.diskcount=2
hw.sensors.cpu0.temp0=45.00 degC
hw.sensors.cpu1.temp0=45.00 degC
hw.sensors.acpitz0.temp0=45.50 degC (zone temperature)
hw.sensors.acpiac0.indicator0=On (power supply)
hw.sensors.acpibat0.volt0=11.10 VDC (voltage)
hw.sensors.acpibat0.volt1=12.71 VDC (current voltage)
hw.sensors.acpibat0.amphour0=4.61 Ah (last full capacity)
hw.sensors.acpibat0.amphour1=0.52 Ah (warning capacity)
hw.sensors.acpibat0.amphour2=0.16 Ah (low capacity)
hw.sensors.acpibat0.amphour3=5.20 Ah (remaining capacity), OK
hw.sensors.acpibat0.raw0=0 (battery full), OK
hw.sensors.acpibat0.raw1=1 (rate)
hw.cpuspeed=800
hw.setperf=0
hw.vendor=Dell Inc.
hw.product=XPS M1330
hw.serialno=CK0W33J
hw.uuid=44454c4c-4b00-1030-8057-c3c04f33334a
hw.physmem=3747008512
hw.usermem=3734933504
hw.ncpufound=2

# top -n -o cpu -T
load averages:  1.19,  1.14,  0.99    bagheera.tumfatig.net 23:39:09
78 processes:  77 idle, 1 on processor
CPU0 states:  1.8% user,  0.0% nice,  0.7% system,  0.1% interrupt, 97.4% idle
CPU1 states:  2.4% user,  0.0% nice,  0.8% system,  0.0% interrupt, 96.8% idle
Memory: Real: 238M/656M act/tot  Free: 2809M  Swap: 0K/8197M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
 3230 root       2    0 2156K 3152K sleep/1   netio     0:00  0.20% sshd
 1867 sshd       2    0 2148K 2368K sleep/0   select    0:00  0.05% sshd
19650 www       14    0 5640K   30M sleep/0   semwait   0:59  0.00% httpd
 4225 www       14    0 5984K   42M sleep/1   semwait   0:58  0.00% httpd
 3624 www       14    0 5644K   30M sleep/1   semwait   0:53  0.00% httpd
24875 www       14    0 5740K   32M sleep/1   semwait   0:52  0.00% httpd
22848 www       14    0 5724K   30M sleep/1   semwait   0:50  0.00% httpd
13508 www       14    0 5832K   31M sleep/1   semwait   0:48  0.00% httpd
24210 www       14    0 5652K   30M sleep/1   semwait   0:48  0.00% httpd
  510 www       14    0 5660K   30M sleep/1   semwait   0:46  0.00% httpd
20258 www        2    0 5536K   32M sleep/0   select    0:46  0.00% httpd
 6543 www       14    0 5772K   32M sleep/0   semwait   0:43  0.00% httpd
 9783 _mysql     2    0   55M   30M sleep/1   poll      0:20  0.00% mysqld
19071 root       2    0  640K 1416K sleep/1   select    0:09  0.00% sshd
10389 root       2    0 3376K 2824K sleep/0   poll      0:07  0.00% monit
21695 _sogo      2    0 7288K   18M sleep/1   poll      0:05  0.00% sogod
 1888 named      2    0   20M   21M sleep/1   select    0:05  0.00% named
18781 _sogo      2    0   15M   29M sleep/1   poll      0:04  0.00% sogod

# iostat -c 10 -w 1
      tty            cd0             sd0             cpu
 tin tout  KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
   0    7  0.00   0 0.00  20.64   7 0.14   2  0  1  0 97
   0  174  0.00   0 0.00   0.00   0 0.00   0  0  0  0 100
   0   57  0.00   0 0.00   0.00   0 0.00   1  0  2  0 97
   0   57  0.00   0 0.00  32.00  17 0.53   1  0  1  0 98
   0   58  0.00   0 0.00   0.00   0 0.00   7  0  7  0 86
   0   57  0.00   0 0.00   0.00   0 0.00   1  0  1  0 98
   0   57  0.00   0 0.00   0.00   0 0.00   1  0  1  0 98
   0   57  0.00   0 0.00   0.00   0 0.00   2  0  0  0 98
   0   57  0.00   0 0.00   4.00   1 0.00   0  0  1  0 99
   0   58  0.00   0 0.00   0.00   0 0.00   1  0  0  1 98

# vmstat -c 10 -w 1
 procs    memory       page                    disks    traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr cd0 sd0  int   sys   cs us sy id
 1 1 0 243420 2866736  655   0   0   0   0   0   0   1   15  1828   77  2  1 97
 0 1 0 243636 2866336  234   0   0   0   0   0   0   0   10   540   47  0  1 99
 0 1 0 243668 2866304   95   0   0   0   0   0   0   0   17   329   44  1  0 99
 0 1 0 242848 2867552  644   0   0   0   0   0   0   0    8  1445  115  1  1 98
 0 1 0 243612 2866352 1076   0   0   0   0   0   0   0    9  2436   44  0  2 98
 0 1 0 243668 2866288  117   0   0   0   0   0   0   0    7   369   46  1  1 98
 0 1 0 243836 2866112  337   0   0   0   0   0   0   0    7   818   86  0  1 99
 0 1 0 243428 2866728 1216   0   0   0   0   0   0   0   11  2920   69  1  2 97
 0 1 0 243640 2866332  212   0   0   0   0   0   0   0    6   313   38  1  0 99
 0 1 0 243684 2866284   96   0   0   0   0   0   0   0    8   334   48  1  0 99


Re: I don't get where the load comes from

Paul de Weerd
On Mon, May 30, 2011 at 11:44:29PM +0200, Joel Carnat wrote:
| Hi,
|
| I am running a personal Mail+Web system on a Core2Duo 2GHz using Speedstep.
| It is mostly doing nothing but still has a high load average.

Wait, what?  ~1 is 'a high load average' now?  What are that database
and webserver doing on your machine that's 'doing nothing'?  What other
processes do you have running?  Note that you don't have to use lots of
CPU to get a (really) high load...

Do you see a lot of interrupts perhaps?  Try `systat -s1 vm` or
`vmstat -i`.

Paul 'WEiRD' de Weerd


--
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/                 


Re: I don't get where the load comes from

Gonzalo L. Rodriguez
In reply to this post by Joel Carnat
Take a look at this:

http://undeadly.org/cgi?action=article&sid=20090715034920


--
Sending from my Computer.


Re: I don't get where the load comes from

Joel Carnat
In reply to this post by Paul de Weerd
On 31 May 2011 at 00:15, Paul de Weerd wrote:
> On Mon, May 30, 2011 at 11:44:29PM +0200, Joel Carnat wrote:
> | Hi,
> |
> | I am running a personal Mail+Web system on a Core2Duo 2GHz using Speedstep.
> | It is mostly doing nothing but still has a high load average.
>
> Wait, what?  ~1 is 'a high load average' now?  What are that database
> and webserver doing on your machine that's 'doing nothing'?  What other
> processes do you have running?  Note that you don't have to use lots of
> CPU to get a (really) high load...
>

Well, compared to my previous box running NetBSD/Xen with the same services,
which showed a load of about 0.3-0.6, I thought a load of 1.21 was quite high.

> Do you see a lot of interrupts perhaps?  Try `systat -s1 vm` or
> `vmstat -i`.

# vmstat -i
interrupt                       total     rate
irq0/clock                    9709553      199
irq0/ipi                      1291416       26
irq144/acpi0                        1        0
irq145/inteldrm0                    9        0
irq96/uhci0                       117        0
irq98/ehci0                         2        0
irq97/azalia0                       1        0
irq101/wpi0                         1        0
irq101/bge0                    366615        7
irq96/ehci1                        20        0
irq101/ahci0                   332349        6
irq147/pckbc0                       6        0
irq148/pckbc0                      38        0
Total                        11700128      240




Re: I don't get where the load comes from

Tony Abernethy
Joel Carnat wrote:
> Well, compared to my previous box running NetBSD/Xen with the same services,
> which showed a load of about 0.3-0.6, I thought a load of 1.21 was quite high.

Different systems will agree on the spelling of the word load.
That is about as much agreement as you can expect.
Does the 0.3-0.6 really mean 30-60 percent loaded?
1.21 tasks seems kinda low for a multi-tasking system.


Re: I don't get where the load comes from

Joel Carnat
In reply to this post by Gonzalo L. Rodriguez
On 31 May 2011 at 02:19, Gonzalo L. R. wrote:
> Take a look of this
>
> http://undeadly.org/cgi?action=article&sid=20090715034920

I found this article before posting.

But one thing that didn't convince me: if I shut down apmd and set
hw.setperf=100, the load drops to 0.20-0.30.

I don't see how "A high load is just that: high. It means you have a lot
of processes that sometimes run." explains a load that varies with
CPU speed alone.



Re: I don't get where the load comes from

Tony Abernethy
Joel Carnat wrote:

>But one thing that didn't convince me: if I shut down apmd and set
>hw.setperf=100, the load drops to 0.20-0.30.
>
>I don't see how "A high load is just that: high. It means you have a lot
>of processes that sometimes run." explains a load that varies with
>CPU speed alone.

Actually that should convince you that the numbers do not mean much.
You are measuring the difference between just barely being counted
and just barely not being counted.


Re: I don't get where the load comes from

Joel Carnat
In reply to this post by Tony Abernethy
On 31 May 2011 at 08:10, Tony Abernethy wrote:
> Joel Carnat wrote:
>> Well, compared to my previous box running NetBSD/Xen with the same
>> services, which showed a load of about 0.3-0.6, I thought a load of 1.21
>> was quite high.
>
> Different systems will agree on the spelling of the word load.
> That is about as much agreement as you can expect.
> Does the 0.3-0.6 really mean 30-60 percent loaded?

As far as I understood the counters on my previous NetBSD box, 0.3 meant
that the CPU was used at 30% of its total capacity. Then, looking at the
sys/user counters, I'd see what kind of things the system was doing.

> 1.21 tasks seems kinda low for a multi-tasking system.

ok :)


Re: I don't get where the load comes from

Francois Pussault-2
Hi all,

Load is not really a CPU usage percentage.
In fact it is the sum of many percentages (real CPU load, memory, buffers,
etc.), which explains why the load can climb above 5.0 per CPU without any
crash or freeze of the host.

We should consider load as a "host" resources percentage... this is not
strictly accurate of course, but it is closer to reality than considering
it as CPU use only.

For example, all my machines run permanently at about 1.1 or 1.2 and
sometimes, for a few minutes, go up to 2.5 or 3.0 of load.
So I don't worry; below 5.0, we should not worry about it.

regards



Cordialement
Francois Pussault
3701 - 8 rue Marcel Pagnol
31100 Toulouse
France
+33 6 17 230 820    +33 5 34 365 269
[hidden email]


Re: I don't get where the load comes from

Abel Abraham Camarillo Ojeda-2
On Tue, May 31, 2011 at 2:24 AM, Francois Pussault
<[hidden email]> wrote:

>
> Load is not really a CPU usage percentage.
> In fact it is the sum of many percentages (real CPU load, memory,
> buffers, etc.), which explains why the load can climb above 5.0 per CPU
> without any crash or freeze of the host.
>
> We should consider load as a "host" resources percentage... this is not
> strictly accurate of course, but it is closer to reality than considering
> it as CPU use only.
>

"The load average numbers give the number of jobs in the run queue averaged
over 1, 5, and 15 minutes...."

from top(1).


Re: I don't get where the load comes from

Sean Kamath
On May 31, 2011, at 12:33 AM, Abel Abraham Camarillo Ojeda wrote:

> On Tue, May 31, 2011 at 2:24 AM, Francois Pussault
> <[hidden email]> wrote:
>>
>> Load is not really a CPU usage percentage.
>> In fact it is the sum of many percentages (real CPU load, memory,
>> buffers, etc.), which explains why the load can climb above 5.0 per CPU
>> without any crash or freeze of the host.
>>
>> We should consider load as a "host" resources percentage... this is not
>> strictly accurate of course, but it is closer to reality than
>> considering it as CPU use only.
>>
>
> "The load average numbers give the number of jobs in the run queue averaged
> over 1, 5, and 15 minutes...."
>
> from top(1).
>

As was mentioned earlier, no two systems agree on what "load average" is.

Making statements about it for a particular system should be based on the code
for that system.

Some systems count a process "runnable" if only the NFS back-end storage
were available to page in its file.  Other systems say it's in a wait
state.  The former can easily lead to load averages in the 100s (or more)
with a CPU idling at 99% (because everything's waiting on NFS).

Some systems don't even agree on what it means to "average".

Load averages generally suck as a metric for system "busyness".  Look at
interrupts and CPU time -- they're what matter.  If you want to break out
CPU beyond "system", "user" and "idle", you can do that, too.

Sean


Re: I don't get where the load comes from

Francois Pussault-2
That is why I mentioned it is not real, but it can be understood as a
user-land approximation.



Cordialement
Francois Pussault
3701 - 8 rue Marcel Pagnol
31100 Toulouse
France
+33 6 17 230 820    +33 5 34 365 269
[hidden email]


Re: I don't get where the load comes from

Artur Grabowski
In reply to this post by Francois Pussault-2
On Tue, May 31, 2011 at 9:24 AM, Francois Pussault
<[hidden email]> wrote:
> Hi all,
>
> Load is not really a CPU usage percentage.
> In fact it is the sum of many percentages (real CPU load, memory,
> buffers, etc.)

No, it isn't.

> We should consider load as a "host" resources percentage...

No, we shouldn't.

The load average is a decaying average of the number of processes in
the runnable state or currently running on a cpu or in the process of
being forked or that have spent less than a second in a sleep state
with sleep priority lower than PZERO, which includes waiting for
memory resources, disk I/O, filesystem locks and a bunch of other
things. You could say it's a very vague estimate of how much work the
cpu might need to be doing soon, maybe. Or it could be completely
wrong because of sampling bias. It's not very important so it's not
really critical for the system to do a good job guessing this number,
so the system doesn't really try too hard.

This number may tell you something useful, or it might be totally
misleading. Or both.
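
In other words, it is the classic exponentially decaying average. Here is a
minimal sketch of the arithmetic, assuming the traditional 5-second
sampling interval and the textbook 1/5/15-minute decay constants -- an
illustration only, not the OpenBSD kernel code:

/*
 * Sketch of the classic UNIX load average: every INTERVAL seconds,
 * count the runnable processes (nrun) and fold that count into three
 * exponentially decaying averages over 1, 5 and 15 minutes.
 * Compile with: cc loadavg.c -lm
 */
#include <math.h>
#include <stdio.h>

#define INTERVAL 5.0                       /* sampling period, seconds */

static double avg[3];                      /* 1-, 5-, 15-minute averages */
static const double window[3] = { 60.0, 300.0, 900.0 };

static void
update_loadavg(int nrun)
{
	for (int i = 0; i < 3; i++) {
		double decay = exp(-INTERVAL / window[i]);
		avg[i] = avg[i] * decay + (double)nrun * (1.0 - decay);
	}
}

int
main(void)
{
	int t;

	/* Toy run: two runnable processes for a minute, then idle. */
	for (t = 0; t < 60; t += 5)
		update_loadavg(2);
	for (t = 0; t < 60; t += 5)
		update_loadavg(0);
	printf("load averages: %.2f, %.2f, %.2f\n", avg[0], avg[1], avg[2]);
	return 0;
}

A process that slept for under a second at a priority below PZERO still
counts in a sample, which is exactly how a machine sitting at 97% idle can
report a load above 1.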

//art



Re: I don't get where the load comes from

Benny Lofgren
On 2011-05-31 14.45, Artur Grabowski wrote:

> The load average is a decaying average of the number of processes in
> the runnable state or currently running on a cpu or in the process of
> being forked or that have spent less than a second in a sleep state
> with sleep priority lower than PZERO, which includes waiting for
> memory resources, disk I/O, filesystem locks and a bunch of other
> things. You could say it's a very vague estimate of how much work the
> cpu might need to be doing soon, maybe. Or it could be completely
> wrong because of sampling bias. It's not very important so it's not
> really critical for the system to do a good job guessing this number,
> so the system doesn't really try too hard.
>
> This number may tell you something useful, or it might be totally
> misleading. Or both.

One thing that often bites me in the butt is that cron relies on the
load average to decide if it should let batch(1) jobs run or not.

The default is if cron sees a loadavg > 1.5 it keeps the batch job
enqueued until it drops below that value. As I often see much, much
higher loads on my systems, invariably I find myself wondering why my
batch jobs never finish, just to discover that they have yet to run.
*duh*

So whenever I remember to, on every new system I set up I configure a
different load threshold value for cron. But I tend to forget, so...
:-)

I have no really good suggestion for how else cron should handle this,
otherwise I would have submitted a patch ages ago...
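
For what it's worth, the number cron compares against is the same one
top(1) shows, and getloadavg(3) exposes it to any program. A sketch of a
wrapper that holds a job until the 1-minute average drops below cron's
default 1.5 threshold (the job path is made up; configuring cron's own
threshold is still the real fix):

/*
 * loadgate.c -- run a command once the 1-minute load average is
 * below MAXLOAD, re-checking once a minute until then.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAXLOAD 1.5	/* cron's default batch(1) threshold */

int
main(void)
{
	double la;

	for (;;) {
		if (getloadavg(&la, 1) == -1) {
			perror("getloadavg");
			return 1;
		}
		if (la < MAXLOAD)
			break;
		sleep(60);	/* busy box: try again in a minute */
	}
	/* hypothetical batch job; substitute your own */
	return system("/usr/local/bin/nightly-batch");
}

Of course, as the rest of the thread argues, a box can sit at a load of 30
while nearly idle, so any such gate inherits the weaknesses of the metric.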


Regards,
/Benny

--
internetlabbet.se     / work:   +46 8 551 124 80      / "Words must
Benny Löfgren        /  mobile: +46 70 718 11 90     /   be weighed,
                    /   fax:    +46 8 551 124 89    /    not counted."
                   /    email:  benny -at- internetlabbet.se


Re: I don't get where the load comes from

Joel Wirāmu Pauling
In reply to this post by Joel Carnat
Load is generally a measure of a single processor core's utilization over a
kernel-dependent time range.

As others have pointed out, it is a very broad measure (broad not as in
meadow, as in continent). Different OSes report load very differently from
each other today.

Traditionally you would see a load average of 1-2 on a multicore system (I
am talking about HP-UX X client servers and the like, of early-90s
vintage). A load average of 1 means a single core of the system is being
utilized close to 100% of the time.

On dual-core systems a load average of 1 should be absolutely no cause for
concern.

Linux has moved away from reporting load average as a percentage of a
single core's time in recent years for precisely this reason: people see a
load of 1 and think their systems are exploding.

In the traditional mold, today's processors should in theory reach loads
of 4-7 and still be responsive...





Re: I don't get where the load comes from

Benny Lofgren
On 2011-06-01 15.12, Joel Wiramu Pauling wrote:
> Load is generally a measure of a single processor core's utilization over
> a kernel-dependent time range.

No it isn't. You have totally misunderstood what the load average is.

> As others have pointed out, it is a very broad measure (broad not as in
> meadow, as in continent). Different OSes report load very differently
> from each other today.

That one's sort of correct, although I've yet to see an OS where the load
doesn't in some way refer to an *average* *count* *of* *processes*.

> Traditionally you would see a load average of 1-2 on a multicore system
> (I am talking about HP-UX X client servers and the like, of early-90s
> vintage). A load average of 1 means a single core of the system is being
> utilized close to 100% of the time.

No, no, no. Absolutely *NOT*. It doesn't reflect CPU usage at all.

And it never has. The load average must be the single most misunderstood
kernel metric there has ever been in the history of unix systems.

Very simplified, it reflects the *number* *of* *processes* in a runnable
state, averaged over some time. Not necessarily processes actually on
core, mind you, but the number of processes *wanting* to run.

Now, a process can be in a runnable state for a variety of reasons, and
there is for example nothing that says it even needs to use up its
allotted time slice when actually running, but it still counts as
runnable. It can be runnable when waiting for a system resource; then it
consumes *no* CPU cycles at all, but it still counts towards the load
average.

> On dual core systems a load average of 1 should be absolutely no cause for
> concern.

I routinely see load averages of 30-40-50, upwards of 100 on some of my
systems. They run absolutely smooth and beautiful, with no noticeable lag
or delays. The processors may be near idling, they may be doing some work,
it varies, but it is nothing I can tell from the load average alone.

> Linux has moved away from reporting load average as a percentage of a
> single core's time in recent years for precisely this reason: people see
> a load of 1 and think their systems are exploding.
>
> In the traditional mold, today's processors should in theory reach loads
> of 4-7 and still be responsive...

I'm sorry to say, but your entire text is based on a misunderstanding of
what the load average really is, so the above sentences are totally
irrelevant.


Regards,
/Benny



--
internetlabbet.se     / work:   +46 8 551 124 80      / "Words must
Benny Löfgren        /  mobile: +46 70 718 11 90     /   be weighed,
                    /   fax:    +46 8 551 124 89    /    not counted."
                   /    email:  benny -at- internetlabbet.se


Re: I don't get where the load comes from

Joel Wirāmu Pauling
On 2 June 2011 01:41, Benny Lofgren <[hidden email]> wrote:


I agree with what you are saying, and I worded this quite badly. The frame
I was trying to set up was "back in the day" when multi-user meant
something (VAX/PDP): the load average WAS tied to core utilization, as you
would queue a job, it would go into the queue, there would be lots of
stuff in the queue, and the load average would bump, because there wasn't
much core to go around.

That hasn't been the case for a very very long time, and once we entered
the age of multi-tasking, load became unintuitive.

Point being, it's an indication of something today that isn't at all
intuitive.

Sorry for muddying the waters even more, my fuck up.




Re: I don't get where the load comes from

LeviaComm Networks NOC
In reply to this post by Benny Lofgren
On 01-Jun-11 05:46, Benny Lofgren wrote:

> One thing that often bites me in the butt is that cron relies on the
> load average to decide if it should let batch(1) jobs run or not.
>
> The default is if cron sees a loadavg > 1.5 it keeps the batch job
> enqueued until it drops below that value. As I often see much, much
> higher loads on my systems, invariably I find myself wondering why my
> batch jobs never finish, just to discover that they have yet to run.
> *duh*
>
> So whenever I remember to, on every new system I set up I configure a
> different load threshold value for cron. But I tend to forget, so...
> :-)
>
> I have no really good suggestion for how else cron should handle this,
> otherwise I would have submitted a patch ages ago...
>

I had tinkered with a solution for this: cron wakes up a minute before the
batch run is scheduled.  It then copies a random 4 KB sector from the hard
disk to RAM and runs an MD5 or SHA hash against it.  The whole process is
timed, and if it completes within a reasonable amount of time for the
system, cron kicks off the batch job.

This was the easiest way I could think of to measure the actual
performance of the system at any given moment, since it exercises the
entire system and closely emulates actual work.

While this isn't really the right thing to do, I found it to be the most
effective on my systems.
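
For illustration, here is a rough sketch of that probe, with a plain file
standing in for the random raw-disk sector and a trivial FNV-1a hash
standing in for MD5/SHA (the file path and the 50 ms budget are invented;
note also that a cached file read never reaches the disk the way a raw
sector read would):

/*
 * probe.c -- time a 4 KB read-plus-hash and report whether it beat
 * the budget (exit 0 = fast enough to start the batch run).
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define BUDGET_NS 50000000LL		/* 50 ms, tune per machine */

int
main(void)
{
	unsigned char buf[4096];
	struct timespec t0, t1;
	uint64_t h = 14695981039346656037ULL;	/* FNV-1a offset basis */
	long long elapsed;
	ssize_t i, n;
	int fd;

	if ((fd = open("/var/db/probe.bin", O_RDONLY)) == -1)
		return 1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	n = read(fd, buf, sizeof(buf));		/* the 4 KB read */
	for (i = 0; i < n; i++)			/* the hash pass */
		h = (h ^ buf[i]) * 1099511628211ULL;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	elapsed = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	    (t1.tv_nsec - t0.tv_nsec);
	printf("probe: %lld ns (hash %016llx)\n", elapsed,
	    (unsigned long long)h);
	return elapsed <= BUDGET_NS ? 0 : 1;
}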


Re: I don't get where the load comes from

Christiano F. Haesbaert
On 1 June 2011 11:01, LeviaComm Networks <[hidden email]> wrote:

> I had tinkered with a solution for this: cron wakes up a minute before
> the batch run is scheduled.  It then copies a random 4 KB sector from the
> hard disk to RAM and runs an MD5 or SHA hash against it.  The whole
> process is timed, and if it completes within a reasonable amount of time
> for the system, cron kicks off the batch job.
>
> This was the easiest way I could think of to measure the actual
> performance of the system at any given moment, since it exercises the
> entire system and closely emulates actual work.
>
> While this isn't really the right thing to do, I found it to be the most
> effective on my systems.
>
>

You really think cron should be doing its own calculation?  I don't
like that *at all*.

Can't we just have a higher default threshold for cron?
Can't we default to 0?

I think this is something that should be looked at: if we admit the load
average is a shitty measure, we shouldn't rely on it for running cron
jobs.

I hereby vote for a default of 0. (Thank god this isn't a democracy :-) )


Re: I don't get where the load comes from

Benny Lofgren
In reply to this post by Joel Wirāmu Pauling
On 2011-06-01 15.53, Joel Wiramu Pauling wrote:
> I agree with what you are saying, and I worded this quite badly. The
> frame I was trying to set up was "back in the day" when multi-user meant
> something (VAX/PDP): the load average WAS tied to core utilization, as
> you would queue a job, it would go into the queue, there would be lots of
> stuff in the queue, and the load average would bump, because there wasn't
> much core to go around.

Not wanting to turn this into a pissing contest, I still have to say that
you are fundamentally wrong about this. I'm sorry, but what you are saying
simply is not correct.

I've worked in-depth on just about every unixlike architecture there is
since I started out in this business back in 1983, and on every single
one (that employed it at all) the load average concept has worked
similarly to how I described it in my previous mail. (Not always EXACTLY
alike, but the general principle has always been the same.)

The reason I'm so adamant about this is that the interpretation of the
load average metric truly is one of the longest-standing misconceptions
about the finer points of unix system administration there is, and if
this discussion thread can set just one individual straight about it
then it is worth the extra mail bandwidth. :-)

One only needs to look at all of the very confident, yet dead-wrong,
answers to the OP's question in this thread to realize that it is
indeed a confusing subject. And the importance of getting it straightened
out cannot be overstated. I've long ago lost count of the number of
times I've been called in to "fix" a problem with high system loads,
only to find that the only metric used to determine that was... yes,
the load average. I wonder how much money has been wasted over the
years throwing hardware at what might not even have been a problem
in the first place...


Regards,
/Benny




--
internetlabbet.se     / work:   +46 8 551 124 80      / "Words must
Benny Löfgren        /  mobile: +46 70 718 11 90     /   be weighed,
                    /   fax:    +46 8 551 124 89    /    not counted."
                   /    email:  benny -at- internetlabbet.se
