High system load with gkrellm on amd64

Laurence Tratt
On two of my amd64 machines (a desktop and a laptop), I have been plagued,
for several months, by seemingly random high system loads (100%). Sometimes
after a few minutes, sometimes after a couple of days, of use, my CPU usage
would effectively hit 100%, with top showing 2 CPU cores hitting 50% each on
"system".

For some time, I could not work out any way of solving this, short of
rebooting the machine in question. It now seems that the culprit is gkrellm:
killing it reduces the "system" load back down to 0-3%.

Has anyone else seen this problem with gkrellm? I'm at a bit of a loss to
explain why it happens so randomly. It might, perhaps, be related to the
rthreads transition, but it's hard to be sure. The laptop dmesg is attached
at the end. If anyone wants further details, I'm happy to provide them.


Laurie
--
Personal                                             http://tratt.net/laurie/
The Converge programming language                      http://convergepl.org/
   https://github.com/ltratt              http://twitter.com/laurencetratt

OpenBSD 5.2-beta (GENERIC.MP) #345: Sun Jul  8 14:48:27 MDT 2012
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8466853888 (8074MB)
avail mem = 8219103232 (7838MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (66 entries)
bios0: vendor LENOVO version "8DET54WW (1.24 )" date 10/18/2011
bios0: LENOVO 4287CTO
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA SSDT SSDT UEFI UEFI UEFI
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EXP7(S4) EHC1(S3) EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz, 2292.91 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,PCLMUL,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,NXE,LONG,LAHF
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: apic clock running at 99MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz, 2292.56 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,PCLMUL,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,NXE,LONG,LAHF
cpu1: 256KB 64b/line 8-way L2 cache
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 5 (EXP4)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus -1 (EXP7)
acpicpu0 at acpi0: C3, C2, C1, PSS
acpicpu1 at acpi0: C3, C2, C1, PSS
acpipwrres0 at acpi0: PUBS
acpitz0 at acpi0: critical temperature is 99 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpibat0 at acpi0: BAT0 model "42T4861" serial 25468 type LION oem "SANYO"
acpibat1 at acpi0: BAT1 not present
acpiac0 at acpi0: AC unit offline
acpithinkpad0 at acpi0
acpidock0 at acpi0: GDCK not docked (0)
cpu0: Enhanced SpeedStep 2292 MHz: speeds: 2301, 2300, 2000, 1800, 1600, 1400, 1200, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
vga1 at pci0 dev 2 function 0 "Intel HD Graphics 3000" rev 0x09
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
intagp0 at vga1
agp0 at intagp0: aperture at 0xe0000000, size 0x10000000
inteldrm0 at vga1: apic 2 int 16
drm0 at inteldrm0
"Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address f0:de:f1:a7:e4:89
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 6 Series HD Audio" rev 0x04: msi
azalia0: codecs: Conexant/0x506e, Intel/0x2805, using Conexant/0x506e
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 6 Series PCIE" rev 0xb4: msi
pci1 at ppb0 bus 2
ppb1 at pci0 dev 28 function 1 "Intel 6 Series PCIE" rev 0xb4: msi
pci2 at ppb1 bus 3
iwn0 at pci2 dev 0 function 0 "Intel Centrino Advanced-N 6205" rev 0x34: msi, MIMO 2T2R, MoW, address 08:11:96:cb:69:58
ppb2 at pci0 dev 28 function 3 "Intel 6 Series PCIE" rev 0xb4: msi
pci3 at ppb2 bus 5
ppb3 at pci0 dev 28 function 4 "Intel 6 Series PCIE" rev 0xb4: msi
pci4 at ppb3 bus 13
sdhc0 at pci4 dev 0 function 0 "Ricoh 5U823 SD/MMC" rev 0x04: apic 2 int 16
sdmmc0 at sdhc0
ehci1 at pci0 dev 29 function 0 "Intel 6 Series USB" rev 0x04: apic 2 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel QM67 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x04: msi, AHCI 1.3
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: <ATA, SAMSUNG SSD 830, CXM0> SCSI3 0/direct fixed naa.5002538043584d30
sd0: 122104MB, 512 bytes/sector, 250069680 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800 SO-DIMM
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM PC3-12800 SO-DIMM
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
aps0 at isa0 port 0x1600/31
mtrr: Pentium Pro MTRR support
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
ugen0 at uhub2 port 4 "Broadcom Corp Broadcom Bluetooth Device" rev 2.00/7.48 addr 3
uvideo0 at uhub2 port 6 configuration 1 interface 0 "Chicony Electronics Co., Ltd. Integrated Camera" rev 2.00/8.54 addr 4
video0 at uvideo0
uhub3 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
root on sd0a (e8c623509d1f3143.a) swap on sd0b dump on sd0b

Re: High system load with gkrellm on amd64

Stuart Henderson
On 2012/07/17 10:23, Laurence Tratt wrote:

> On two of my amd64 machines (a desktop and a laptop), I have been plagued,
> for several months, by seemingly random high system loads (100%). Sometimes
> after a few minutes, sometimes after a couple of days, of use, my CPU usage
> would effectively hit 100%, with top showing 2 CPU cores hitting 50% each on
> "system".
>
> For some time, I could not work out any way of solving this, short of
> rebooting the machine in question. It now seems that the culprit is gkrellm:
> killing it reduces the "system" load back down to 0-3%.
>
> Has anyone else seen this problem with gkrellm? I'm at a bit of a loss to
> explain why it happens so randomly. It might, perhaps, be related to the
> rthreads transition, but it's hard to be sure. The laptop dmesg is attached
> at the end. If anyone wants further details, I'm happy to provide them.

It's worth attaching ktrace to the process when it's having this problem.

Ideally, start gkrellm with LD_BIND_NOW defined to reduce some of the spam, e.g.:

LD_BIND_NOW= gkrellm

Then when it's looping, do:

ktrace -d -i -p $(pgrep gkrellm); sleep 1; ktrace -C
kdump > somefile.txt

And have a look and see what it's doing.

If this doesn't give anything interesting then probably break out GDB
and take a look there instead, but ktrace is likely to get something
useful more easily.

Re: High system load with gkrellm on amd64

f.holop
In reply to this post by Laurence Tratt
hmm, on Tue, Jul 17, 2012 at 10:23:36AM +0100, Laurence Tratt said that
> For some time, I could not work out any way of solving this, short of
> rebooting the machine in question. It now seems that the culprit is gkrellm:
> killing it reduces the "system" load back down to 0-3%.

i had something similar, and top(1) with threads clearly showed
that it was a gkrellm thread hogging the cpu.

in my experience this happened when the gtk and co. libraries
were updated while it was running.  killing and restarting
with all the new libs made it disappear

-f
--
new restaurant on the moon.  great food, no atmosphere.

Re: High system load with gkrellm on amd64

Laurence Tratt
In reply to this post by Stuart Henderson
On Tue, Jul 17, 2012 at 11:42:05AM +0100, Stuart Henderson wrote:

>> On two of my amd64 machines (a desktop and a laptop), I have been plagued,
>> for several months, by seemingly random high system loads (100%).
>> Sometimes after a few minutes, sometimes after a couple of days, of use,
>> my CPU usage would effectively hit 100%, with top showing 2 CPU cores
>> hitting 50% each on "system".
>>
>> For some time, I could not work out any way of solving this, short of
>> rebooting the machine in question. It now seems that the culprit is
>> gkrellm: killing it reduces the "system" load back down to 0-3%.
> It's worth attaching ktrace to the process when it's having this problem.

Thanks to Stuart's guidelines, I was eventually able to catch a trace on a
bang up-to-date machine. When gkrellm goes nuts, the ktrace shows endless
repetitions of the following:

 28660 gkrellm  CALL  sched_yield()
 28660 gkrellm  RET   sched_yield 0
 28660 gkrellm  CALL  sched_yield()
 28660 gkrellm  RET   sched_yield 0

A 1 second ktrace of gkrellm generates close to 3 million lines of output,
nearly all of it the above sequence. If anyone wants the full output, please
let me know.
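
For anyone who wants to check a dump like this without reading millions of
lines, here is a rough sketch of a per-syscall tally. It assumes the kdump
text layout shown above (PID, command name, record type, then the call), and
count_syscalls is just a hypothetical helper name, not part of ktrace or
kdump:

```shell
#!/bin/sh
# Tally system calls found in kdump(1) text output read from stdin.
# Assumption: the record type (CALL, RET, ...) is the third
# whitespace-separated field, as in the excerpt above; a command
# name containing spaces would break this.
# count_syscalls is a hypothetical helper name.
count_syscalls() {
    awk '$3 == "CALL" {
             name = $4
             sub(/\(.*/, "", name)   # strip the argument list
             count[name]++
         }
         END { for (n in count) print count[n], n }' | sort -rn
}

# Example use on the file written by "kdump > somefile.txt":
#   count_syscalls < somefile.txt | head
```

If sched_yield dominates the top of that list by a wide margin, that agrees
with the looping seen in the raw trace.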

Why does it do this? That I don't know!


Laurie