pseudo-crash on OpenBSD 4.5

bill234
Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
close. It has happened 3 times now, usually after running flawlessly for
2-3 weeks.

Fully up to date with 4.5-stable, running GENERIC.MP on a Dell PowerEdge
R300 quad-core server with 4 GB of RAM (dmesg below). It's used as a
firewall/NAT/VPN gateway and as an email server.

When the problem occurs, all services on the server stop responding
(POP, IMAP, SMTP, etc.).

The odd thing is that it does respond to ping, and the server still routes
traffic correctly, and the VPN is up.

The server console shows nothing out of the ordinary (white on black text
login prompt, no X11), but the console is frozen - doesn't respond to
keyboard.

Since it doesn't actually panic, I can't run the usual debug tools.

My only choice is to reboot.

This is my only quad-core server with 4 GB of RAM - I'm wondering if it's
related to GENERIC.MP or to the amount of RAM.

(I have many other OpenBSD 4.5 boxes, none have this issue, but they are
single-core and have less than 3 GB of RAM)

Any suggestions? Dell has a BIOS upgrade, I'll give that a try.




OpenBSD 4.5-stable (GENERIC.MP) #2: Wed Nov  4 21:53:18 EST 2009
    username@servername:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz ("GenuineIntel" 686-class) 2.51 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
real mem  = 3483598848 (3322MB)
avail mem = 3379777536 (3223MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 08/15/08, BIOS32 rev. 0 @ 0xfa520, SMBIOS rev. 2.5 @ 0xcfb9c000 (55 entries)
bios0: vendor Dell Inc. version "1.3.0" date 08/15/2008
bios0: Dell Inc. PowerEdge R300
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP APIC SPCR HPET MCFG WD__ SLIC ERST HEST BERT EINJ TCPA
acpi0: wakeup devices PCI0(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 333MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz ("GenuineIntel" 686-class) 2.50 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz ("GenuineIntel" 686-class) 2.50 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz ("GenuineIntel" 686-class) 2.50 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 4
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 5 (PEX4)
acpiprt2 at acpi0: bus 7 (PEX6)
acpiprt3 at acpi0: bus 1 (SBE4)
acpiprt4 at acpi0: bus 2 (SBE5)
acpiprt5 at acpi0: bus 10 (COMP)
acpicpu0 at acpi0: C3
acpicpu1 at acpi0: C3
acpicpu2 at acpi0: C3
acpicpu3 at acpi0: C3
bios0: ROM list: 0xc0000/0x9000 0xc9000/0x1000 0xca000/0x2000 0xcc000/0x5c00 0xec000/0x4000!
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep disabled by BIOS
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 5100 Host" rev 0x90
ppb0 at pci0 dev 2 function 0 "Intel 5100 PCIE" rev 0x90
pci1 at ppb0 bus 3
ppb1 at pci0 dev 3 function 0 "Intel 5100 PCIE" rev 0x90
pci2 at ppb1 bus 4
ppb2 at pci0 dev 4 function 0 "Intel 5100 PCIE" rev 0x90: apic 4 int 16 (irq 0)
pci3 at ppb2 bus 5
mpi0 at pci3 dev 0 function 0 "Symbios Logic SAS1068E" rev 0x08: apic 4 int 16 (irq 15)
scsibus0 at mpi0: 112 targets
sd0 at scsibus0 targ 0 lun 0: <Dell, VIRTUAL DISK, 1028> SCSI3 0/direct fixed
sd0: 476416MB, 512 bytes/sec, 975699968 sec total
ses0 at scsibus0 targ 8 lun 0: <DP, BACKPLANE, 1.05> SCSI3 13/enclosure services fixed
ppb3 at pci0 dev 5 function 0 "Intel 5100 PCIE" rev 0x90
pci4 at ppb3 bus 6
ppb4 at pci0 dev 6 function 0 "Intel 5100 PCIE" rev 0x90: apic 4 int 16 (irq 0)
pci5 at ppb4 bus 7
bge0 at pci5 dev 0 function 0 "Broadcom BCM5722" rev 0x00, BCM5755 C0 (0xa200): apic 4 int 16 (irq 15), address 00:10:18:49:cd:f7
brgphy0 at bge0 phy 1: BCM5722 10/100/1000baseT PHY, rev. 0
ppb5 at pci0 dev 7 function 0 "Intel 5100 PCIE" rev 0x90
pci6 at ppb5 bus 8
pchb1 at pci0 dev 16 function 0 "Intel 5100 FSB" rev 0x90
pchb2 at pci0 dev 16 function 1 "Intel 5100 FSB" rev 0x90
pchb3 at pci0 dev 16 function 2 "Intel 5100 FSB" rev 0x90
pchb4 at pci0 dev 17 function 0 "Intel 5100 Reserved" rev 0x90
pchb5 at pci0 dev 19 function 0 "Intel 5100 Reserved" rev 0x90
pchb6 at pci0 dev 21 function 0 "Intel 5100 DDR" rev 0x90
pchb7 at pci0 dev 22 function 0 "Intel 5100 DDR" rev 0x90
ppb6 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02
pci7 at ppb6 bus 9
ppb7 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02
pci8 at ppb7 bus 1
bge1 at pci8 dev 0 function 0 "Broadcom BCM5722" rev 0x00, BCM5755 C0 (0xa200): apic 4 int 16 (irq 15), address 00:24:e8:75:91:47
brgphy1 at bge1 phy 1: BCM5722 10/100/1000baseT PHY, rev. 0
ppb8 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02
pci9 at ppb8 bus 2
bge2 at pci9 dev 0 function 0 "Broadcom BCM5722" rev 0x00, BCM5755 C0 (0xa200): apic 4 int 17 (irq 14), address 00:24:e8:75:91:48
brgphy2 at bge2 phy 1: BCM5722 10/100/1000baseT PHY, rev. 0
uhci0 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 4 int 21 (irq 11)
uhci1 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 4 int 20 (irq 10)
uhci2 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 4 int 21 (irq 11)
ehci0 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x02: apic 4 int 21 (irq 11)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb9 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
pci10 at ppb9 bus 10
vga1 at pci10 dev 7 function 0 "ATI ES1000" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
radeondrm0 at vga1: apic 4 int 19 (irq 6)
drm0 at radeondrm0
ichpcib0 at pci0 dev 31 function 0 "Intel 82801IR LPC" rev 0x02: PM disabled
pciide0 at pci0 dev 31 function 2 "Intel 82801I SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using apic 4 int 23 (irq 6) for native-PCI interrupt
pciide1 at pci0 dev 31 function 5 "Intel 82801I SATA" rev 0x02: DMA, channel 0 wired to native-PCI, channel 1 wired to native-PCI
pciide1: using apic 4 int 22 (irq 5) for native-PCI interrupt
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
mtrr: Pentium Pro MTRR support
uhub4 at uhub0 port 5 "Cypress Semiconductor USB2 Hub" rev 2.00/90.15 addr 2
uhidev0 at uhub2 port 2 configuration 1 interface 0 "Tangtop USBPS2" rev 1.10/0.01 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 modifier keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub2 port 2 configuration 1 interface 1 "Tangtop USBPS2" rev 1.10/0.01 addr 2
uhidev1: iclass 3/1, 3 report ids
ums0 at uhidev1 reportid 1: 5 buttons, Z dir
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=1, output=0, feature=0
softraid0 at root
root on sd0a swap on sd0b dump on sd0b

Re: pseudo-crash on OpenBSD 4.5

STeve Andre'
Quoting [hidden email]:

> Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
> close. It has happened 3 times now, usually after running flawlessly for
> 2-3 weeks.
>
> Fully up to date with 4.5-stable, running GENERIC.MP on a Dell PowerEdge
> R300 quad-core server with 4 GB of RAM (dmesg below). It's used as a
> firewall/NAT/VPN gateway and as an email server.
>
> When the problem occurs, all services on the server stop responding
> (POP, IMAP, SMTP, etc.).
>
> The odd thing is that it does respond to ping, and the server still routes
> traffic correctly, and the VPN is up.
>
> The server console shows nothing out of the ordinary (white on black text
> login prompt, no X11), but the console is frozen - doesn't respond to
> keyboard.
>
> Since it doesn't actually panic, I can't run the usual debug tools.
>
> My only choice is to reboot.
>
> This is my only quad-core server with 4 GB of RAM - I'm wondering if it's
> related to GENERIC.MP or to the amount of RAM.
>
> (I have many other OpenBSD 4.5 boxes, none have this issue, but they are
> single-core and have less than 3 GB of RAM)
>
> Any suggestions? Dell has a BIOS upgrade, I'll give that a try.

Just to get two things out of the way: 1) try the single-processor (GENERIC)
kernel, and 2) drop down to 2G of memory.

A friend ran across some Dell or HP system that didn't like having 4G of
RAM, so it's an easy test to do.

BIOS upgrades are almost always a great thing to try.

--STeve Andre'

Re: pseudo-crash on OpenBSD 4.5

Nick Guenther
In reply to this post by bill234
On Fri, Dec 11, 2009 at 12:17 PM,  <[hidden email]> wrote:

> Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
> close. It has happened 3 times now, usually after running flawlessly for
> 2-3 weeks.
>
> Fully up to date with 4.5-stable, running GENERIC.MP on a Dell PowerEdge
> R300 quad-core server with 4 GB of RAM (dmesg below). It's used as a
> firewall/NAT/VPN gateway and as an email server.
>
> When the problem occurs, all services on the server stop responding
> (POP, IMAP, SMTP, etc.).
>
> The odd thing is that it does respond to ping, and the server still routes
> traffic correctly, and the VPN is up.
>
> The server console shows nothing out of the ordinary (white on black text
> login prompt, no X11), but the console is frozen - doesn't respond to
> keyboard.
>
> Since it doesn't actually panic, I can't run the usual debug tools.

I've definitely had this happen to me but never had conclusive proof
of the cause (because as you say, all you can do is reboot). I have
more information from a DD-WRT install in fact: the web UI would stop
responding and traffic would slow to a crawl but not stop; we were 90%
sure the problem was memory pressure. When you get it back up try
logging vmstat(8) every few minutes?

-Nick
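
For anyone wanting to try the vmstat(8) logging suggested above, here is a minimal sketch, not a tested OpenBSD recipe. It assumes a Python 3.7+ interpreter is available on the box and that plain `vmstat` with no flags gives enough detail. The names `snapshot` and `log_samples` and any log path are made up for illustration. Each sample is fsync'ed so the last entries should survive a hard reboot.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def snapshot(cmd=("vmstat",)):
    """Run cmd once and return its output under a UTC timestamp header."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=30)
        body = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of dying, so gaps in the log are explained.
        body = f"failed to run {' '.join(cmd)}: {exc}\n"
    return f"--- {stamp}\n{body}"


def log_samples(path, interval=300, count=None, cmd=("vmstat",)):
    """Append a snapshot to path every `interval` seconds.

    Runs forever when count is None, otherwise stops after `count` samples.
    """
    taken = 0
    while count is None or taken < count:
        with open(path, "a") as fh:
            fh.write(snapshot(cmd))
            fh.flush()
            os.fsync(fh.fileno())  # force to disk; the box may need a hard reboot
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling something like `log_samples("/var/log/vmstat-watch.log")` (a hypothetical path) from a background process would take one sample every five minutes. After the next hang and reboot, the tail of that file shows what memory and paging looked like just before services stopped responding.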