Constant Panics when writing to disk w/ 3ware

Harford, Colin
Any intensive disk operation is enough to crash the system.  Even running tar
or a cvs update will do it.


This is an i386 machine that has run OpenBSD since 3.0 without issue and with
only minor changes.  The change that caused the problems was the switch to a
3WARE 5800 IDE RAID controller.  This is an older RAID controller, but it is
listed in the hardware compatibility list.

When the crashes started, I first put them down to a defective RAID
controller and got a different one (same brand and model).  I got further
along with the replacement controller, but continued to get a large number of
crashes any time I did disk-intensive operations.

The system had 4 identical drives connected, configured as two RAID 1 arrays.
I have swapped disks and run every combination of the 4 as different sets.
At this point I just have two of the identical drives running as a single
RAID 1 set.

These drives were previously used under OpenBSD in this computer without
issue.  If I have the RAID card verify the array, it completes many hours
later without reporting any problem.

I found I was able to get a more stable system with the RAID controller
installed by disabling pcibios and alipm.  Before that, I would get panics
from alipm timeouts.


This is the current configuration and the crash information, logged via the
serial console.


OpenBSD/i386 BOOT 2.10
booting hd0a:/bsd: 4966344+867848 [52+255872+237161]=0x608d64
entry point at 0x100120

[ using 493460 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2006 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 3.9 (GENERIC) #617: Thu Mar  2 02:26:48 MST 2006
    [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: AMD Athlon(TM)Processor ("AuthenticAMD" 686-class, 256KB L2 cache) 1.40 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR
real mem  = 804806656 (785944K)
avail mem = 726978560 (709940K)
using 4278 buffers containing 40341504 bytes (39396K) of memory
User Kernel Config
UKC> disable alipm
164 alipm* disabled
UKC> disable pcibios
277 pcibios0 disabled
UKC> exit
Continuing...
mainbus0 (root)
bios0 at mainbus0: AT/286+(78) BIOS, date 04/30/03, BIOS32 rev. 0 @ 0xf0f70
apm0 at bios0: Power Management spec V1.2 (BIOS mgmt disabled)
apm0: APM power management enable: unrecognized device ID (9)
apm0: APM engage (device 1): power management disabled (1)
apm0: AC on, battery charge unknown
apm0: flags b0102 dobusy 0 doidle 1
pcibios at bios0 function 0x1a not configured
bios0: ROM list: 0xc0000/0xc800 0xd0000/0x1800 0xd4000/0x1000
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Acer Labs M1647 PCI" rev 0x04
ppb0 at pci0 dev 1 function 0 "Acer Labs M5247 AGP/PCI-PC" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "NVIDIA GeForce2 MX" rev 0xa1
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ohci0 at pci0 dev 2 function 0 "Acer Labs M5237 USB" rev 0x03: irq 9, version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Acer Labs OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pciide0 at pci0 dev 4 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc4: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <LITE-ON, LTR-16102B, OS08> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pciide0: channel 1 disabled (no drives)
ohci1 at pci0 dev 6 function 0 "Acer Labs M5237 USB" rev 0x03: irq 9, version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Acer Labs OHCI root hub, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pcib0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
em0 at pci0 dev 9 function 0 "Intel PRO/1000MT (82540EM)" rev 0x02: irq 5, address 00:07:e9:0a:27:3a
eap0 at pci0 dev 10 function 0 "Ensoniq CT5880" rev 0x02: irq 3
eap0: eap1371_read_codec timeout 2
ac97: codec id 0x83847609 (SigmaTel STAC9721/23)
ac97: codec features 18 bit DAC, 18 bit ADC, SigmaTel 3D
audio0 at eap0
midi0 at eap0: <AudioPCI MIDI UART>
fxp0 at pci0 dev 11 function 0 "Intel 8255x" rev 0x0c, i82550: irq 10, address 00:02:b3:92:4a:1d
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
twe0 at pci0 dev 12 function 0 "3ware Escalade IDE RAID" rev 0x12: irq 11
twe0: Escalade V8.2
scsibus1 at twe0: 16 targets
sd0 at scsibus1 targ 1 lun 0: <3WARE, Host drive #01, > SCSI2 0/direct fixed
sd0: 114472MB, 114472 cyl, 64 head, 32 sec, 512 bytes/sec, 234439600 sec total
"Texas Instruments TSB12LV23 FireWire" rev 0x00 at pci0 dev 13 function 0 not configured
"Acer Labs M7101 Power" rev 0x00 at pci0 dev 17 function 0 not configured
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi1 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom0: console
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask eb45 netmask ef65 ttymask ffe7
pctr: user-level cycle counter enabled
mtrr: Pentium Pro MTRR support
apm0: APM set power state: power management disabled (1)
dkcsum: sd0 matches BIOS drive 0x80
root on sd0a
rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
Automatic boot in progress: starting file system checks.
/dev/rsd0a: file system is clean; not checking
/dev/rsd0f: file system is clean; not checking
/dev/rsd0h: file system is clean; not checking
/dev/rsd0g: file system is clean; not checking
/dev/rsd0d: file system is clean; not checking
/dev/rsd0e: file system is clean; not checking
setting tty flags
pf enabled
net.inet.ip.forwarding: 0 -> 1
net.inet.ip.mforwarding: 0 -> 1
net.inet6.ip6.forwarding: 0 -> 1
net.inet6.ip6.accept_rtadv: 0 -> 1
machdep.allowaperture: 0 -> 2
starting network
DHCPREQUEST on fxp0 to 255.255.255.255 port 67
DHCPACK from xx.xx.xx.xx
bound to xx.xx.xx.yy -- renewal in 61962 seconds.
starting system logger
starting initial daemons: ntpd.
savecore: no core dump
checking quotas: done.
building ps databases: kvm dev.
clearing /tmp
starting pre-securelevel daemons:.
setting kernel security level: kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files
starting network daemons: sendmail ftp-proxy inetd sshd.
starting local daemons:.
standard daemons: cron.
Fri May 26 22:55:28 MDT 2006

OpenBSD/i386 (inset-random-name) (tty00)

login:

Login:


I log in and execute the following:
cd /usr/src2
tar xvfz src.tar.gz

It gets part way through and I get the following:


sd0(twe0:1:0): User command with no ioctl


That console becomes locked and unresponsive...

If I log in via ssh at this point and run top, it shows something along
these lines:

load averages:  0.17,  0.19,  0.08                                    23:00:03
25 processes:  24 idle, 1 on processor
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.2% interrupt, 99.8% idle
Memory: Real: 8396K/77M act/tot  Free: 674M  Swap: 0K/1536M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE    WAIT     TIME    CPU COMMAND
25448 root      -5    0  556K  464K idle     biowai   0:00  0.00% tar
10858 root      -5    0  416K  620K idle     pipewr   0:00  0.00% gzip
25660 root       2    0  688K 1176K idle     select   0:00  0.00% sshd
13885 root       2    0 3344K 2080K sleep    select   0:00  0.00% sshd
 3926 root      18    0  468K  496K idle     pause    0:00  0.00% ksh
 8406 root       2    0  960K 1116K sleep    select   0:00  0.00% sendmail
 1102 _syslogd   2    0  416K  548K idle     poll     0:00  0.00% syslogd
18491 _ntp       2    0  284K  588K sleep    poll     0:00  0.00% ntpd
 5858 root      18    0  536K  468K sleep    pause    0:00  0.00% ksh
17598 root      28    0  468K 1008K onproc   -        0:00  0.00% top
    1 root      10    0  320K  324K idle     wait     0:00  0.00% init
 9338 _pflogd    4    0  624K  232K sleep    bpf      0:00  0.00% pflogd
17823 root       3    0  288K  524K idle     ttyin    0:00  0.00% getty
25317 root       2    0  488K  664K idle     select   0:00  0.00% cron
 8178 root       3    0  324K  528K idle     ttyin    0:00  0.00% getty
18862 root       2    0  312K  516K idle     select   0:00  0.00% inetd
15124 proxy      2    0  252K  580K sleep    kqread   0:00  0.00% ftp-proxy
17259 root       3    0  220K  536K idle     ttyin    0:00  0.00% getty
 8903 root       2    0  388K  492K idle     netio    0:00  0.00% syslogd
21139 root       2    0  560K  372K idle     netio    0:00  0.00% pflogd
23996 root       3    0  280K  528K idle     ttyin    0:00  0.00% getty
 6598 root       3    0  328K  524K idle     ttyin    0:00  0.00% getty
 8611 root       2    0  324K  612K idle     poll     0:00  0.00% ntpd
 5424 _dhcp      2    0  380K  204K idle     poll     0:00  0.00% dhclient
11796 root       2    0  324K  260K idle     poll     0:00  0.00% dhclient


It stays like this until I get:

login: panic: pool_get(scxspl): free list modified: magic=0; page 0xd743b000; item addr 0xd743b280
Stopped at      Debugger+0x4:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
Debugger(0,0,0,d743b280,d0600fa0) at Debugger+0x4
panic(d051a9c0,d053ed01,0,d743b000,d743b280) at panic+0x63
pool_get(d0600fa0,0,0,0) at pool_get+0x315
scsi_get_xs(d13a5980,1001,a,d13a5980,0) at scsi_get_xs+0x6a
scsi_scsi_cmd(d13a5980,e8cace14,a,e1ea5000,800,4,ea60,d7498078,1001,0,0,0) at scsi_scsi_cmd+0x27
sdstart(d13bdc00,d7498078,d141f000,0,0) at sdstart+0x18f
sdstrategy(d7498078,0,0,0,d7498078) at sdstrategy+0xbd
spec_strategy(e8cacef4,0,0,0,0) at spec_strategy+0x33
spec_vnoperate(e8cacef4,2,e8cab000,d7433e10,e8cacf1c) at spec_vnoperate+0x16
ufs_strategy(e8cacef4,0,0,80,d05a7720) at ufs_strategy+0x52
VOP_STRATEGY(d7498078,cd9,e8cacf3c,d021f3f1,0) at VOP_STRATEGY+0x25
bwrite(d7498078,e8cacf74,e8cacf8c,d023e6db,d05a7760) at bwrite+0xac
VOP_BWRITE(d7498078,0,0,0,0) at VOP_BWRITE+0x25
buf_daemon(d7433e10) at buf_daemon+0xb9
Bad frame pointer: 0xd070bed8
ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
  8057  13765  13765      0  3      0x4086  pipewr     gzip
 13765   5858  13765      0  2      0x4006             tar
  5858  13885   5858      0  3      0x4086  pause      ksh
 13885  25660  13885      0  3      0x4084  select     sshd
 25448   3926  25448      0  3      0x4006  biowait    tar
 17823      1  17823      0  3      0x4086  ttyin      getty
 17259      1  17259      0  3      0x4086  ttyin      getty
  8178      1   8178      0  3      0x4086  ttyin      getty
  6598      1   6598      0  3      0x4086  ttyin      getty
 23996      1  23996      0  3      0x4086  ttyin      getty
  3926      1   3926      0  3      0x4086  pause      ksh
 25317      1  25317      0  3        0x84  select     cron
  8406      1   8406      0  3     0x40184  select     sendmail
 25660      1  25660      0  3        0x84  select     sshd
 18862      1  18862      0  3       0x184  select     inetd
 15124      1  15124     71  3       0x184  kqread     ftp-proxy
 18491   8611   8611     83  3       0x184  poll       ntpd
  8611      1   8611      0  3        0x84  poll       ntpd
  9338  21139  21139     74  3       0x184  bpf        pflogd
 21139      1  21139      0  3        0x84  netio      pflogd
  1102   8903   8903     73  3       0x184  poll       syslogd
  8903      1   8903      0  3        0x84  netio      syslogd
  5424      1   5424     77  3       0x184  poll       dhclient
 11796      1  19672      0  3        0x86  poll       dhclient
    14      0      0      0  3    0x100204  crypto_wa  crypto
    13      0      0      0  3    0x100204  aiodoned   aiodoned
    12      0      0      0  3    0x100204  syncer     update
*   11      0      0      0  7    0x100204             cleaner
    10      0      0      0  3    0x100204  reaper     reaper
     9      0      0      0  3    0x100204  pgdaemon   pagedaemon
     8      0      0      0  3    0x100204  pftm       pfpurge
     7      0      0      0  2    0x100204             twe0
     6      0      0      0  3    0x100204  usbevt     usb1
     5      0      0      0  3    0x100204  usbtsk     usbtask
     4      0      0      0  3    0x100204  usbevt     usb0
     3      0      0      0  3    0x100204  apmev      apm0
     2      0      0      0  3    0x100204  kmalloc    kmthread
     1      0      1      0  3      0x4084  wait       init
     0     -1      0      0  3     0x80204  scheduler  swapper
 10858  25448  25448      0  5      0x6002             gzip
ddb>


ddb> show registers
ds                  0x10
es                  0x10
fs                  0x58
gs                  0x10
edi           0xd051a9c0        addrmask+0x21a0
esi           0xe8cacd28
ebp           0xe8caccfc
ebx                    0
edx                  0x4
ecx           0xd05a6644        kprintf_mutex
eax                  0x1
eip           0xd0343a48        Debugger+0x4
cs                   0x8
eflags             0x202
esp           0xe8caccfc
ss            0xe8ca0010
Debugger+0x4:   leave
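

As a reading aid for the panic line above: the message appears to come from
the kernel pool allocator's consistency check, which found an unexpected
value (magic=0) in an item on the free list of the scxspl pool.  A small
sketch, using only the two addresses printed in the panic, works out where in
the page that item sits.  It is plain arithmetic on the reported values and
does not identify what overwrote the item.

```shell
#!/bin/sh
# Addresses copied from the panic line.
page=0xd743b000
item=0xd743b280

# Byte offset of the damaged free-list item from the start of its page.
item_offset=$(($item - $page))

printf 'item offset in page: %d bytes (0x%x)\n' "$item_offset" "$item_offset"
# prints: item offset in page: 640 bytes (0x280)
```

The offset is well inside the page reported in the panic, so the item address
and the page address are at least consistent with each other.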