'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
I have a Sun Blade 1000 which I use (infrequently) for testing big-endian
stuff. Recently it has started locking up with the message:

schizo0: safari error

Sometimes that is all that is printed. Sometimes it prints more stuff
(end of report). In the former case, you don't get ddb, in the latter
you do. Either way, I can't determine what event triggers the crash.
Sometimes it happens during boot, other times, after several minutes of
uptime.

I think the hardware is OK. 'test all' in the eeprom doesn't suggest anything
is wrong.

I've tried backing out:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/schizo.c.diff?r1=1.63&r2=1.64

Sadly, this made no difference.

Here's a more verbose crash (converted from a picture of my screen to text
using OCR, so beware mistakes):

# reboot
schizo0: safari error
ERRLOG=10
UE_AFSR=800010050
UE_AFAR=400
CE_AFSR=4000000008000067
CE_AFAR=319d8780
panic: schizo0: fatal
Stopped at        Debugger+0x8:     nop
TID     PID     UID       PRFLAGS      PFLAGS CPU COMMAND
*13422 13422         0          0x3            0    0 reboot
   1       1       0          0x2            0    1 init
schizo_safari_error(4000119a280, 4000a276260, 4000a150470, 0, 0, 1) at schizo_safari_error+0x134
intr_list_handler(40001177500, 4000a276260, 4000a150470, 1, 14240, 1) at intr_list_handler+0x3c
intr_handler(e0017ec8, 400014da900, baa2, ffffffffffffe000, 0, 6) at intr_handler+0x50
sparc_intr_retry(4000a25a480, ffffffffff7a6000, 33442000, 7fffffff, 8c, 0) at sparc_intr_retry+0x5c
pmap_remove(ffffffffff7a6000, fffffffffdfe4000, ffffffffff7e4000, 7fffffffe000, 17b1760, 18d3000) at pmap_remove+0x98
uvm_mapent_forkcopy(4000a150470, 4000a25d200, 4000a25d500, 4000a150520, 1, 40009fe6304) at uvm_mapent_forkcopy+0xfc
uvmspace_fork(4000a25d200, 11929e0, 40009ea02f0, 0, 40009ea0370, 1832dc0) at uvmspace_fork+0x188
process_new(40009ea8010, 40009ea0850, 1, 0, 40009ea8078, 0) at process_new+0x1a0
fork1(40009ea8910, 1, 0, 0, 11797e0, 0) at fork1+0x8ec
sys_fork(40009ea8910, 40018f13db0, 40018f13df0, 40009ea8910, 40018113b18, 6) at sys_fork+0x40
syscall(40018f13ed0, 402, 746890e688, 746890e68c, 0, 0) at syscall+0x27c
syscall_setup(40018f13ed0, 421, 746890a5c8, 746890a5cc, 0, 31e2a700) at syscall_setup+0x134


Here's a dmesg from a DEBUG kernel:

console is keyboard/display
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 6.0-beta (GENERIC.MP) #0: Sat May 28 19:43:36 BST 2016
    [hidden email]:/usr/src/sys/arch/sparc64/compile/GENERIC.MP
real mem = 1073741824 (1024MB)
avail mem = 1038516224 (990MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: SUNW,Sun-Blade-1000 (2 X UltraSPARC-III)
cpu0 at mainbus0: SUNW,UltraSPARC-III (rev 5.14) @ 900 MHz
cpu0: physical 32K instruction (32 b/l), 64K data (32 b/l), 8192K external (512 b/l)
cpu1 at mainbus0: SUNW,UltraSPARC-III (rev 5.14) @ 750 MHz
cpu1: physical 32K instruction (32 b/l), 64K data (32 b/l), 8192K external (512 b/l)
"memory-controller" at mainbus0 not configured
"memory-controller" at mainbus0 not configured
schizo0 at mainbus0: "Schizo", version 4, ign 200, bus B 0 to 0
schizo0: schizo_iommu_init: getprop failed, using iobase=0xffffffff, tsbsize=7
dvma map c0000000-ffffffff
schizo_bus_map: type 0 off 0 sz 1000000 flags 0 cspace 0pci0 at schizo0
ebus0 at pci0 dev 5 function 0 "Sun RIO EBus" rev 0x01
"flashprom" at ebus0 addr 0-1fffff not configured
pcfiic0 at ebus0 addr 2e-2f, 2d-2d ivec 0x23schizo_bus_map: type 2 off 7e00002e sz 2 flags 0 cspace 2schizo_bus_map: type 2 off 7e00002d sz 1 flags 0 cspace 2
iic0 at pcfiic0
bbc0 at ebus0 addr 0-fffffschizo_bus_map: type 2 off 7e000000 sz 100000 flags 0 cspace 2: AID 0x00
ppm0 at ebus0 addr e-28, 728000-728003, 30002e-30002f, 300600-300607schizo_bus_map: type 2 off 7e30002e sz 2 flags 0 cspace 2schizo_bus_map: type 2 off 7e300600 sz 8 flags 0 cspace 2
pcfiic1 at ebus0 addr 30-31 ivec 0x23schizo_bus_map: type 2 off 7e000030 sz 2 flags 0 cspace 2
iic1 at pcfiic1
admtemp0 at iic1 addr 0x18: max1617
admtemp1 at iic1 addr 0x4c: max1617
tda0 at iic1 addr 0x24
"scm001" at iic1 addr 0x20 not configured
"firei" at iic1 addr 0x30 not configured
beep0 at ebus0 addr 32-37schizo_bus_map: type 2 off 7e000032 sz 6 flags 0 cspace 2: clock 75MHz
audioce0 at ebus0 addr 200000-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ivec 0x20 ivec 0x21schizo_bus_map: type 2 off 7e200000 sz 100 flags 2 cspace 2schizo_bus_map: type 2 off 7e702000 sz 10 flags 2 cspace 2schizo_bus_map: type 2 off 7e704000 sz 10 flags 2 cspace 2schizo_bus_map: type 2 off 7e722000 sz 4 flags 2 cspace 2: nvaddrs 0
audio0 at audioce0
rtc0 at ebus0 addr 300070-300071 ivec 0x24schizo_bus_map: type 2 off 7e300070 sz 2 flags 0 cspace 2: ds1287
"gpio" at ebus0 addr 300600-300607 not configured
pmc0 at ebus0 addr 300700-300701schizo_bus_map: type 2 off fff38700 sz 0 flags 16 cspace 2
lpt0 at ebus0 addr 300278-300287, 30002e-30002f, 700000-70000f ivec 0x1cschizo_bus_map: type 2 off 7e300278 sz 10 flags 0 cspace 2schizo_bus_map: type 2 off 7e30002e sz 2 flags 0 cspace 2: polled
sab0 at ebus0 addr 400000-40007f ivec 0x22schizo_bus_map: type 2 off 7e400000 sz 80 flags 0 cspace 2: rev 3.2
sabtty0 at sab0 port 0
sabtty1 at sab0 port 1
gem0 at pci0 dev 5 function 1 "Sun ERI Ether" rev 0x01schizo_bus_map: type 2 off 100000 sz 20000 flags 0 cspace 2schizo_bus_map: type 2 off 400000 sz 400000 flags 0 cspace 2: ivec 0x21d, address 00:03:ba:10:0e:6a
luphy0 at gem0 phy 1: LU6612 10/100 PHY, rev. 1
"Sun FireWire" rev 0x01 at pci0 dev 5 function 2 not configured
ohci0 at pci0 dev 5 function 3 "Sun USB" rev 0x01schizo_bus_map: type 2 off 1000000 sz 8000 flags 0 cspace 2: ivec 0x21f, version 1.0, legacy support
siop0 at pci0 dev 6 function 0 "Symbios Logic 53c875" rev 0x37schizo_bus_map: type 2 off 124000 sz 100 flags 0 cspace 2schizo_bus_map: type 1 off 300 sz 100 flags 0 cspace 1: ivec 0x218schizo_bus_map: type 2 off 126000 sz 1000 flags 0 cspace 2, using 4K of on-board RAM
scsibus1 at siop0: 16 targets, initiator 7
cd0 at scsibus1 targ 0 lun 0: <NEC, CD-ROM DRIVE:465, 1.25> SCSI2 5/cdrom removable
siop1 at pci0 dev 6 function 1 "Symbios Logic 53c875" rev 0x37schizo_bus_map: type 2 off 128000 sz 100 flags 0 cspace 2schizo_bus_map: type 1 off 400 sz 100 flags 0 cspace 1: ivec 0x219schizo_bus_map: type 2 off 12a000 sz 1000 flags 0 cspace 2, using 4K of on-board RAM
scsibus2 at siop1: 16 targets, initiator 7
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
schizo1 at mainbus0: "Schizo", version 4, ign 200, bus A 0 to 0
schizo1: schizo_iommu_init: getprop failed, using iobase=0xffffffff, tsbsize=7
dvma map c0000000-ffffffff
schizo_bus_map: type 0 off 0 sz 1000000 flags 0 cspace 0pci1 at schizo1
qla0 at pci1 dev 4 function 0 "QLogic ISP2200" rev 0x05schizo_bus_map: type 2 off 100000 sz 1000 flags 0 cspace 2: ivec 0x204
qla0: firmware rev 2.2.6, attrs 0x7
scsibus3 at qla0: 256 targets, WWPN 21000003ba100e6a, WWNN 20000003ba100e6a
sym0 at scsibus3 targ 1 lun 0: <SEAGATE, ST3146707FC, 0003> SCSI3 0/direct fixed naa.20000014c303de67
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST3146707FC, 0003> SCSI3 0/direct fixed naa.20000014c303de67
sd0: 140014MB, 512 bytes/sector, 286749488 sectors
sym1 at scsibus3 targ 2 lun 0: <SEAGATE, ST3146807FC, 0006> SCSI3 0/direct fixed naa.20000018623c7bd4
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST3146807FC, 0006> SCSI3 0/direct fixed naa.20000018623c7bd4
sd1: 140014MB, 512 bytes/sector, 286749488 sectors
upa0 at mainbus0
creator0 at upa0: Elite3D, model SUNW,540-3623, dac 0, 1280x1024
wsdisplay0 at creator0 mux 1: console (std, sun emulation)
"ppm" at mainbus0 not configured
uhidev0 at uhub0 port 4 configuration 1 interface 0 "Fujitsu Component Type 6 Keyboard" rev 1.00/1.01 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes, country code 32
wskbd0 at ukbd0: console keyboard, using wsdisplay0
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
bootpath: /pci@8,600000/SUNW,qlc@4,0/fp@0,0/disk@21000014c303de67,0
root on sd0a (8d7ec6f394d57510.a) swap on sd0b dump on sd0b
creator0: firmware rev 1.3.11


--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: 'schizo0: safari error' on Sun blade 1000

Mark Kettenis
> Date: Tue, 31 May 2016 09:18:13 +0100
> From: Edd Barrett <[hidden email]>
>
> I have a sun blade 1000 which I use (infrequently) for testing stuff big
> endian. Recently it has started locking up with the message:
>
> schizo0: safari error
>
> Sometimes that is all that is printed. Sometimes it prints more stuff
> (end of report). In the former case, you don't get ddb, in the latter
> you do. Either way, I can't determine what event triggers the crash.
> Sometimes it happens during boot, other times, after several minutes of
> uptime.
>
> I think the hardware is OK. 'test all' in the eeprom doesn't suggest anything
> is wrong.
>
> I've tried backing out:
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/schizo.c.diff?r1=1.63&r2=1.64
>
> Sadly, this made no difference.
>
> Here's a more verbose crash (converted from a picture of my screen to text
> using OCR, so beware mistakes):
>
> # reboot
> schizo0: safari error
> ERRLOG=10
> UE_AFSR=800010050
> UE_AFAR=400
> CE_AFSR=4000000008000067
> CE_AFAR=319d8780

Can you please double-check the numbers here?  Any mistakes made by
the OCR software here could send me in the wrong direction.
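
One way to sanity-check OCR'd register values before trusting them is to flag characters that OCR commonly substitutes for hex digits. The following is only a minimal sketch, not something from the thread: the function name `check_register_line` and the lookalike table are assumptions chosen for illustration, and the table is not exhaustive. It says nothing about what the register bits mean.

```python
import re

# Characters that OCR commonly confuses with hex digits.
# This mapping is an assumption for illustration and is not exhaustive.
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"}

HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def check_register_line(line):
    """Check one 'NAME=hexvalue' line from a panic message.

    Returns (name, value, suspects), where value is the parsed integer
    (or None if the text is not valid hex) and suspects is a list of
    (index, character) pairs for characters found in LOOKALIKES.
    """
    name, _, raw = line.strip().partition("=")
    suspects = [(i, c) for i, c in enumerate(raw) if c in LOOKALIKES]
    value = int(raw, 16) if HEX_RE.match(raw) else None
    return name, value, suspects


if __name__ == "__main__":
    for line in ["ERRLOG=10", "UE_AFSR=800010050", "UE_AFAR=400"]:
        print(check_register_line(line))
```

A line that parses cleanly can still be wrong (OCR may swap one valid digit for another, such as 8 and 3), so this only catches the obvious substitutions; comparing against the original photo remains the real check.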

Re: 'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
On Tue, May 31, 2016 at 01:35:22PM +0200, Mark Kettenis wrote:

> > # reboot
> > schizo0: safari error
> > ERRLOG=10
> > UE_AFSR=800010050
> > UE_AFAR=400
> > CE_AFSR=4000000008000067
> > CE_AFAR=319d8780
>
> Can you please double-check the numbers here?  Any mistakes made by
> the OCR software here could send me in the wrong direction.

I confirm that these numbers are correct.

And I have uploaded the pics here:
http://theunixzoo.co.uk/random/crash1.jpg
http://theunixzoo.co.uk/random/crash2.jpg

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: 'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
In reply to this post by Mark Kettenis
On Tue, May 31, 2016 at 01:35:22PM +0200, Mark Kettenis wrote:

> > Date: Tue, 31 May 2016 09:18:13 +0100
> > From: Edd Barrett <[hidden email]>
> >
> > I have a sun blade 1000 which I use (infrequently) for testing stuff big
> > endian. Recently it has started locking up with the message:
> >
> > schizo0: safari error
> >
> > Sometimes that is all that is printed. Sometimes it prints more stuff
> > (end of report). In the former case, you don't get ddb, in the latter
> > you do. Either way, I can't determine what event triggers the crash.
> > Sometimes it happens during boot, other times, after several minutes of
> > uptime.
> >
> > I think the hardware is OK. 'test all' in the eeprom doesn't suggest anything
> > is wrong.

Actually, I am no longer sure of this.

Since I was last playing, a CPU died (the system would not even power on
with this CPU plugged in). With the busted CPU removed, the system boots
but the safari error remains.

I then installed Solaris 10. And got this (ocr again):

---8<---
TIME EVENT-ID MSG-ID SEVERITY
Jun 09 00:07:22 c639728d-944d-e5d4-8431-a1a3e5f1b179 PCIEX-8000-5Y Major
Host : blade
Platform : SUNW,Sun-Blade-1000 Chassis_id
Product_sn
Fault class : fault.io.pci.device-invreq
Affects : dev:////pci@8,600000/SUNW,qlc@4
faulted but still in service
FRU : "MB" (hc://:product-id=SUNW,Sun-Blade-1000:server-id=blade/motherboard=0)
Faulty
Description : The transmitting device sent an invalid request.
Response : One or more device instances may be disabled
Impact : Loss of services provided by the device instances associated with
this fault
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://sun.com/msg/PCIEX-8000-5Y for the latest service
procedures and policies regarding this diagnosis.
--->8---

This may well be Solaris's version of "safari error".

Sad times for my blade. Either my disks (I've been using a couple of
different ones) or the PCI bus seems broken.

I may try sourcing another disk, but failing that, I think the system
might be due for the sun graveyard :(

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: 'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
On Thu, Jun 09, 2016 at 12:39:47AM +0100, Edd Barrett wrote:
> Sad times for my blade. Either my disks (i've been using a couple of
> different ones) or the PCI bus seems broken.

I've sourced a new motherboard for this system. I will let you know how I
get on.

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: 'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
On Sat, Jun 11, 2016 at 03:10:15PM +0100, Edd Barrett wrote:
> On Thu, Jun 09, 2016 at 12:39:47AM +0100, Edd Barrett wrote:
> > Sad times for my blade. Either my disks (i've been using a couple of
> > different ones) or the PCI bus seems broken.
>
> I've sourced a new motherboard for this system. I will let you know how I
> get on.

With the new motherboard in, and running the installer over serial line:

---8<---
...
Get/Verify xserv60.tgz  100% |**************************| 17291 KB    00:13    
Installing bsd          100% |**************************|  9270 KB    00:02    
Installing bsd.rd       100% |**************************|  2653 KB    00:00    
Installing bsd.mp       100% |**************************|  9303 KB    00:02    
Installing base60.tgz    16% |*schizo0: safari error    |  5760 KB    00:17 ETA
ERRLOG=3010
UE_AFSR=1c2410ac13b
UE_AFAR=3895c049aa0
CE_AFSR=400000004800010f
CE_AFAR=31fda600
panic: schizo0: fatal
syncing disks... ***53 42                       |  8960 KB    00:14 15 ETA9 done
Frame pointer is at 0xe00170c9
Call traceback:
12a91f8(1000, 5, 1, 0, 0, 0, e0017189) fp = e0017189
10fb59c(100, 5, 0, 0, e0017b90, 0, e0017249) fp = e0017249
1105dcc(100, e0017c60, 1b08000, e0017c60, e0017c68, ffffffffffffffff, e0017309) fp = e0017309
1289774(13bbf18, 400011b21a4, 1b74000, 4000a27c800, 100, 3b9ac800, e00173d9) fp = e00173d9
12a75b8(400011b2180, 4000a26bb10, fc00, 22a560, 0, 0, e0017499) fp = e0017499
12a756c(40001192700, 4, 4000a252a10, 40018f4fce0, 1, 4000, e0017559) fp = e0017559
1011fc8(e0017ec8, 40001193400, 5f5f9, 0, 0, 0, e0017619) fp = e0017619
10119b4(40018f4fed0, fffffffffffffffe, 22a520, 44820082, 0, 558, fffffffffffb7f01) fp = fffffffffffb7f01

dump to dev 5,1 not possible
rebooting
--->8---

Doh! Still not sure if this is a hardware fault or a kernel bug. FWIW I
tried installing on a couple of different disks, same crash. I also
tried pulling the graphics card, here's the crash from that attempt:

---8<---
Get/Verify xserv60.tgz  100% |**************************| 17291 KB    00:12    
Installing bsd          100% |**************************|  9270 KB    00:01    
Installing bsd.rd       100% |**************************|  2653 KB    00:00    
Installing bsd.mp       100% |**************************|  9303 KB    00:01    
Installing base60.tgz    18% |****                      | 10368 KB    00:13 ETAschizo0: safari error
ERRLOG=3010
UE_AFSR=10040028038
UE_AFAR=28940008280
CE_AFSR=400000000800002c
CE_AFAR=31812700
panic: schizo0: fatal
syncing disks... 62 51 18 14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 giving up
Frame pointer is at 0xe00170c9
Call traceback:
12a91f8(1000, 5, 1, 4000a298000, 40017b537f0, 40017b538f8, e0017189) fp = e0017189
10fb59c(100, 5, 0, 0, e0017b90, 0, e0017249) fp = e0017249
1105dcc(100, e0017c60, 1b08000, e0017c60, e0017c68, ffffffffffffffff, e0017309) fp = e0017309
1289774(13bbf18, 400011b21a4, 1b74000, ffffffffffffffff, 100, 830, e00173d9) fp = e00173d9
12a75b8(400011b2180, 31810000, 3, 0, 4000a16c900, 3c00, e0017499) fp = e0017499
12a756c(40001192700, 8000, 0, 4000f7d6000, 4, 1b76240, e0017559) fp = e0017559
1011fc8(e0017ec8, 40001193400, 5951e, 8000, 118d67c, 13, e0017619) fp = e0017619
118fca0(4000c209260, 0, 4d98, 0, 4000c206000, 40017b53810, 40017b52ec1) fp = 40017b52ec1

dump to dev 5,1 not possible
rebooting
--->8---
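
To compare the two register dumps above, a small script can parse the `NAME=hex` lines and list which bit positions are set in both crashes. This is only a rough sketch under stated assumptions: the helper names (`parse_dump`, `set_bits`, `common_bits`) are made up for illustration, and it does not attempt to interpret the bits, which would need the Schizo register documentation.

```python
import re

# Matches lines such as "UE_AFSR=1c2410ac13b" from the panic output.
REG_RE = re.compile(r"^([A-Z_]+)=([0-9a-fA-F]+)$")

# Register values copied from the two installer crashes above.
CRASH_A = """ERRLOG=3010
UE_AFSR=1c2410ac13b
UE_AFAR=3895c049aa0
CE_AFSR=400000004800010f
CE_AFAR=31fda600"""

CRASH_B = """ERRLOG=3010
UE_AFSR=10040028038
UE_AFAR=28940008280
CE_AFSR=400000000800002c
CE_AFAR=31812700"""


def parse_dump(text):
    """Return {register name: integer value} for each NAME=hex line."""
    regs = {}
    for line in text.splitlines():
        m = REG_RE.match(line.strip())
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs


def set_bits(value):
    """Return the positions (0 = least significant) of the bits set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def common_bits(a, b):
    """For registers present in both dumps, list the bits set in both."""
    return {name: set_bits(a[name] & b[name]) for name in a if name in b}


if __name__ == "__main__":
    shared = common_bits(parse_dump(CRASH_A), parse_dump(CRASH_B))
    for name, bits in shared.items():
        print(f"{name}: bits set in both crashes: {bits}")
```

The address registers (`UE_AFAR`, `CE_AFAR`) are included only for completeness; overlapping bits in an address are unlikely to mean much, whereas bits that recur in the status registers may be worth looking up.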

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: 'schizo0: safari error' on Sun blade 1000

Edd Barrett-3
Hi,

On Tue, Jun 21, 2016 at 01:04:24AM +0100, Edd Barrett wrote:

> With the new motherboard in, and running the installer over serial line:
>
> Get/Verify xserv60.tgz  100% |**************************| 17291 KB    00:13    
> Installing bsd          100% |**************************|  9270 KB    00:02    
> Installing bsd.rd       100% |**************************|  2653 KB    00:00    
> Installing bsd.mp       100% |**************************|  9303 KB    00:02    
> Installing base60.tgz    16% |*schizo0: safari error    |  5760 KB    00:17 ETA
> ERRLOG=3010
> UE_AFSR=1c2410ac13b
> UE_AFAR=3895c049aa0
> CE_AFSR=400000004800010f
> CE_AFAR=31fda600
> panic: schizo0: fatal

Sorry to dig up an old thread, but I thought I should mention that this
error is still present on the latest sparc64 snapshot.

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk