SCSI illegal commands and bus resets on Ultra 1E starting with 4.8

Kurt Mosiejczuk-4
Starting with 4.8, errors of the following type pop up periodically when
accessing the disk:

esp0: illegal command: 0x0 (state 2, phase 3, prevphase 3)
esp0: SCSI bus reset

esp0: illegal command: 0x0 (state 2, phase 7, prevphase 3)
esp0: SCSI bus reset
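
For readers who want to interpret the numbers in these messages, here is a
minimal sketch in Python that splits an error line into its fields and names
the phases.  It assumes the phase values encode the SCSI bus signals I/O
(0x01), C/D (0x02) and MSG (0x04), which is the usual convention for
ESP-family status registers; nothing in this thread confirms that, and the
names decode, PHASES and LINE_RE are made up for illustration.

```python
import re

# Assumption: "phase" is the low three bits of the controller status
# register, i.e. I/O = 0x01, C/D = 0x02, MSG = 0x04.
PHASES = {
    0: "DATA OUT",
    1: "DATA IN",
    2: "COMMAND",
    3: "STATUS",
    6: "MESSAGE OUT",
    7: "MESSAGE IN",
}

# Matches lines shaped like the ones reported above.
LINE_RE = re.compile(
    r"^(?P<dev>\w+): illegal command: (?P<cmd>0x[0-9a-fA-F]+) "
    r"\(state (?P<state>\d+), phase (?P<phase>\d+), "
    r"prevphase (?P<prev>\d+)\)$"
)


def decode(line):
    """Return the fields of an 'illegal command' line, or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "cmd": int(m.group("cmd"), 16),
        "state": int(m.group("state")),
        "phase": PHASES.get(int(m.group("phase")), "unknown"),
        "prevphase": PHASES.get(int(m.group("prev")), "unknown"),
    }
```

Under that assumption, both reported errors have a previous phase of STATUS,
and the current phase is either STATUS (3) or MESSAGE IN (7).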

Initially I thought my disks were flaky, but I found older releases
worked okay, and found the trouble started after the 4.7 release.  I get
the error regardless of whether I have a terminator hanging off the back
of the Ultra 1 or not (and even tried different types of terminators).

The 4.9 dmesg is almost identical to the 4.8 dmesg, differing only in
version number, CPU frequency, and the number of bytes of available
memory.  I can make that dmesg and a snapshot dmesg available if it helps.

I reproduced this on more than one Ultra 1E and I'm pretty sure it
happened with my E3K too (although it had... other issues).  I see that
there was one change to esp_sbus.c that looks like a likely culprit, but
since the commit message indicates it was part of a cleanup I wanted to
bring it here before trying to blindly revert that file.

Thanks,
   --Kurt

4.7 dmesg

Copyright (c) 1982, 1986, 1989, 1991, 1993
         The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2010 OpenBSD. All rights reserved.
http://www.OpenBSD.org

OpenBSD 4.7 (RAMDISK) #223: Thu Mar 18 00:20:18 MDT 2010
     [hidden email]:/usr/src/sys/arch/sparc64/compile/RAMDISK
real mem = 268435456 (256MB)
avail mem = 251215872 (239MB)
mainbus0 at root: Sun Ultra 1 UPA/SBus (UltraSPARC 167MHz)
cpu0 at mainbus0: SUNW,UltraSPARC (rev 4.0) @ 167.007 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
timer0 at mainbus0 addr 0xfffc7c00 ivec 0x7f0, 0x7f1
sbus0 at mainbus0 addr 0xfffcc000: clock = 25 MHz
sbus0: dvma map ff800000-ffffffff, STC0 enabled
"SUNW,CS4231" at sbus0 slot 13 offset 0xc000000 vector 24 ipl 8 not configured
auxio0 at sbus0 slot 15 offset 0x1900000
"flashprom" at sbus0 slot 15 offset 0x0 not configured
"SUNW,fdtwo" at sbus0 class block slot 15 offset 0x1400000 vector 29 ipl 11 not configured
clock1 at sbus0 slot 15 offset 0x1200000: mk48t59
zs0 at sbus0 slot 15 offset 0x1100000 vector 28 ipl 12 softpri 6
zstty0 at zs0 channel 0: console output
zstty1 at zs0 channel 1
zs1 at sbus0 slot 15 offset 0x1000000 vector 28 ipl 12 softpri 6
zskbd0 at zs1 channel 0: no keyboard
zstty2 at zs1 channel 1: mouse
"sc" at sbus0 slot 15 offset 0x1300000 not configured
"SUNW,pll" at sbus0 slot 15 offset 0x1304000 not configured
esp0 at sbus0 slot 14 offset 0x8800000 vector 20 ipl 3: dma rev fas
esp0: FAS366/HME, 40MHz
scsibus0 at esp0: 16 targets, initiator 7
sd0 at scsibus0 targ 0 lun 0: <FUJITSU, MAT3073NC, 0104> SCSI3 0/direct fixed
sd0: 70136MB, 512 bytes/sec, 143638992 sec total
sd1 at scsibus0 targ 1 lun 0: <IBM, DPSS-318350M, S93E> SCSI3 0/direct fixed
sd1: 17366MB, 512 bytes/sec, 35566478 sec total
cd0 at scsibus0 targ 6 lun 0: <TOSHIBA, XM-5401TASUN4XCD, 1036> SCSI2 5/cdrom removable
hme0 at sbus0 slot 14 offset 0x8c00000 vector 21 ipl 6, address 08:00:20:9d:cf:04
nsphy0 at hme0 phy 1: DP83840 10/100 PHY, rev. 0
"SUNW,bpp" at sbus0 slot 14 offset 0xc800000 vector 22 ipl 2 not configured
hme1 at sbus0 slot 1 offset 0x8c00000 vector 4 ipl 6, address 08:00:20:9d:cf:04
nsphy1 at hme1 phy 1: DP83840 10/100 PHY, rev. 1
esp1 at sbus0 slot 1 offset 0x8800000 vector 3 ipl 3: dma rev fas
esp1: FAS366/HME, 40MHz
scsibus1 at esp1: 16 targets, initiator 7
rd0: fixed, 6144 blocks
softraid0 at root
bootpath: /sbus@1f,0/SUNW,fas@e,8800000/sd@6,0:f
root on rd0a swap on rd0b dump on rd0b

4.8 dmesg

Copyright (c) 1982, 1986, 1989, 1991, 1993
         The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2010 OpenBSD. All rights reserved.
http://www.OpenBSD.org

OpenBSD 4.8 (RAMDISK) #366: Mon Aug 16 09:52:28 MDT 2010
     [hidden email]:/usr/src/sys/arch/sparc64/compile/RAMDISK
real mem = 268435456 (256MB)
avail mem = 255164416 (243MB)
mainbus0 at root: Sun Ultra 1 UPA/SBus (UltraSPARC 167MHz)
cpu0 at mainbus0: SUNW,UltraSPARC (rev 4.0) @ 167.006 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
timer0 at mainbus0 addr 0xfffc7c00 ivec 0x7f0, 0x7f1
sbus0 at mainbus0 addr 0xfffcc000: clock = 25 MHz
sbus0: dvma map ff800000-ffffffff, STC0 enabled
"SUNW,CS4231" at sbus0 slot 13 offset 0xc000000 vector 24 ipl 8 not configured
auxio0 at sbus0 slot 15 offset 0x1900000
"flashprom" at sbus0 slot 15 offset 0x0 not configured
"SUNW,fdtwo" at sbus0 class block slot 15 offset 0x1400000 vector 29 ipl 11 not configured
clock1 at sbus0 slot 15 offset 0x1200000: mk48t59
zs0 at sbus0 slot 15 offset 0x1100000 vector 28 ipl 12 softpri 6
zstty0 at zs0 channel 0: console output
zstty1 at zs0 channel 1
zs1 at sbus0 slot 15 offset 0x1000000 vector 28 ipl 12 softpri 6
zskbd0 at zs1 channel 0: no keyboard
zstty2 at zs1 channel 1: mouse
"sc" at sbus0 slot 15 offset 0x1300000 not configured
"SUNW,pll" at sbus0 slot 15 offset 0x1304000 not configured
esp0 at sbus0 slot 14 offset 0x8800000 vector 20 ipl 3: dma rev fas
esp0: FAS366/HME, 40MHz
scsibus0 at esp0: 16 targets, initiator 7
sd0 at scsibus0 targ 0 lun 0: <FUJITSU, MAT3073NC, 0104> SCSI3 0/direct fixed
esp0: illegal command: 0x0 (state 2, phase 3, prevphase 3)
esp0: SCSI bus reset
sd0: 70136MB, 512 bytes/sec, 143638992 sec total
sd1 at scsibus0 targ 1 lun 0: <IBM, DPSS-318350M, S93E> SCSI3 0/direct fixed
esp0: illegal command: 0x0 (state 2, phase 7, prevphase 3)
esp0: SCSI bus reset
sd1: 17366MB, 512 bytes/sec, 35566478 sec total
cd0 at scsibus0 targ 6 lun 0: <TOSHIBA, XM-5401TASUN4XCD, 1036> SCSI2 5/cdrom removable
hme0 at sbus0 slot 14 offset 0x8c00000 vector 21 ipl 6, address 08:00:20:9d:cf:04
nsphy0 at hme0 phy 1: DP83840 10/100 PHY, rev. 0
"SUNW,bpp" at sbus0 slot 14 offset 0xc800000 vector 22 ipl 2 not configured
hme1 at sbus0 slot 1 offset 0x8c00000 vector 4 ipl 6, address 08:00:20:9d:cf:04
nsphy1 at hme1 phy 1: DP83840 10/100 PHY, rev. 1
esp1 at sbus0 slot 1 offset 0x8800000 vector 3 ipl 3: dma rev fas
esp1: FAS366/HME, 40MHz
scsibus1 at esp1: 16 targets, initiator 7
rd0: fixed, 6144 blocks
softraid0 at root
esp0: illegal command: 0x0 (state 2, phase 3, prevphase 3)
esp0: SCSI bus reset
esp0: illegal command: 0x0 (state 2, phase 7, prevphase 3)
esp0: SCSI bus reset
bootpath: /sbus@1f,0/SUNW,fas@e,8800000/sd@6,0:f
root on rd0a swap on rd0b dump on rd0b
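
The two dmesgs above can also be compared mechanically rather than by eye.
What follows is a minimal sketch in Python, assuming each dmesg has been
saved as a string; the function name new_lines is made up for illustration.
It uses the standard difflib module to list the lines that appear only in
the newer dmesg (lines that merely changed, such as the version banner, are
reported too).

```python
import difflib


def new_lines(old_dmesg, new_dmesg):
    """Return lines of new_dmesg that were inserted or changed
    relative to old_dmesg, in their original order."""
    old = old_dmesg.splitlines()
    new = new_dmesg.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute lines from the new side.
        if tag in ("insert", "replace"):
            out.extend(new[j1:j2])
    return out
```

Filtering the result for lines beginning with "esp0:" would leave just the
illegal command and bus reset messages that 4.8 adds.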

Re: SCSI illegal commands and bus resets on Ultra 1E starting with 4.8

Nick Holland
On 07/18/11 14:22, Kurt Mosiejczuk wrote:

> Starting with 4.8, errors of the following type pop up periodically when
> accessing the disk:
>
> esp0: illegal command: 0x0 (state 2, phase 3, prevphase 3)
> esp0: SCSI bus reset
>
> esp0: illegal command: 0x0 (state 2, phase 7, prevphase 3)
> esp0: SCSI bus reset
>
> Initially I thought my disks were flaky, but I found older releases
> worked okay, and found the trouble started after the 4.7 release.  I get
> the error regardless of whether I have a terminator hanging off the back
> of the Ultra 1 or not (and even tried different types of terminators).
>
> The 4.9 dmesg is almost identical to the 4.8 dmesg, differing only in
> version number, CPU frequency, and the number of bytes of available
> memory.  I can make that dmesg and a snapshot dmesg available if it helps.
>
> I reproduced this on more than one Ultra 1E and I'm pretty sure it
> happened with my E3K too (although it had... other issues).  I see that
> there was one change to esp_sbus.c that looks like a likely culprit, but
> since the commit message indicates it was part of a cleanup I wanted to
> bring it here before trying to blindly revert that file.

bah.
I've got an Ultra1 in production, and it shows no such problem.
I just fired up an Ultra1E, though, and it does confirm your
observation.  One of the places where the U1 and U1e differ is in the
SCSI system.

My U1e is not in great shape at the moment, so please feel free to
revert the change you are suspicious of, and see if that solves your
problem.  And put a PR in with the results, either way.  I'll get my U1e
back into shape eventually, but you've got a head start on me. :)

Nick.

Re: SCSI illegal commands and bus resets on Ultra 1E starting with 4.8

Kurt Mosiejczuk-4
Nick Holland wrote:
 > bah.
 > I've got an Ultra1 in production, and it shows no such problem.
 > I just fired up an Ultra1E, though, and it does confirm your
 > observation.  One of the places where the U1 and U1e differ is in the
 > SCSI system.

 > My U1e is not in great shape at the moment, so please feel free to
 > revert the change you are suspicious of, and see if that solves your
 > problem.  And put a PR in with the results, either way.  I'll get my U1e
 > back into shape eventually, but you've got a head start on me. :)

Okay, I'll work on that.  And I'm so glad I went back and put the "E" in
the subject before I sent the email :)

--Kurt