SPARC64: input/output error on softraid 5 with more than 8 disks


SPARC64: input/output error on softraid 5 with more than 8 disks

Alex McWhirter
I'm forwarding this to a few other lists to see if I can get some more
input on it. I would like to think this is an arch-specific bug, as I
imagine there are people using RAID 5 with more than 8 disks on
amd64/i386. However, I haven't tried this on amd64, so I am not certain.

1. Steps to reproduce (can be done in bsd.rd after issuing MAKEDEV for
all disks)

Using sd2-sd11 for this example.

disklabel -E sd2 (likewise sd3, sd4, etc.) - create partition "a" using
all disk space, fstype RAID; all disks are the same model.
bioctl -c 5 -l sd2a,sd3a,etc... softraid0 - the volume attaches as sd12
disklabel -E sd12 - create partition "a" using all disk space, fstype
4.2BSD
newfs sd12a
mount /dev/sd12a /mnt
cd /mnt
dd if=/dev/zero of=test bs=1m count=128
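
Spelled out for sd2 through sd11 (the volume attaches as sd12 here), the
whole sequence is roughly:

# disklabel -E sd2        (repeat for sd3 ... sd11: one "a" partition,
                           whole disk, fstype RAID)
# bioctl -c 5 -l sd2a,sd3a,sd4a,sd5a,sd6a,sd7a,sd8a,sd9a,sd10a,sd11a softraid0
# disklabel -E sd12       (one "a" partition, whole volume, fstype 4.2BSD)
# newfs sd12a
# mount /dev/sd12a /mnt
# cd /mnt
# dd if=/dev/zero of=test bs=1m count=128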

2. Output Received

dd fails immediately with "Input/output error" and 0 records written.
dmesg shows no errors. The test file is visible on the filesystem,
however.

3. Dmesg output: will post tomorrow. As stated below, this happens on
two different sparc64 hosts with completely different hardware, disks,
HBAs, and disk shelves. No dmesg errors appear when dd fails.

4. Third party software: none needed; this can be reproduced in a base
install or in bsd.rd.

5. No kernel panic; the system keeps chugging along, the filesystem
stays mounted, fsck comes back clean, and upon reboot the RAID is still
in good standing and filesystem checks still pass.
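
For reference, the checks above amount to roughly the following,
assuming the volume attached as sd12 as in step 1:

# umount /mnt
# fsck_ffs -f /dev/rsd12a     (comes back clean)
# bioctl sd12                 (volume and all chunks report Online)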

-------- Original Message --------
Subject: Input/Output error on softraid 5 with more than 8 disks
Date: 2016-10-05 19:19
From: [hidden email]
To: [hidden email]

Tested on a Sun E6K and a Sun V210. It seems that if I create a RAID 5
array with more than 8 disks (U320 SCSI) using softraid, I get a rather
vague "Input/output error" when attempting to write any significantly
large amount of data (dd if=/dev/zero of=/mnt/blah bs=1m count=128).
newfs creates the filesystem fine and fsck comes back clean, but trying
to put anything on the array doesn't seem to do anything.

I was just wondering whether this has been reported in the past or if
I'm the first to uncover it. I have obtained the same results on various
hardware, which seems to point to the softraid driver.

Hardware Tested

HBAs

qlw: isp1000, isp1040, isp10160
esp: fas336

Disk Shelves

StorEdge D1000
StorEdge 3320


Re: SPARC64: input/output error on softraid 5 with more than 8 disks

Alex McWhirter
On 2016-10-06 11:34, Kenneth Westerback wrote:

> 1) Why do you say >8 but only give an example using 10 disks?
>
> 2) fdisk and disklabels for all the disks you test would be useful, as
> would the verbatim output from newfs.
>
> 3) The size of the disks would also be useful (although the information
> above would contain this).
>
> 4) To eliminate the size of the resulting volume being a problem, and
> possibly eliminate ffs2 vs ffs issues, trying to create a volume with
> smaller chunks (say 100MB) on each disk would be another useful data
> point.
>
> .... Ken
>


1. That was a bit of an assumption on my part; I just tested 9 disks
and it works fine. My RAID arrays use an even number of disks, so
odd-numbered arrays are not ideal for me and I never tested them
beforehand. I can verify that 10-, 11-, and 12-disk arrays are broken
in the same manner, however.

2. To make testing a bit easier I decided to start debugging on my Sun
T5120 at home. I only have 2 disks, but have set up 10 ~100MB
partitions on sd0.
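
For anyone reproducing this, each partition can be added in disklabel's
interactive editor along these lines (prompt defaults abbreviated; sizes
accept an M suffix, and the fstype is set to RAID):

# disklabel -E sd0
sd0> a a
offset: [0]
size: [...] 100M
FS type: [4.2BSD] RAID
sd0*> a b
  (... and so on through partition k ...)
sd0*> w
sd0> q

The resulting label: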

# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ST914602SSUN146G
duid: 82bafda60ea79f65
flags: vendor
bytes/sector: 512
sectors/track: 848
tracks/cylinder: 24
sectors/cylinder: 20352
cylinders: 14089
total sectors: 286739329
boundstart: 0
boundend: 286739329
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
   a:           223872                0    RAID
   b:           223872           223872    RAID
   c:        286739329                0  unused
   d:           223872           447744    RAID
   e:           223872           671616    RAID
   f:           223872           895488    RAID
   g:           223872          1119360    RAID
   h:           223872          1343232    RAID
   i:           223872          1567104    RAID
   j:           223872          1790976    RAID
   k:           223872          2014848    RAID
#

After that I set up RAID 5 across all of these partitions...

# bioctl -c 5 -l sd0a,sd0b,sd0d,sd0e,sd0f,sd0g,sd0h,sd0i,sd0j,sd0k softraid0
sd2 at scsibus2 targ 1 lun 0: <OPENBSD, SR RAID 5, 006> SCSI2 0/direct fixed
sd2: 981MB, 512 bytes/sector, 2009088 sectors
softraid0: RAID 5 volume attached as sd2
#
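
For what it's worth, the reported size is consistent with ten chunks in
RAID 5: nine chunks' worth of data, 9 x 223232 = 2009088 sectors, i.e.
each 223872-sector partition minus the small area softraid reserves per
chunk for metadata.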

Then add a disklabel to sd2 and format it...

# disklabel sd2
# /dev/rsd2c:
type: SCSI
disk: SCSI disk
label: SR RAID 5
duid: 9388bbda6be53605
flags: vendor
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 125
total sectors: 2009088
boundstart: 0
boundend: 2009088
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
   a:          2008125                0  4.2BSD   2048 16384    1
   c:          2009088                0  unused
#
# newfs sd2a
/dev/rsd2a: 980.5MB in 2008124 sectors of 512 bytes
5 cylinder groups of 202.47MB, 12958 blocks, 25984 inodes each
super-block backups (for fsck -b #) at:
  32, 414688, 829344, 1244000, 1658656,
#

And now we mount it and try to write data to it...

# mount /dev/sd2a /mnt
# cd /mnt
# dd if=/dev/zero of=test bs=1m count=128
dd: test: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.112 secs (0 bytes/sec)
#
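
Reading the raw volume directly, bypassing FFS, might be another useful
data point, something like:

# dd if=/dev/rsd2c of=/dev/null bs=1m count=128

If that also errors out, it would point at softraid itself rather than
the filesystem.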


I plan on building a debug kernel tonight to see if I can get anything
more out of it.
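
That would just be the usual in-tree kernel build on sparc64, roughly:

# cd /usr/src/sys/arch/sparc64/conf
# config GENERIC
# cd ../compile/GENERIC
# make clean && make
# make install

with makeoptions DEBUG="-g" added to the kernel config if debug symbols
are wanted.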
