HD 'Analysis'

HD 'Analysis'

L. V. Lammert
Been trying to build a replacement HD for a system, .. and it seems
impossible to verify whether a disk is bad or not (having wasted some hours
rsync'ing data only to have the HD lock up the system when doing the final
rsync).

What is the best way to do a surface analysis on a disk? badsect seems like
a holdover from MB-sized disks, and it doesn't do any analysis.

        TIA,

        Lee


Re: HD 'Analysis'

STeve Andre'
On Monday 04 May 2009 17:56:43 L. V. Lammert wrote:

> Been trying to build a replacement HD for a system, .. and it seems
> impossible to verify whether a disk is bad or not (having wasted some hours
> rsync'ing data only to have the HD lock up the system when doing the final
> rsync).
>
> What is the best way to do a surface analysis on a disk? badsect seems like
> a holdover from MB-sized disks, and it doesn't do any analysis.
>
> TIA,
>
> Lee

The best way is to get a new disk.  I'm serious.  Disks are cheap enough, and
the value of what's on them is high enough that if you think it's going, get a
new one.  Even if this is a hobby system, I'd do that.

There is disk testing software from the OEMs you can use.

But if you think it's acting weird, don't trust it.

--STeve Andre'


Re: HD 'Analysis'

L. V. Lammert
At 06:06 PM 5/4/2009 -0400, STeve Andre' wrote:

>The best way is to get a new disk.  I'm serious.  Disks are cheap enough, and
>the value of what's on them is high enough that if you think it's going, get a
>new one.  Even if this is a hobby system, I'd do that.

And I'm serious too - how many hard drives do you throw away before you
realize that might not be the problem?

>There is disk testing software from the OEMs you can use.
>
>But if you think its acting weird don't trust it.

That's why I'm looking for a way to gather some hard data.

         Lee


Re: HD 'Analysis'

STeve Andre'
On Monday 04 May 2009 18:29:26 L. V. Lammert wrote:

> At 06:06 PM 5/4/2009 -0400, STeve Andre' wrote:
> >The best way is to get a new disk.  I'm serious.  Disks are cheap enough,
> > and the value of what's on them is high enough that if you think it's
> > going, get a new one.  Even if this is a hobby system, I'd do that.
>
> And I'm serious too - how many hard drives do you throw away before you
> realize that might not be the problem?
>
> >There is disk testing software from the OEMs you can use.
> >
> >But if you think it's acting weird, don't trust it.
>
> That's why I'm looking for a way to gather some hard data.
>
>          Lee

I have a pile of disks that I suspect.  Looking at the drawer, I see 8
of them.  As I have time I test them, usually with dd:

   dd if=/dev/sd1c of=/dev/null bs=64k

and try that a bunch.  Usually I shake loose a few errors after a
while.  And of course, listen to them.  A string of clicking
noises from repeated recalibrations is another sign of impending
death.
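Sketched as a small sh loop (the device path is only an example; point it
at the raw device for the disk you actually suspect):

```shell
# Repeated read test.  /dev/rsd1c is an example device; substitute
# the raw device for the disk under test.
readtest() {
    disk=$1
    passes=${2:-5}
    i=1
    while [ "$i" -le "$passes" ]; do
        # dd exits non-zero on a hard read error
        if ! dd if="$disk" of=/dev/null bs=64k 2>/dev/null; then
            echo "read error on pass $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no read errors in $passes passes"
}

# e.g.: readtest /dev/rsd1c 3
```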

For laptop disks I've used IBM/Hitachi's Drive Fitness Test, and
that usually works well, but earlier this year it gave a drive a
clean bill of health, and the disk died about two weeks later.

--STeve Andre'


Re: HD 'Analysis'

Jose Quinteiro-5
In reply to this post by L. V. Lammert
I use this http://smartmontools.sourceforge.net/

Saludos,
Jose.



Re: HD 'Analysis'

Tony Abernethy
In reply to this post by STeve Andre'
There is, in the e2fsprogs package, something called badblocks.
I have used it (on Linux) to "rescue" bad disks.
(Windows laptops  -- kinda redundant?)

If you care about your data, follow Steve's advice.

The reality seems to be that this exercises a disk's ability to
relocate bad sectors, so that a bad disk suddenly goes good again.
That is with the destructive surface test (badblocks -sw ...).
Realistically, the most reliable indicator seems to be a disk that is
slower than it should be.

Me, if I want to rely on a disk drive, I will run badblocks on it:
the long-winded destructive test.
And I will time it, at least sporadically.
(New disks are not immune from having problems ;-)
The exercise maybe loses out to watching grass grow.


Re: HD 'Analysis'

Steven Shockley
In reply to this post by L. V. Lammert
On 5/4/2009 5:56 PM, L. V. Lammert wrote:
> What is the best way to do a surface analysis on a disk? badsect seems
> like a holdover from MB-sized disks, and it doesn't do any analysis.

MHDD might do what you want:

http://hddguru.com/content/en/software/2005.10.02-MHDD/

I haven't used it, but Victoria (http://hdd-911.com/) might be useful if
you can read Russian.

Gibson's SpinRite is okay for checking a drive, but he tries to imply that
what he does is way more complicated than it really is.  That, and the
author is a weenie media whore.

I rarely see a bad drive lock up the system on modern machines without
timeout messages on the console, etc.  Your controller or cable may be
suspect if the drive passes all the tests you throw at it.


Re: HD 'Analysis'

Hannah Schroeter
In reply to this post by STeve Andre'
Hi!

On Mon, May 04, 2009 at 06:34:07PM -0400, STeve Andre' wrote:
>[...]

>I have a pile of disks that I suspect.  Looking at the drawer, I see 8
>of them.  As I have time I test them, usually with dd:

>   dd if=/dev/sd1c of=/dev/null bs=64k
               ^r

Do yourself a favor and use the raw device.

>[...]

Kind regards,

Hannah.


Re: HD 'Analysis'

L. V. Lammert
In reply to this post by Steven Shockley
At 10:32 PM 5/4/2009 -0400, Steve Shockley wrote:

>On 5/4/2009 5:56 PM, L. V. Lammert wrote:
>>What is the best way to do a surface analysis on a disk? badsect seems
>>like a holdover from MB-sized disks, and it doesn't do any analysis.
>
>MHDD might do what you want:
>
>http://hddguru.com/content/en/software/2005.10.02-MHDD/
>
>I haven't used it, but Victoria (http://hdd-911.com/) might be useful if
>you can read Russian.
>
>Gibson's Spinrite is okay to check a drive but he tries to imply that what
>he does is way more complicated than it really is.  That, and the author
>is a weenie media whore.
>
>I rarely see a bad drive lock up the system on modern machines without
>timeout messages on the console, etc.  Your controller or cable may be
>suspect if the drive passes all the tests you throw at it.

Some good options, .. seems like they're all DOS, however <g>!! I guess that's
no big deal if you're rebooting for the analysis, but it does not seem 'right'!

         Lee


Re: HD 'Analysis'

L. V. Lammert
In reply to this post by Tony Abernethy
At 05:45 PM 5/4/2009 -0500, Tony Abernethy wrote:
>
>There is, in the e2fsprogs package, something called badblocks.
>I have used it (on Linux) to "rescue" bad disks.
>(Windows laptops  -- kinda redundant?)

Interesting, .. it doesn't build on 4.0, however, .. and I'm unsure about any
issues between utilities designed for ext2 and ffs???

>If you care about your data, follow Steve's advice.

Right. How many disks should I throw away before trying to gather some
USEFUL data?

>Me, if I want to rely on a disk drive, I will run badblocks on it.

Sounds like the best idea - do you run it from a Linux CD, or ??

         Thanks!

         Lee


Re: HD 'Analysis'

L. V. Lammert
In reply to this post by Jose Quinteiro-5
At 03:36 PM 5/4/2009 -0700, Jose Quinteiro wrote:
>I use this http://smartmontools.sourceforge.net/
>
>Saludos,
>Jose.

Thanks! I have used smart tools in the past, .. but how do you use them for
testing?

         Lee


Re: HD 'Analysis'

Jose Quinteiro-5
First thing I do with a new hard drive is run a long self-test using
smartctl.  If it passes it gets added to the system.  I have smartd set
to do a daily short self-test and a weekly long self-test on every
drive.  Replace any drives that start to show errors.
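Concretely, that routine maps onto smartmontools roughly like this (the
device name is an example, and the schedule line follows the smartd.conf
regex syntax documented by smartmontools):

```shell
# One-off acceptance test before a drive goes into service:
#   smartctl -t long /dev/sd0c      # start the long self-test
#   smartctl -l selftest /dev/sd0c  # read the result when it's done

# /etc/smartd.conf -- the scheduled tests described above:
# S/../.././02 = short self-test daily at 02:00,
# L/../../6/03 = long self-test Saturdays at 03:00
/dev/sd0c -a -s (S/../.././02|L/../../6/03)
```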

Saludos,
Jose.

L. V. Lammert wrote:

> At 03:36 PM 5/4/2009 -0700, Jose Quinteiro wrote:
>> I use this http://smartmontools.sourceforge.net/
>>
>> Saludos,
>> Jose.
>
> Thanks! I have used smart tools in the past, .. but how do you use them
> for testing?
>
>         Lee


Re: HD 'Analysis'

STeve Andre'
In reply to this post by L. V. Lammert
On Tuesday 05 May 2009 12:11:49 L. V. Lammert wrote:

> At 05:45 PM 5/4/2009 -0500, Tony Abernethy wrote:
> >There is, in the e2fsprogs package, something called badblocks.
> >I have used it (on Linux) to "rescue" bad disks.
> >(Windows laptops  -- kinda redundant?)
>
> Interesting, .. it DNB on 4.0, however, .. and I'm unsure as to any issues
> between utilities designed for ext2 and ffs???
>
> >If you care about your data, follow Steve's advice.
>
> Right. How many disks should I throw away before trying to gather some
> USEFUL data?

Perhaps I didn't word my thoughts well enough, and appeared snarky
to you?  That wasn't my intent.

Disks today are 1) VASTLY cheaper per meg of storage, 2) faster, and 3)
less power-consumptive and noisy.

But there is also 4), which is that they aren't built as well.  The MTBF
figures are a mathematical fantasy, and dangerously worthless.  I have many
older systems running "small" disks from 2G to about 20G that have
been fine since 1996.  In fact, looking at my log of disk disasters, I've
had three disks blow up on my users while they were using those
machines.  In contrast, the 60G+ disk era has given me at least 12
problems in the last four to five years, and I'm not counting friends'
systems that I've helped out on.  Probably more like 18+ disasters
if I count those.

Because of this I've adopted a really careful attitude about disks
in general.  I'm now starting to treat them like airplane parts--replace
them before they fail.  This is especially true for laptop disks (I've
had four disks start to go on various OpenBSD thinkpads I've had).

When you have free time you can beat on a disk, and take weeks
pounding on it.  Look at iogen in the ports tree as another testing
method.  It is also the case that multiple make builds of userland
are a good test.  I'm hesitant to depend on the smart tools, because
I've had laptop disks that failed hours after a check said things
were fine, and I still have a 100G disk that generates SMART errors
but which is absolutely fine.

Remember too that getting a disk replacement under warranty
almost always results in a "recertified" disk, and I'm nervous about
using them.  Given the cost I get new ones.

Hannah's comment that I should have used the raw device was
quite correct; that was a typo, so it should have said

   dd if=/dev/rsd1c of=/dev/null bs=64k


--STeve Andre'


Re: HD 'Analysis'

Steven Shockley
In reply to this post by Jose Quinteiro-5
On 5/5/2009 12:50 PM, Jose Quinteiro wrote:
> First thing I do with a new hard drive is run a long self-test using
> smartctl. If it passes it gets added to the system. I have smartd set to
> do a daily short self-test and a weekly long self-test on every drive.
> Replace any drives that start to show errors.

The self-tests take the drive offline while they run, right?  Do you
unmount them first, or is the system okay just waiting until the drive
responds?


Re: HD 'Analysis'

Steven Shockley
In reply to this post by L. V. Lammert
On 5/5/2009 11:49 AM, L. V. Lammert wrote:
> Some good options, .. seems like all are DOS, however <g>!! I guess
> that's no big deal if you're rebooting for the analysis, but it does not
> seem 'right'!

No, they have a Windows version of Victoria! <g>  Personally, I use
these kinds of utilities to see if a drive is worth saving, when I can
do destructive tests.  For example, I "recovered" a 250GB disk from an
Xserve RAID that I use as a second drive in my work desktop.  SMART
reports 300 reallocation events, but no matter what I do, that number
doesn't increase.  I use it for temporary storage of easy-to-replace data.


Re: HD 'Analysis'

Martin Schröder
In reply to this post by Steven Shockley
2009/5/6, Steve Shockley <[hidden email]>:
>  The self-tests take the drive offline while they run, right?  Do you

No. man smartctl

Best
   Martin


Re: HD 'Analysis'

ropers
In reply to this post by Tony Abernethy
>> On Monday 04 May 2009 17:56:43 L. V. Lammert wrote:
>> > What is the best way to do a surface analysis on a disk?
>>

2009/5/5 Tony Abernethy <[hidden email]>:

> There is, in the e2fsprogs package, something called badblocks.
> I have used it (on Linux) to "rescue" bad disks.
> (Windows laptops  -- kinda redundant?)
>
> If you care about your data, follow Steve's advice.
>
> The reality seems to be that this does exercise a disk's ability
> to relocate bad sectors so that a bad disk suddenly goes good.
> This is using a destructive surface test  (badblocks -sw ...)
> Realistically, seems like the most reliable test is that disk is slower
> than it should be.
>
> Me, if I want to rely on a disk drive, I will run badblocks on it.
> The long-winded destructive test
> And I will time it, at least sporadically.
> (New disks are not immune from having problems ;-)
> The exercise maybe loses out to watching grass grow.

I also would recommend badblocks(8), but I would recommend
  badblocks -svn
instead of badblocks -sw.

badblocks -svn also (s)hows its progress as it goes along, but does a
(v)erbose (n)on-destructive read/write test (as opposed to either the
default read-only test or the destructive read/write test). You can
check an entire device with badblocks, or a partition, or a file. The
great thing about using badblocks to check a partition is that it's
filesystem-agnostic. It will dutifully check every bit of its target
partition regardless of what's actually on it. And if you give
badblocks -svn an entire storage device to test, it will not even care
about the actual partition scheme used. Because this read/write test
can trigger the disk's own built-in bad sector relocation, this means
you can even have a disk that you can't read the partition table from,
and running badblocks -svn over it may at least temporarily fix
things. And I've used badblocks -svn e.g. to check old Macintosh
floppies. Who cares that OpenBSD doesn't know much about the
filesystem on those? badblocks does the job anyway.
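For concreteness: with -o, badblocks writes one bad-block number per line,
so an empty output file means a clean scan.  A sketch (device path and file
names here are examples):

```shell
# Non-destructive read/write scan of a whole raw disk, logging any
# bad blocks found.  /dev/rsd1c is an example device.
#   badblocks -svn -o /tmp/badlist /dev/rsd1c

# Summarize the resulting list: an empty file means a clean scan.
badlist_summary() {
    if [ -s "$1" ]; then
        echo "$(wc -l < "$1" | tr -d ' ') bad blocks found"
    else
        echo "clean"
    fi
}
```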

(Because of this agnosticism, it's actually questionable whether
badblocks(8) ought to be part of a filesystem-specific package, but
hey, that's where it ships. Yeah, one *could* also argue for including
it elsewhere by default because it's so useful, but I'm not
the one making those decisions and I guess the folks who do will do
what makes the most sense to them, so I don't feel like starting to be
a back seat driver... ;-)

Oh, and of course it would probably be prudent to do a backup before
read/write tests, even though badblocks is well-established and (with
-n) supposed to be non-destructive. Supposed to... ;-) I've never been
disappointed but YMMV.

regards,
--ropers


Re: HD 'Analysis'

Steven Shockley
In reply to this post by Martin Schröder
On 5/6/2009 11:24 AM, Martin Schröder wrote:
> 2009/5/6, Steve Shockley<[hidden email]>:
>>   The self-tests take the drive offline while they run, right?  Do you
>
> No. man smartctl

Huh.  That kind of contradicts the name "offline self test", but I guess
they call that "captive".


Re: HD 'Analysis'

Marco Peereboom
In reply to this post by ropers
You people crack me up.  I have been trying to ignore this thread for a
while but can't anymore.  Garbage like badblocks is from the era when
you could still low-level format a drive.  Remember those fun days?
When you were all excited about your 10MB hard disk?

Use dd to read it; if it is somewhat broken the drive will reallocate
it.  If it is badly broken the I/O will fail and it is time to toss the
disk.  Those are about all the flavors you have available.  Running
vendor diags is basically a fancier dd.



Re: HD 'Analysis'

ropers

2009/5/7 Marco Peereboom <[hidden email]>:
> You people crack me up.  I have been trying to ignore this post for a
> while but can't anymore.  Garbage like badblock are from the era that
> you still could low level format a drive.  Remember those fun days?
> When you were all excited about your 10MB hard disk?
>
> Use dd to read it; if it is somewhat broken the drive will reallocate
> it.  If it is badly broken the IO will fail and it is time to toss the
> disk.  Those are about all the flavors you have available.  Running
> vendor diags is basically a fancier dd.

Why do you consider badblocks garbage?

I remember now that we talked about this before over a year ago, when
I first asked about using badblocks on OpenBSD. Back then I eventually
surmised that using dd to do the same thing as badblocks -svn would be
possible but a lot more cumbersome, cf.:
http://kerneltrap.org/mailarchive/openbsd-misc/2008/4/19/1499524

Am I/was I mistaken, and if so, where?

Thanks and regards,
--ropers
