SRM Memory Test

J.C. Roberts-2
I haven't quite made it to the "Attempt Installing OpenBSD" portion yet
because I noticed something odd on an Alpha Personal Workstation 433
that I got off of eBay. The ARC/AlphaBIOS would occasionally report
256MB rather than the usual 384MB. This weirdness was intermittent.

I kicked the system into SRM Console mode and I've been trying to run
memtest to no avail. Even the simplest tests basically lock up the
system: the command never exits, even if you let it run for a couple of
hours to complete two passes.

  >>> memtest -rb -p 2

If you background the memtest process and run show_status, it seems to
pass at least once?

  >>> memtest -rb -p 2 &
  >>> show_status
  ID        Program   Device   Pass   Hard/Soft   Written   Read
  -------- --------- -------- ------ ----------- --------- ------
  00000001      idle system       0      0    0         0      0
  0000004F   memtest memory       1      0    0         0      0


Using >>>kill_diags afterwards only locks up the system.

I've tried reading up on the SRM Console in the user guide (link below)
http://ftp.digital.com/pub/Digital/info/semiconductor/literature/srmcons.pdf
and I've searched around for more detailed instructions on the web. I
found a cryptic post to the DebianAlpha list about using
>>>dynamic -r
to figure out values to use with the memtest switches, but I doubt that
poster's goal was to test all available memory.

Though I've reduced the system memory to 128MB (two DIMMs) so I can test
the pairs, I'm still not sure whether there's some problem with the
memory or whether *I* am the problem.

When you guys use memtest, how do you do it?

Thanks,
JCR

Re: SRM Memory Test

J.C. Roberts-2
On Tue, 13 Dec 2005 16:00:02 -0800, J.C. Roberts <[hidden email]>
wrote:

>I noticed something odd on an Alpha Personal Workstation 433 that I got
>off of eBay. The ARC/AlphaBIOS would occasionally report 256MB rather
>than the usual 384MB. This weirdness was intermittent. I have reseated
>everything in the system to make sure there are no connection/connector
>issues but I think it would be prudent to actually test the memory
>itself.
>
>I kicked the system into SRM Console mode and I've been trying to run
>memtest to no avail. I believe *I* am the real problem since I don't
>know what the heck I'm doing in SRM in spite of the fact that I've read
>the SRM Console user guide.
>http://ftp.digital.com/pub/Digital/info/semiconductor/literature/srmcons.pdf
>
>The SRM version is v7.2-1  Mar 6, 2000
>
>Running even the most simple tests seems to basically lock up the system
>since the command fails to ever exit even if you let it run for a couple
>hours to try completing two passes.
>
>  >>> memtest -rb -p 2
>
>If you background the memtest process and run show_status, it seems to
>pass at least once?
>
>  >>> memtest -rb -p 2 &
>  >>> show_status
>  ID        Program   Device   Pass   Hard/Soft   Written   Read
>  -------- --------- -------- ------ ----------- --------- ------
>  00000001      idle system       0      0    0         0      0
>  0000004F   memtest memory       1      0    0         0      0
>
>
>Using >>>kill_diags afterwards only locks up the system.
>
>I've searched around for more detailed instructions on the web. I found
>a cryptic post to the DebianAlpha list
>http://lists.debian.org/debian-alpha/2004/11/msg00064.html
>
>It mentions using
>
>>>>dynamic -r
>
>to figure out values to use with memtest switches but I still don't
>understand what was meant. The whole "zone" thing is a mystery. Worse
>yet, the SRM Console user guide doesn't even mention "dynamic" as a
>command and the man/help pages in the SRM itself are useless.
>
>I've reduced the system memory to 128MB (two DIMMs) so I can test the
>pairs and by accident I figured out which pair is bad (i.e. running
>"dynamic -h" by mistake resulted in errors with one pair).
>
>When you guys use memtest properly, how do you do it?
>
>Thanks,
>JCR

My apologies for replying to myself, but I've had a few people ask me
off list to make the answer public if I ever manage to figure it out.
I've been working on this for a week, reading docs, searching the web
and asking around on OpenVMS, FreeBSD, OpenBSD, NetBSD and Linux lists
and groups.

With the help of Graham Burley on comp.os.vms, I've found the answer to
the SRM MEMTEST and MEMORY commands failing to run. The WRITTEN and READ
columns of the SHOW_STATUS output (above) were both stuck at zero, which
was telling us that the tests were not actually running.

This system probably came out of a "secure" site (i.e. government), so
it was sold to me without a hard drive. Although I had installed a disk,
there was no OS or bootable partition on it (it's an old 4.5GB data
drive with an NTFS partition, which becomes relevant later), so there
was obviously nothing in the system for the SRM to boot from.

When booting to SRM, I got the expected error messages:

  CPU 0 booting

  (boot dka0.0.0.1009.0 -flags A)
  block 0 of dka0.0.0.1009.0 is not a valid boot block
  bootstrap failure

  Retrying, type ^C to abort...

Basically, it's an endless loop of trying to boot from the disk, so
I had always just been following instructions and using ^C to get into
the SRM console to run the memory tests. This ^C is the main cause of
the memory testing problems I mentioned above, because aborting the boot
this way leaves the system/SRM _not_ fully initialized.

If you're having problems with either MEMORY or MEMTEST, do a ps (or
CTRL-T) and look at the status of the MEMTEST lines. If you see them
stuck with "WAITING ON", you know your system/SRM was not completely
initialized.

If you run INIT at this point, you just end up with the same bootstrap
failures and ^C issue as before, so you need to change how the system
boots before running INIT.

  >>>set auto_action halt
  >>>init

This gets you to a nice, clean SRM console that's been fully
initialized. At this point the MEMORY and MEMTEST commands should work
properly. You can tell they are working because the WRITTEN and READ
columns of the SHOW_STATUS output are no longer stuck at zero. If the -p
switch has a value of zero, the memory tests will run until you stop
them with the KILL_DIAGS command.
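If you capture the console output over a serial line, the WRITTEN/READ check is easy to automate. Below is a minimal Python sketch, not an SRM tool. It assumes the SHOW_STATUS column layout matches the transcript from my first message: ID, Program, Device, Pass, Hard/Soft (two numbers), Written, Read. The names DiagStatus, parse_show_status and stalled_memtests are hypothetical, made up for this example. The script flags any memtest process whose Written and Read counters are both still zero.

```python
from dataclasses import dataclass


@dataclass
class DiagStatus:
    """One data row of SHOW_STATUS output (layout assumed from the transcript above)."""
    pid: str          # ID column, printed in hex (e.g. 0000004F)
    program: str      # e.g. "idle" or "memtest"
    device: str       # e.g. "system" or "memory"
    passes: int
    hard_errors: int
    soft_errors: int
    written: int
    read: int


def parse_show_status(text: str) -> list[DiagStatus]:
    """Parse captured SHOW_STATUS text, skipping the header and dashed separator."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 fields: ID, Program, Device, Pass, Hard, Soft, Written, Read.
        # The header has 7 fields ("Hard/Soft" is one token), as does the separator.
        if len(fields) != 8 or not all(f.isdigit() for f in fields[3:]):
            continue
        pid, program, device, *counts = fields
        rows.append(DiagStatus(pid, program, device, *map(int, counts)))
    return rows


def stalled_memtests(rows: list[DiagStatus]) -> list[DiagStatus]:
    """Return memtest rows whose Written and Read counters are both still zero.

    A test started a moment ago may briefly show zeros too, so take a second
    snapshot before deciding the console was not fully initialized.
    """
    return [r for r in rows if r.program == "memtest" and r.written == 0 and r.read == 0]


if __name__ == "__main__":
    captured = """\
ID        Program   Device   Pass   Hard/Soft   Written   Read
-------- --------- -------- ------ ----------- --------- ------
00000001      idle system       0      0    0         0      0
0000004F   memtest memory       1      0    0         0      0
"""
    for r in stalled_memtests(parse_show_status(captured)):
        print(f"{r.pid}: memtest shows no reads/writes; was the console fully initialized?")
```

Running it on the transcript from my first message flags process 0000004F, which matches the diagnosis above. The "idle" row is ignored because only memtest processes are checked.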

By the way, if you want to see what the "normal" switches are for
running MEMTEST, you can look at the MEMORY script.

  >>>cat memory


So the system passed its memory tests and all was well until I rebooted
the system, which put me into AlphaBIOS/ARC for some strange reason. I
didn't think it was a big deal, so I did the usual to switch back to SRM:

  F2         (Setup)
  CMOS Setup
  F6         (Advanced)
  Console Selection: "UNIX Console SRM" (or "OPENVMS Console SRM")
  F10        (save)
  F10        (save)
  ESC        (exit)
 
  power cycle
 
To my surprise, I ended up in AlphaBIOS/ARC again. This was weird, so I
repeated the steps and cold booted again, and sure enough, it _still_
came up in AlphaBIOS/ARC mode.

The darn thing refused to go into SRM mode because of that old NTFS
partition on the disk. Once I deleted that partition through the
AlphaBIOS, I could finally reset the "Console Selection" to SRM and have
it work.

Hopefully this information will help the next person trying to figure
out why their memtest isn't working as expected.

Kind Regards,
JCR

--
|   Patches to developers are like lights to moths;
|   "Ooohhh PATCHES! Look at the pretty patches..."
|   You can expect them to just circle for a while and even if
|   they never commit, you'll definitely have their attention.