Hardware fault on M3000

michael-2
Hello,

I have a Fujitsu SPARC Enterprise M3000 that appears to have been
rendered inoperable by the OpenBSD 6.2 installer. After partitioning
and during the installation of sets, the machine abruptly shut down and
faulted its motherboard. From my limited research, it appears the only
way to clear this fault is to have Oracle dispatch a field technician,
which is out of my reach as a hobbyist. Even a full factory reset of
the SCF did not successfully clear the fault, and the machine now
refuses to power on.

Someone named Naruaki Etomi posted a message to this list on March 28,
2015, complaining of the same issue with OpenBSD 5.6. The description
of the issue in that post is virtually identical to what happened to
my machine, down to the same fault code (SCF-8003-HA), and the same
timing (during the expansion of the sets). If I hadn't found that post,
I would have dismissed this as an unlucky hardware issue.

I'm curious whether anyone is successfully using OpenBSD on an M3000.
If not, perhaps some kind of warning could be added to the SPARC64
port's website to discourage further testing?

Below is some output from the XSCF showing the fault that occurred.
It's very similar to the Etomi post from 2015.

XSCF> fmdump -v -u 97a2854b-5603-4efa-b609-127469fc445d
TIME UUID MSG-ID
Jan 10 17:28:26.7967 97a2854b-5603-4efa-b609-127469fc445d SCF-8003-HA
100% fault.chassis.SPARC-Enterprise.asic.mbc.fe

Problem in: hc:///chassis=0/cmu=0/mbc=0
Affects: hc:///chassis=0/cmu=0/xsb=0
FRU: hc://:product-id=SPARC Enterprise M3000:chassis-id=PX61011015:
server-id=brad:serial=PP101000WS:part=CA07082-D051 D1 \541-4281-04:
revision=0301/component=/MBU_A
Location: /MBU_A

XSCF> showstatus
* MBU_A Status:Faulted;
* CPU Status:Deconfigured;
* MEM#0A Status:Deconfigured;
* MEM#0B Status:Deconfigured;
* MEM#1A Status:Deconfigured;
* MEM#1B Status:Deconfigured;
* MEM#2A Status:Deconfigured;
* MEM#2B Status:Deconfigured;
* MEM#3A Status:Deconfigured;
* MEM#3B Status:Deconfigured;

XSCF> poweron -y -d 0
DomainIDs to power on:00
Continue? [y|n] :y
00 :Not powering on :Poweron canceled due to missing component.

I'd be glad to collect any other information that might be useful
to the community.

Thanks
Michael Proctor
[hidden email]

Re: Hardware fault on M3000

tinkr
> Hello,
>
> I have a Fujitsu SPARC Enterprise M3000 that appears to have been
> rendered inoperable by the OpenBSD 6.2 installer. After partitioning
> and during the installation of sets, the machine abruptly shut down and
> faulted its motherboard. From my limited research, it appears the only
> way to clear this fault is to have Oracle dispatch a field technician,
> which is out of my reach as a hobbyist. Even a full factory reset of
> the SCF did not successfully clear the fault, and the machine now
> refuses to power on.

Off-list, on 2016-10-23, Theo said: "We have had lots of them [M3000]. And it does not work
because of something we didn't figure out."

As you can see on https://www.openbsd.org/sparc64.html, the M3000 is not listed as supported.

Sorry to hear you had this issue.

Re: Hardware fault on M3000

Jeff Veiss
In reply to this post by michael-2
Hi,

I'd be surprised if OpenBSD caused that. I'm pretty sure it doesn't touch the firmware, particularly the system controller (XSCF). It's more likely that your system board has simply failed. If this were an M4000 or M5000, I would suggest reseating the board. In this case, you can try a full power reset (pull the power cords for at least a minute) or possibly a full reset of the XSCF.

As a last resort, if you don't pay Oracle for support, you might be able to find a replacement cheaply on eBay. I know SPARC hardware has been going cheap there recently.
--
-Jeff

On January 13, 2018 10:03:11 AM MST, [hidden email] wrote:

>[...]

Re: Hardware fault on M3000

michael-2
In reply to this post by tinkr
> Offlist 2016-10-23 Theo said "We have had lots of them [M3000]. And it
> does not work because of something we didn't figure out."
>
> As you see on https://www.openbsd.org/sparc64.html M3000 is not listed as
> supported.
>
> Sorry to hear you had this issue.

Thanks, it's good to know this isn't a new problem. If this is a known
failure mode, maybe the M3000 could be moved from "untested" to
"unsupported" on sparc64.html, to prevent anyone else from making
the same mistake.




Re: Hardware fault on M3000

Tijs Michels
I bought the same machine, hoping I could put OpenBSD on it.  Since I have
OpenBSD running on my PrimePower 250, I did not expect serious problems.
Luckily, I read just in time, in two separate threads, that any attempt to
install OpenBSD on it will actually brick the M3000.  So now it sits in a
corner, unused, and the PrimePower has to soldier on.  Such a shame.

On Sat, Jan 13, 2018 at 7:47 PM, <[hidden email]> wrote:

> [...]

Re: Hardware fault on M3000

amcwhirter
Is this specific to the M3000? I run 6.1 on an M4000 without issue,
and it is nearly identical architecture-wise.
IIRC, if you do a full XSCF reset and get to the initial user setup dialog,
you can add field engineer status to any account.

Re: Hardware fault on M3000

michael-2
> Is this specific to the M3000? Because I run 6.1 on a M4000 without issue
> which is nearly identical architecture wise.
> IIRC if you do a full xscf reset and get to the initial user setup dialog
> you can add field engineer status to any account.

I was able to assign the fieldeng and mode privileges to my user
after performing a full reset of the XSCF, but trying to use
enableservice requires a service password from Oracle support.

On the M3000, addboard and deleteboard are not supported, and
replacefru only allows me to replace fans and PSUs -- I guess
because there is only one system board and it's not considered
field-replaceable.





Re: Hardware fault on M3000

Tijs Michels
In reply to this post by Jeff Veiss
It is not just this one system board.  Last year I found multiple reports
of people who had bricked their M3000 when they tried to install OpenBSD.
Here are three:

https://www.mail-archive.com/sparc@.../msg00523.html
https://anindito.my.id/openbsd-bricks-sun-enterprise-m3000/
http://www.abclinuxu.cz/poradna/unix/show/383354

That second link seems dead now.  Still, it's a known issue.

On Sat, Jan 13, 2018 at 7:31 PM, Jeff Veiss <[hidden email]> wrote:

> [...]

Re: Hardware fault on M3000

meghan@r53sound.com
In reply to this post by michael-2

 
I bricked an M3000 installing 6.1 last year.

Error on the console when the orange fault light went on:

"May  3 01:49:10 localhost fmd: SOURCE: sde, REV: 1.16, CSN: PX61212030
  EVENT-ID: d6245ec0-3f56-4ab1-8de1-adc440ce3a81 Refer to http://www.sun.com/msg/SCF-8002-12 for detailed information."

I tried for days to find a way to clear the fault, until I got
so angry that I pulled my drives and dumped it at a recycler.

 
Meghan
 
 

Re: Hardware fault on M3000

meghan@r53sound.com
In reply to this post by tinkr


>Offlist 2016-10-23 Theo said "We have had lots of them [M3000]. And it does not work
>because of something we didn't figure out."

That would have been great to know before I blew US$400 in May of 2017.

>As you see on https://www.openbsd.org/sparc64.html M3000 is not listed as supported.

It's listed under the systems that are untested, but:
!!!! ". . . most of these machines will almost certainly just work." !!!!

This machine is still costly on eBay. Please update the page before
anyone else finds out the hard way what was already known to the developers.

Meghan