segmentation fault during package build

Riccardo Mottola
Hi,

I am running OpenBSD 5.6 on Sparc [1]

Since several packages I need are not available, I fetched the ports
tree (the 5.6 tar.gz), unpacked it and started building.


When I try to install libxml, the build fails while installing its bzip2
dependency:

install -c -o root -g bin -m 555 bzgrep bzmore bzdiff
/usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/bin
install -c -o root -g bin -m 444 bzip2.1 bzgrep.1 bzmore.1 bzdiff.1
/usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/man1
Segmentation fault (core dumped)
*** Error 139 in /usr/ports/pobj/bzip2-1.0.6/bzip2-1.0.6 (Makefile:105
'install': @cd /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/m...)
*** Error 1 in /usr/ports/archivers/bzip2
(/usr/ports/infrastructure/mk/bsd.port.mk:2807
'/usr/ports/pobj/bzip2-1.0.6/fake-sparc/.fake_done')
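
A note on reading the log above: "Error 139" is the exit status of the
command that make ran for the install step. By shell convention a status
above 128 means the command was killed by a signal, and the signal number
is the status minus 128. Here 139 - 128 = 11, which is SIGSEGV, matching
the "Segmentation fault" line. A minimal sketch of that arithmetic; the
helper name decode_status is made up for illustration:

```shell
# Decode a shell exit status into either a plain exit code or a signal.
# Assumption: the usual "128 + signal number" convention of sh/ksh/bash.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # "kill -l N" prints the signal name without the SIG prefix
        echo "signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

decode_status 139   # prints: signal 11 (SIGSEGV)
```

The "Error 1" on the following lines is just the outer make reporting
that the inner make failed.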


If I just type "make install" again, it fails again in the same place,
so I would rule out a memory problem, which would make things more
random. Perhaps a badly generated binary or a function call causing
problems?

I wanted to look for the core file, but can't find it. Where could it be?

Cheers,
Riccardo

[1] OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
[hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC

Re: segmentation fault during package build

Tobias Ulmer
On Wed, Dec 03, 2014 at 09:38:17AM +0100, Riccardo Mottola wrote:

> Hi,
>
> I am running OpenBSD 5.6 on Sparc [1]
>
> Since I did not find several packages available, I got ports (5.6 tar.gz
> version), unpacked it and started building.
>
>
> While I attempt to install libxml I get, while installing bzip2 dependency:
>
> install -c -o root -g bin -m 555 bzgrep bzmore bzdiff
> /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/bin
> install -c -o root -g bin -m 444 bzip2.1 bzgrep.1 bzmore.1 bzdiff.1
> /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/man1
> Segmentation fault (core dumped)
> *** Error 139 in /usr/ports/pobj/bzip2-1.0.6/bzip2-1.0.6 (Makefile:105
> 'install': @cd /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/m...)
> *** Error 1 in /usr/ports/archivers/bzip2
> (/usr/ports/infrastructure/mk/bsd.port.mk:2807
> '/usr/ports/pobj/bzip2-1.0.6/fake-sparc/.fake_done')
>
>
> If I just type "make install" again, it happens again, thus I would exclude
> a memory issue which makes things more random, but it repeats in the same
> place. Perhaps a bad generated binary or a function call causing problems?
>
> I wanted to look for the core file, but can't find it. Where could it be?
>
> Cheers,
> Riccardo
>
> [1] OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
> [hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC
>

full dmesg please

Re: segmentation fault during package build

Christian Weisgerber
In reply to this post by Riccardo Mottola
On 2014-12-03, Riccardo Mottola <[hidden email]> wrote:

> install -c -o root -g bin -m 555 bzgrep bzmore bzdiff
> /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/bin
> install -c -o root -g bin -m 444 bzip2.1 bzgrep.1 bzmore.1 bzdiff.1
> /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/man1
> Segmentation fault (core dumped)
> *** Error 139 in /usr/ports/pobj/bzip2-1.0.6/bzip2-1.0.6 (Makefile:105
> 'install': @cd /usr/ports/pobj/bzip2-1.0.6/fake-sparc/usr/local/man/m...)
>
> I wanted to look for the core file, but can't find it. Where could it be?

Somewhere under the work directory.

$ find /usr/ports/pobj/bzip2-1.0.6 -name \*.core

--
Christian "naddy" Weisgerber                          [hidden email]

Re: segmentation fault during package build

Riccardo Mottola
In reply to this post by Tobias Ulmer
Tobias Ulmer wrote:
> full dmesg please
Here it is:
OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
[hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC
real mem = 166998016 (159MB)
avail mem = 159440896 (152MB)
mainbus0 at root: SUNW,SPARCstation-20
cpu0 at mainbus0: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
cpu0: physical 20K instruction (64 b/l), 16K data (32 b/l) cache enabled
obio0 at mainbus0
clock0 at obio0 addr 0xf1200000: mk48t08 (eeprom)
timer0 at obio0 addr 0xf1300000: delay constant 23, frequency 1000000 Hz
zs0 at obio0 addr 0xf1100000 pri 12, softpri 6
zstty0 at zs0 channel 0: console
zstty1 at zs0 channel 1
zs1 at obio0 addr 0xf1000000 pri 12, softpri 6
zskbd0 at zs1 channel 0: no keyboard
zsms0 at zs1 channel 1
wsmouse0 at zsms0 mux 0
fdc0 at obio0 addr 0xf1700000 pri 11, softpri 4: chip 82077
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
auxreg0 at obio0 addr 0xf1800000
power0 at obio0 addr 0xf1a01000
cgfourteen0 at obio0 addr 0x9c000000 pri 8: 4MB, rev 3.0, 1152x900
wsdisplay0 at cgfourteen0 mux 1
wsdisplay0: screen 0 added (std, sun emulation)
iommu0 at mainbus0 ioaddr 0xe0000000: version 0x3/0x1, page-size 4096,
range 64MB
sbus0 at iommu0: 25 MHz
dma0 at sbus0 slot 15 offset 0x400000: rev 2
esp0 at dma0 offset 0x800000 pri 4: ESP200, 40MHz
scsibus0 at esp0: 8 targets, initiator 7
sd0 at scsibus0 targ 1 lun 0: <WDIGTL, ENTERPRISE, 1.91> SCSI2 0/direct
fixed serial.WDIGTL_ENTERPRISE_WS7010415066_
sd0: 4157MB, 512 bytes/sector, 8515173 sectors
sd1 at scsibus0 targ 3 lun 0: <WDIGTL, ENTERPRISE, 1.91> SCSI2 0/direct
fixed serial.WDIGTL_ENTERPRISE_WS7010412117_
sd1: 4157MB, 512 bytes/sector, 8515173 sectors
cd0 at scsibus0 targ 6 lun 0: <TOSHIBA, XM-4101TASUNSLCD, 3424> SCSI2
5/cdrom removable
ledma0 at sbus0 slot 15 offset 0x400010: rev 2
le0 at ledma0 offset 0xc00000 pri 6: address 08:00:20:22:39:f0
le0: 16 receive buffers, 4 transmit buffers
bpp0 at sbus0 slot 15 offset 0x4800000: DMA2
"SUNW,DBRIe" at sbus0 slot 14 offset 0x10000 not configured
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
bootpath:
/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
root on sd0a (51fed56a334302f4.a) swap on sd0b dump on sd0b

Re: segmentation fault during package build

Riccardo Mottola
In reply to this post by Christian Weisgerber
Hi,

Christian Weisgerber wrote:
> Somewhere under the work directory.
>
> $ find /usr/ports/pobj/bzip2-1.0.6 -name \*.core
$ find . -name \*.core
./fake-sparc/usr/local/man/man1/ln.core

ln segfaulting? sounds bad!

I tried to get a trace, but:

(gdb) bt
#0  0x0001b024 in ?? ()
#1  0x0001afec in ?? ()
Previous frame identical to this frame (corrupt stack?)



Riccardo

Re: segmentation fault during package build

Philip Guenther-2
On Thu, Dec 4, 2014 at 4:22 PM, Riccardo Mottola
<[hidden email]> wrote:

> Christian Weisgerber wrote:
>>
>> Somewhere under the work directory.
>>
>> $ find /usr/ports/pobj/bzip2-1.0.6 -name \*.core
>
> $ find . -name \*.core
> ./fake-sparc/usr/local/man/man1/ln.core
>
> ln segfaulting? sounds bad!
>
> I tried to get a trace, but:
>
> (gdb) bt
> #0  0x0001b024 in ?? ()
> #1  0x0001afec in ?? ()
> Previous frame identical to this frame (corrupt stack?)

Build an ln binary with debugging:
  cd /usr/src/bin/ln
  make clean
  make obj
  make depend
  make PIPE='-ggdb'
  sudo make install NOMAN=1

then reproduce the problem in the bzip2 port to get a fresh core file
with that binary, then finally run gdb against the _uninstalled_
binary (/usr/src/bin/ln/obj/ln) but with the new core file and see
what the backtrace shows.
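
The last step above could look roughly like the following. This is only
a sketch: suggest_gdb is a hypothetical helper, the default binary path
is the uninstalled build mentioned above, and the work directory in the
example comment is the one from the original report. It prints the gdb
command for every core file it finds instead of running gdb itself, so
you can pick the fresh one and then type "bt" at the (gdb) prompt.

```shell
# Print a "gdb <binary> <core>" command for each core file under a directory.
# Assumption: the debug binary lives in the obj directory named above.
suggest_gdb() {
    workdir=$1
    binary=${2:-/usr/src/bin/ln/obj/ln}
    find "$workdir" -type f -name '*.core' | while IFS= read -r core; do
        echo "gdb $binary $core"
    done
}

# Example (path taken from the report earlier in the thread):
#   suggest_gdb /usr/ports/pobj/bzip2-1.0.6
```

Using the uninstalled binary matters because the installed copy may
have been stripped of the debugging symbols the backtrace needs.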


Philip Guenther

Re: segmentation fault during package build

Tobias Ulmer
In reply to this post by Riccardo Mottola
On Fri, Dec 05, 2014 at 01:18:18AM +0100, Riccardo Mottola wrote:

> Tobias Ulmer wrote:
> >full dmesg please
> Here it is:
> OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
> [hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC
> real mem = 166998016 (159MB)
> avail mem = 159440896 (152MB)
> mainbus0 at root: SUNW,SPARCstation-20
> cpu0 at mainbus0: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
> cpu0: physical 20K instruction (64 b/l), 16K data (32 b/l) cache enabled

This CPU module (Voyager iirc) has issues. It's stable most of the time,
but small programs (like chmod, cat, touch, ...) crash in random places.

Re: segmentation fault during package build

Riccardo Mottola
In reply to this post by Philip Guenther-2
Hi,

Philip Guenther wrote:
> then reproduce the problem in the bzip2 port to get a fresh core file
> with that binary, then finally run gdb against the _uninstalled_
> binary (/usr/src/bin/ln/obj/ln) but with the new core file and see
> what the backtrace shows.
Before doing that, I ran "make clean=depends" inside libxml (which has
bzip2 as a dependency) and then "make install" again.

This time bzip2 built, but ln crashed while installing the man pages of
tcl. When I reissue "make install", it crashes again in tcl.

That is, starting from a clean port build may shift the problem, but
once it is there, I can hit it again and again in the same place.
I will check whether it survives a reboot, that is, whether it depends
"only" on the file-system state. Once I know it does, I can test
different "ln" binaries by leaving the tree in the same state and not
running make clean.

Riccardo

Re: segmentation fault during package build

Riccardo Mottola
In reply to this post by Tobias Ulmer
Hi Tobias,

what you write is frightening :)

Tobias Ulmer wrote:

> On Fri, Dec 05, 2014 at 01:18:18AM +0100, Riccardo Mottola wrote:
>> Tobias Ulmer wrote:
>>> full dmesg please
>> Here it is:
>> OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
>> [hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC
>> real mem = 166998016 (159MB)
>> avail mem = 159440896 (152MB)
>> mainbus0 at root: SUNW,SPARCstation-20
>> cpu0 at mainbus0: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
>> cpu0: physical 20K instruction (64 b/l), 16K data (32 b/l) cache enabled
> This CPU module (Voyager iirc) has issues. It's stable most of the time,
> but small programs (like chmod, cat, touch, ...) crash in random places.
Voyager? I don't know, checking:
http://mbus.sunhelp.org/modules/index.htm

I would identify it as an SM50, the later revision with the large
heatsink, not the round one.
I suppose you mean unstable under OpenBSD? It was very stable under
Solaris 2.5.
Would it crash the program always in the same place?

It is the first time I run OpenBSD on this machine, converting it from
Solaris. I have a dual-HyperSparc module from Ross, but during the
OpenBSD install it apparently failed and became unreliable. I hope that
is just a coincidence of age and that OpenBSD doesn't "fry" modules :)
After it has been up for a while I end up back in OBP, and at reboot I
get a memory failure. I suppose the cache controller or the cache went
bad.

I think I still have an SM40 for emergencies. I will try that once I am
certain that I can reproduce these errors from reboot to reboot. Once I
get the crash I can type "make install" and reproduce it in the same
place, but after a "make clean" that package builds and ln crashes in
another package instead.


Riccardo

Re: segmentation fault during package build

Tobias Ulmer
On Fri, Dec 05, 2014 at 10:48:36AM +0100, Riccardo Mottola wrote:

> Hi Tobias,
>
> what you write is frightening :)
>
> Tobias Ulmer wrote:
> >On Fri, Dec 05, 2014 at 01:18:18AM +0100, Riccardo Mottola wrote:
> >>Tobias Ulmer wrote:
> >>>full dmesg please
> >>Here it is:
> >>OpenBSD 5.6 (GENERIC) #94: Wed Aug 13 13:54:32 GMT 2014
> >>[hidden email]:/usr/src/sys/arch/sparc/compile/GENERIC
> >>real mem = 166998016 (159MB)
> >>avail mem = 159440896 (152MB)
> >>mainbus0 at root: SUNW,SPARCstation-20
> >>cpu0 at mainbus0: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
> >>cpu0: physical 20K instruction (64 b/l), 16K data (32 b/l) cache enabled
> >This CPU module (Voyager iirc) has issues. It's stable most of the time,
> >but small programs (like chmod, cat, touch, ...) crash in random places.
> Voyager? I don't know, checking:
> http://mbus.sunhelp.org/modules/index.htm
>
> I would identify it as SM50, the latter revision, with the large heatsink,
> not the round one.

I just had a look, mine are 501-2708

> I suppose you mean by unstable under OpenBSD? It was very stable under
> solaris 2.5.

Yes, under OpenBSD. I don't know about other OS.

> Would it crash the program always in the same place?

No.

>
> It is the first time I run OpenBSD on this machine, converting it from
> Solaris. I have a dual-HyperSparc module from Ross, but during OpenBSD
> install it apparently failed, becoming unreliable. I hope it is just a
> coincidence of the age and that OpenBSD doesn't "fry" modules :) After a
> while it is up I get back into OBP and at reboot I get a memory failure. I
> suppose the cache controller or the cache went bad.

Cooling issue? My SS10/SS20 work fine with a HM150S-512

>
> I still have a SM40 I think, for emergency. I will try that and see, after
> being certain that I can reproduce these errors from reboot to reboot. Once I
> get the crash I can type "make install" and reproduce it in the same place,
> but a make clean will have it work and have ln crash in another package.
>
>
> Riccardo

Re: segmentation fault during package build

Riccardo Mottola
Hi,

On 12/06/14 03:07, Tobias Ulmer wrote:
>> Voyager? I don't know, checking:
>> http://mbus.sunhelp.org/modules/index.htm
>>
>> I would identify it as SM50, the latter revision, with the large heatsink,
>> not the round one.
> I just had a look, mine are 501-2708
Same as mine: 501-2708 03 REV 50.
And they are unstable for you. I'll run some further tests with this
one and then swap in the old SM40.


>> It is the first time I run OpenBSD on this machine, converting it from
>> Solaris. I have a dual-HyperSparc module from Ross, but during OpenBSD
>> install it apparently failed, becoming unreliable. I hope it is just a
>> coincidence of the age and that OpenBSD doesn't "fry" modules :) After a
>> while it is up I get back into OBP and at reboot I get a memory failure. I
>> suppose the cache controller or the cache went bad.
> Cooling issue? My SS10/SS20 work fine with a HM150S-512
Under Solaris I ran very long compile sessions and it always remained
rock stable. Also, opening the box and touching the heat sink revealed
acceptable temperatures.


Riccardo

Re: segmentation fault during package build

Miod Vallat
I can confirm the spurious segmentation faults or `double free' issues
with an SM40 module, and I am currently investigating the issue.

Miod

Re: segmentation fault during package build

Riccardo Mottola
Hi,

Miod Vallat wrote:
> I can confirm the spurious segmentation faults or `double free' issues
> with an SM40 module, and I am currently investigating the issue.
Fine. That means that using the SM40 instead of the SM50 probably won't
help. I will try though, just to be sure.

They are both cache-less modules.

What I found is that the crashes apparently change from reboot to
reboot and do not stay in the same place.

1) compile, get a crash (most often in ln for me)
2) reissue make, it will crash in the same place
3) reboot, reissue make, it will crash somewhere else!

As soon as I have enough time (might be next week though) I will try the
above again and again to see if it is consistent.
Also, a method for reproducing it would be nice; randomly building
ports is not a good reproducible test case.
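
One possible starting point for such a test case is a loop that runs
"ln -s" many times in a scratch directory and counts abnormal exits.
This is only a sketch under assumptions: stress_ln is a made-up name,
and nothing in this thread shows that a plain loop like this actually
triggers the fault, since the crashes so far happened inside port
builds.

```shell
# Run "ln -s" repeatedly in a scratch directory and count failed runs.
# Assumption: a crash in ln shows up as a non-zero exit status
# (for example 139 for a segmentation fault).
stress_ln() {
    runs=$1
    dir=$(mktemp -d) || return 1
    touch "$dir/target"
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        ln -s target "$dir/link.$i"
        status=$?
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            echo "run $i: ln exited with status $status"
        fi
        i=$((i + 1))
    done
    rm -rf "$dir"
    echo "$failures failures in $runs runs"
}

stress_ln 200
```

If the loop stays clean while port builds keep failing, that would
suggest the trigger depends on more than ln itself, such as the
environment or memory layout the build sets up.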

Thanks,
Riccardo

Re: segmentation fault during package build

Riccardo Mottola
In reply to this post by Miod Vallat
Hello Miod,

Miod Vallat wrote:
> I can confirm the spurious segmentation faults or `double free' issues
> with an SM40 module, and I am currently investigating the issue.
I swapped in the ol' SM40 instead of the SM50, and after a make clean
the build still failed, still in tcl.

So yes, both modules are affected.

Something stresses ln there. Just as a test I tried "touch a" and
"ln -s a b", and of course that works.

Riccardo