installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d


Marcus MERIGHI
Regression: from 5.8 onwards, installation fails on three Shuttle DS47D
machines (5.7 works; 5.8 and -current don't).

They are supposed to be identical, two bought at once, one a little
earlier.

How it fails: 'Get/Verify SHA256.sig' succeeds, but 'Get/Verify bsd'
stops instantly and prints only 'Illegal instruction'.

I've done installs of 5.7, 5.8 and -current multiple times on all three
of these machines. I've swapped RAM with my notebook. I took everything
out that can be removed (mSATA SSD, wlan). I do not think this is a
hardware or user failure (anymore).

What I've found out:
I can run -current when booting from an externally installed USB stick, but:
sshd(8) stops when the first client disconnects.
cron(8) stops when it sends an email.
top(1) returns normally on 'q', but prints 'Illegal instruction' on Ctrl+C.
ping(8) returns 'Illegal instruction' after printing the first line.
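
Several of these failures happen at a moment when the process would
normally be handling a signal (Ctrl+C in top(1), a child exiting under
sshd(8) or cron(8)). That link is an assumption, not something this
report establishes. A minimal probe to test it, assuming a Python 3
interpreter is available on the affected system; the function name
probe_signal_delivery is made up for this sketch:

```python
import os
import signal
import time


def probe_signal_delivery(signo=signal.SIGALRM):
    """Fork a child that installs a handler for `signo` and signals itself.

    Returns a short string describing how the child ended.
    """
    pid = os.fork()
    if pid == 0:
        # Child: install a handler, then deliver the signal to ourselves.
        hits = []
        signal.signal(signo, lambda s, frame: hits.append(s))
        os.kill(os.getpid(), signo)
        # Wait briefly until the interpreter has run the handler.
        deadline = time.monotonic() + 2.0
        while not hits and time.monotonic() < deadline:
            time.sleep(0.01)
        os._exit(0 if hits else 1)

    # Parent: report whether the child exited or was killed by a signal.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    return "exited with status %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    for signo in (signal.SIGALRM, signal.SIGINT, signal.SIGCHLD):
        print(signal.Signals(signo).name, "->", probe_signal_delivery(signo))
```

On a healthy system every line should read "exited with status 0". A
"killed by SIGILL" result on the DS47D would point at signal delivery
rather than at the individual programs.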

Testing the commands run by install.sub on the running -current system
shows that if I ftp to a file and then run the file through sha256,
there is no problem. The pipe is what triggers the 'Illegal instruction'.
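
The file-versus-pipe comparison can be scripted so that each run records
how both processes ended. This is a rough sketch only: the exact
install.sub command line is not quoted in this report, so the demo
commands are stand-ins written in Python, and the helper names
(describe, run_pipeline, run_via_file) are invented here.

```python
import hashlib
import signal
import subprocess
import sys
import tempfile

# Stand-in commands; on the affected machine these would be replaced by
# the real fetch and checksum commands (placeholders, not from install.sub).
DEMO_PRODUCER = [sys.executable, "-c",
                 "import sys; sys.stdout.write('x' * 100000)"]
DEMO_CONSUMER = [sys.executable, "-c",
                 "import hashlib, sys; "
                 "print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest())"]


def describe(returncode):
    """Turn a subprocess return code into a readable status."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and report how each side ended."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # let the producer see SIGPIPE if the consumer dies
    output, _ = p2.communicate()
    p1.wait()
    return describe(p1.returncode), describe(p2.returncode), output


def run_via_file(producer, consumer):
    """Same data flow, but through a temporary file instead of a pipe."""
    with tempfile.TemporaryFile() as tmp:
        p1 = subprocess.run(producer, stdout=tmp)
        tmp.seek(0)  # rewind so the consumer reads from the start
        p2 = subprocess.run(consumer, stdin=tmp, stdout=subprocess.PIPE)
    return describe(p1.returncode), describe(p2.returncode), p2.stdout


if __name__ == "__main__":
    print("pipe:", run_pipeline(DEMO_PRODUCER, DEMO_CONSUMER))
    print("file:", run_via_file(DEMO_PRODUCER, DEMO_CONSUMER))
```

Substituting something like ["ftp", "-o", "-", URL] and ["sha256"] for
the demo commands would mirror the test described above; if only the
pipe variant reports "killed by SIGILL", that confirms the observation
in a repeatable way.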

I tried my gdb voodoo but it is weak...
gdb(1) 'bt' on the cores has one thing in common: <stdin> - no such
file.

Below are my posts to misc@ (sorry!), followed by the -current (24/1) dmesg.

Attached is the usual info on bugs@, for 5.7 and -current (24/1).

One of the machines is ready for debugging. Please advise!

Bye + Thanks for reading, Marcus

[hidden email] (Marcus MERIGHI), 2016.01.22 (Fri) 16:14 (CET):
> please disregard!
>
> sorry, right after hitting send I realized that this was not a -current
> install as usual but a 5.8 one.
>
> Therefore I'm quite sure the problem is hardware/PEBKAC.
 

> [hidden email] (Marcus MERIGHI), 2016.01.22 (Fri) 16:11 (CET):
> > I just downloaded amd64 bsd.rd, put it on a USB stick, booted a new
> > machine from USB.
> >
> > When the file sets were selected, 'Get/Verify SHA256.sig' succeeded, but
> > 'Get/Verify bsd' stopped instantly and gave only 'Illegal instruction'.
> >
> > Before this the only thing that didn't work was fetching the mirror
> > list. I entered one manually and the installer proceeded.
> >
> > The machine once had OpenBSD loaded and worked, it's a Shuttle DS47,
> > dmesg is in the archives.
> >
> > The USB drive I used is the one I always use for these tasks.
OpenBSD 5.9-beta (RAMDISK_CD) #1695: Sun Jan 24 21:40:49 MST 2016
    [hidden email]:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
real mem = 4161052672 (3968MB)
avail mem = 4033228800 (3846MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb530 (73 entries)
bios0: vendor American Megatrends Inc. version "1.03" date 08/09/2013
bios0: Shuttle Inc. DS47D
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP APIC FPDT MCFG SLIC HPET SSDT SSDT SSDT
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.70 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (P0P1)
acpiprt2 at acpi0: bus 1 (RP01)
acpiprt3 at acpi0: bus 2 (RP02)
acpiprt4 at acpi0: bus 3 (RP03)
acpiprt5 at acpi0: bus 4 (RP04)
acpiprt6 at acpi0: bus -1 (RP05)
acpiprt7 at acpi0: bus -1 (RP06)
acpiprt8 at acpi0: bus -1 (RP07)
acpiprt9 at acpi0: bus -1 (RP08)
acpiprt10 at acpi0: bus -1 (PEG0)
acpiprt11 at acpi0: bus -1 (PEG1)
acpiprt12 at acpi0: bus -1 (PEG2)
acpiprt13 at acpi0: bus -1 (PEG3)
acpiec0 at acpi0: not present
acpicpu at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
acpitz at acpi0 not configured
acpitz at acpi0 not configured
acpibat at acpi0 not configured
acpibat at acpi0 not configured
acpibat at acpi0 not configured
acpibtn at acpi0 not configured
acpibtn at acpi0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
vga1 at pci0 dev 2 function 0 "Intel HD Graphics 2000" rev 0x09
wsdisplay1 at vga1 mux 1: console (80x25, vt100 emulation)
"Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
"Intel 7 Series HD Audio" rev 0x04 at pci0 dev 27 function 0 not configured
ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4: msi
pci1 at ppb0 bus 1
iwn0 at pci1 dev 0 function 0 "Intel Wireless WiFi Link 4965" rev 0x61: msi, MIMO 2T3R, MoW2, address 00:21:5c:04:ca:5f
ppb1 at pci0 dev 28 function 1 "Intel 7 Series PCIE" rev 0xc4: msi
pci2 at ppb1 bus 2
re0 at pci2 dev 0 function 0 "Realtek 8168" rev 0x0c: RTL8168G/8111G (0x4c00), msi, address 80:ee:73:77:d5:a9
rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
ppb2 at pci0 dev 28 function 2 "Intel 7 Series PCIE" rev 0xc4: msi
pci3 at ppb2 bus 3
xhci0 at pci3 dev 0 function 0 "ASMedia ASM1042A xHCI" rev 0x00: msi
usb1 at xhci0: USB revision 3.0
uhub1 at usb1 "ASMedia xHCI root hub" rev 3.00/1.00 addr 1
ppb3 at pci0 dev 28 function 3 "Intel 7 Series PCIE" rev 0xc4: msi
pci4 at ppb3 bus 4
re1 at pci4 dev 0 function 0 "Realtek 8168" rev 0x0c: RTL8168G/8111G (0x4c00), msi, address 80:ee:73:77:d5:a8
rgephy1 at re1 phy 7: RTL8251 PHY, rev. 0
ehci1 at pci0 dev 29 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 "Intel EHCI root hub" rev 2.00/1.00 addr 1
"Intel NM70 LPC" rev 0x04 at pci0 dev 31 function 0 not configured
ahci0 at pci0 dev 31 function 2 "Intel 7 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 2: 3.0Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 2 lun 0: <ATA, TS8GMSM610, 2013> SCSI3 0/direct fixed t10.ATA_TS8GMSM610_A93797112253B1000021
sd0: 7641MB, 512 bytes/sector, 15649200 sectors
"Intel 7 Series SMBus" rev 0x04 at pci0 dev 31 function 3 not configured
isa0 at mainbus0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
uhub3 at uhub0 port 1 "vendor 0x8087 product 0x0024" rev 2.00/0.00 addr 2
uhidev0 at uhub3 port 1 configuration 1 interface 0 "vendor 0x05af Rx504B  Ver:3.03" rev 2.00/3.10 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0
wskbd0 at ukbd0: console keyboard, using wsdisplay1
uhidev1 at uhub3 port 1 configuration 1 interface 1 "vendor 0x05af Rx504B  Ver:3.03" rev 2.00/3.10 addr 3
uhidev1: iclass 3/1, 5 report ids
uhid at uhidev1 reportid 1 not configured
uhid at uhidev1 reportid 2 not configured
uhid at uhidev1 reportid 3 not configured
uhid at uhidev1 reportid 4 not configured
uhid at uhidev1 reportid 5 not configured
umass0 at uhub1 port 4 configuration 1 interface 0 "Generic Mass Storage" rev 2.00/1.0e addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <Generic, Flash Disk, 8.07> SCSI2 0/direct removable
sd1: 7680MB, 512 bytes/sector, 15728640 sectors
uhub4 at uhub2 port 1 "vendor 0x8087 product 0x0024" rev 2.00/0.00 addr 2
umass1 at uhub4 port 2 configuration 1 interface 0 "Generic USB Storage" rev 2.00/94.54 addr 3
umass1: using SCSI over Bulk-Only
scsibus2 at umass1: 2 targets, initiator 0
sd2 at scsibus2 targ 1 lun 0: <Generic, STORAGE DEVICE, 9454> SCSI0 0/direct removable
softraid0 at root
scsibus3 at softraid0: 256 targets
root on rd0a swap on rd0b dump on rd0b
syncing disks...
OpenBSD 5.9-beta (GENERIC.MP) #1863: Sun Jan 24 21:35:42 MST 2016
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4161052672 (3968MB)
avail mem = 4030754816 (3844MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb530 (73 entries)
bios0: vendor American Megatrends Inc. version "1.03" date 08/09/2013
bios0: Shuttle Inc. DS47D
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT MCFG SLIC HPET SSDT SSDT SSDT
acpi0: wakeup devices P0P1(S4) USB1(S3) USB2(S3) USB3(S3) USB4(S3) USB5(S3) USB6(S3) USB7(S3) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.70 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.51 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (P0P1)
acpiprt2 at acpi0: bus 1 (RP01)
acpiprt3 at acpi0: bus 2 (RP02)
acpiprt4 at acpi0: bus 3 (RP03)
acpiprt5 at acpi0: bus 4 (RP04)
acpiprt6 at acpi0: bus -1 (RP05)
acpiprt7 at acpi0: bus -1 (RP06)
acpiprt8 at acpi0: bus -1 (RP07)
acpiprt9 at acpi0: bus -1 (RP08)
acpiprt10 at acpi0: bus -1 (PEG0)
acpiprt11 at acpi0: bus -1 (PEG1)
acpiprt12 at acpi0: bus -1 (PEG2)
acpiprt13 at acpi0: bus -1 (PEG3)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C2(350@104 mwait.1@0x20), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(350@104 mwait.1@0x20), C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: FN00, resource for FAN0
acpipwrres1 at acpi0: FN01, resource for FAN1
acpipwrres2 at acpi0: FN02, resource for FAN2
acpipwrres3 at acpi0: FN03, resource for FAN3
acpipwrres4 at acpi0: FN04, resource for FAN4
acpitz0 at acpi0: critical temperature is 101 degC
acpitz1 at acpi0: critical temperature is 101 degC
acpibat0 at acpi0: BAT0 not present
acpibat1 at acpi0: BAT1 not present
acpibat2 at acpi0: BAT2 not present
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: LID0
acpivideo0 at acpi0: GFX0
cpu0: Enhanced SpeedStep 1097 MHz: speeds: 1100, 1000, 900, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 2000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1280x1024
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 7 Series HD Audio" rev 0x04: msi
azalia0: codecs: Realtek ALC662, Intel/0x2806, using Realtek ALC662
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4: msi
pci1 at ppb0 bus 1
iwn0 at pci1 dev 0 function 0 "Intel Wireless WiFi Link 4965" rev 0x61: msi, MIMO 2T3R, MoW2, address 00:21:5c:04:ca:5f
ppb1 at pci0 dev 28 function 1 "Intel 7 Series PCIE" rev 0xc4: msi
pci2 at ppb1 bus 2
re0 at pci2 dev 0 function 0 "Realtek 8168" rev 0x0c: RTL8168G/8111G (0x4c00), msi, address 80:ee:73:77:d5:a9
rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
ppb2 at pci0 dev 28 function 2 "Intel 7 Series PCIE" rev 0xc4: msi
pci3 at ppb2 bus 3
xhci0 at pci3 dev 0 function 0 "ASMedia ASM1042A xHCI" rev 0x00: msi
usb1 at xhci0: USB revision 3.0
uhub1 at usb1 "ASMedia xHCI root hub" rev 3.00/1.00 addr 1
ppb3 at pci0 dev 28 function 3 "Intel 7 Series PCIE" rev 0xc4: msi
pci4 at ppb3 bus 4
re1 at pci4 dev 0 function 0 "Realtek 8168" rev 0x0c: RTL8168G/8111G (0x4c00), msi, address 80:ee:73:77:d5:a8
rgephy1 at re1 phy 7: RTL8251 PHY, rev. 0
ehci1 at pci0 dev 29 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel NM70 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 7 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 2: 3.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 2 lun 0: <ATA, TS8GMSM610, 2013> SCSI3 0/direct fixed t10.ATA_TS8GMSM610_A93797112253B1000021
sd0: 7641MB, 512 bytes/sector, 15649200 sectors
ichiic0 at pci0 dev 31 function 3 "Intel 7 Series SMBus" rev 0x04: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-10600 SO-DIMM
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
it0 at isa0 port 0x2e/2: IT8728F rev 1, EC port 0xa30
uhub3 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub3 port 1 configuration 1 interface 0 "Jing Mold Rx504B  Ver:3.03" rev 2.00/3.10 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub3 port 1 configuration 1 interface 1 "Jing Mold Rx504B  Ver:3.03" rev 2.00/3.10 addr 3
uhidev1: iclass 3/1, 5 report ids
ums0 at uhidev1 reportid 1: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=2, output=0, feature=0
uhid2 at uhidev1 reportid 4: input=2, output=0, feature=0
uhid3 at uhidev1 reportid 5: input=7, output=7, feature=0
uhub4 at uhub2 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
umass0 at uhub4 port 2 configuration 1 interface 0 "Generic USB Storage" rev 2.00/94.54 addr 3
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <Generic, STORAGE DEVICE, 9454> SCSI0 0/direct removable
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (e333dfd9e2d5f9a9.a) swap on sd0b dump on sd0b
umass1 at uhub4 port 4 configuration 1 interface 0 "Generic Mass Storage" rev 2.00/1.0e addr 4
umass1: using SCSI over Bulk-Only
scsibus5 at umass1: 2 targets, initiator 0
sd2 at scsibus5 targ 1 lun 0: <Generic, Flash Disk, 8.07> SCSI2 0/direct removable
sd2: 7680MB, 512 bytes/sector, 15728640 sectors

Attachment: shuttle-ds47d-57.tar.gz (32K)
Attachment: shuttle-ds47d-cur.tar.gz (32K)

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Mike Larkin
On Wed, Jan 27, 2016 at 06:08:49PM +0100, Marcus MERIGHI wrote:
> Regression: from 5.8 onwards, installation fails on three Shuttle DS47D
> machines (5.7 works; 5.8 and -current don't).
>
> They are supposed to be identical, two bought at once, one a little
> earlier.
>
> How it fails: 'Get/Verify SHA256.sig' succeeds, but 'Get/Verify bsd'
> stops instantly and prints only 'Illegal instruction'.

There seem to be a few instances of this happening recently, given
various threads on misc/tech/bugs@. IIRC someone traced the instruction
to a syscall (maybe it was sysret, can't recall) by dumping one of the
binaries that was having problems, but nobody could explain why that was
occurring since if that were really an invalid instruction, it would have
failed on the first use, not some thousand or so calls later.

Both of those instructions can indeed generate #UD, but apparently only
if you either aren't in 64 bit mode (not the case) or if EFER gets trashed
(likely not the case either). So, I'm stumped.
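
To make the EFER condition concrete, here is a small decoder for an
IA32_EFER value. It is only an illustration: reading the register needs
rdmsr in ring 0, so the value would have to come from a kernel debug
print, and the sample values below are hypothetical rather than taken
from the failing machines. The bit positions are the ones documented
for IA32_EFER (MSR 0xC0000080).

```python
EFER_MSR = 0xC0000080  # address of the IA32_EFER model-specific register

# Bit positions within IA32_EFER.
EFER_BITS = {
    "SCE": 0,   # SYSCALL/SYSRET enable
    "LME": 8,   # long mode enable
    "LMA": 10,  # long mode active
    "NXE": 11,  # no-execute enable
}


def decode_efer(value):
    """Return a dict mapping each known EFER flag to True/False."""
    return {name: bool((value >> bit) & 1) for name, bit in EFER_BITS.items()}


def syscall_ud_reasons(value):
    """List the reasons, visible in EFER alone, why SYSCALL would raise #UD.

    LMA being set is necessary but not sufficient for 64-bit mode (the
    code segment must also be a 64-bit one), so an empty list does not
    prove that SYSCALL is safe.
    """
    flags = decode_efer(value)
    reasons = []
    if not flags["SCE"]:
        reasons.append("SCE clear: SYSCALL/SYSRET disabled")
    if not flags["LMA"]:
        reasons.append("LMA clear: not in long mode")
    return reasons


if __name__ == "__main__":
    for sample in (0xD01, 0xD00):  # hypothetical values
        print(hex(sample), decode_efer(sample), syscall_ud_reasons(sample))
```

With all four flags set (0xD01) the list of reasons is empty; clearing
SCE (0xD00) is the "EFER gets trashed" case described above.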

I don't have any other advice at the moment, but am replying here to
keep the thread up to date with what I recall being the most up to date
info.

-ml





Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stefan Kempf-2
Mike Larkin wrote:

> On Wed, Jan 27, 2016 at 06:08:49PM +0100, Marcus MERIGHI wrote:
> > Regression: from 5.8 onwards install fails on three of these shuttle
> > ds47d machines (5.7 works, 5.8 and -current doesn't).
> >
> > They are supposed to be identical, two bought at once, one a little
> > earlier.
> >
> > How it fails: 'Get/Verify SHA256.sig' succeeds, but 'Get/Verify bsd'
> > stopps instantly and gives only 'Illegal instruction'.
>
> There seem to be a few instances of this happening recently, given
> various threads on misc/tech/bugs@. IIRC someone traced the instruction
> to a syscall (maybe it was sysret, can't recall) by dumping one of the
> binaries that was having problems, but nobody could explain why that was
> occurring since if that were really an invalid instruction, it would have
> failed on the first use, not some thousand or so calls later.

I think it was syscall: https://marc.info/?l=openbsd-bugs&m=145135041623692&w=2
 
> Both of those instructions can indeed generate #UD, but apparently only
> if you either aren't in 64 bit mode (not the case) or if EFER gets trashed
> (likely not the case either). So, I'm stumped.

On amd64, the kernel generates a SIGILL on: an undefined/illegal instruction
(well, duh ;-) and on an FP coprocessor fetch fault.

Then there's the routine sendsig() in amd64/machdep.c that, as far as I
understand it so far, copies out FPU state, signal context, etc. to the
user process.  When the copyout() fails, the kernel calls
sigexit(SIGILL), which terminates the process.

So what I suspect to happen is that:
- userland does a syscall
- something goes wrong in the kernel, causing it to call
  sigexit(SIGILL), terminating the process
- and the offending instruction you see in the core dump
  is the 'syscall' instruction.
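
A minimal sketch of the point above, under stated assumptions: a process
that dies from SIGILL is reported as 'Illegal instruction' whether or not
the CPU ever executed a bad opcode. Python is used only for brevity. The
child sends SIGILL to itself with os.kill(), which stands in for the
kernel calling sigexit(SIGILL); it is not the kernel code path itself,
and the names CHILD and run_child are made up for this illustration.

```python
import signal
import subprocess
import sys

# Child program: send SIGILL to itself. No illegal opcode is executed;
# this stands in (as an assumption) for the kernel calling sigexit(SIGILL).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"


def run_child():
    """Run the child and return the signal number that killed it, or None."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # subprocess reports death by signal as a negative return code.
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    sig = run_child()
    print("child killed by", signal.Signals(sig).name if sig else "nothing")
```

Run from a shell, the same child would print 'Illegal instruction', just
like the failing ping(8) and top(1). Depending on system settings it may
also leave a core file in the current directory.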
 

> I don't have any other advice at the moment, but am replying here to
> keep the thread up to date with what I recall being the most up to date
> info.
>
> -ml
>
> > [...]


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stuart Henderson-6
On 2016/01/27 20:10, Stefan Kempf wrote:
> So what I suspect to happen is that:
> - userland does a syscall
> - something goes wrong in the kernel, causing it to call
>   sigexit(SIGILL), terminating the process
> - and the offending instruction you see in the core dump
>   is the 'syscall' instruction.

If this is the case, perhaps ktrace will give clues.


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stefan Kempf-2
Stuart Henderson wrote:
> On 2016/01/27 20:10, Stefan Kempf wrote:
> > So what I suspect to happen is that:
> > - userland does a syscall
> > - something goes wrong in the kernel, causing it to call
> >   sigexit(SIGILL), terminating the process
> > - and the offending instruction you see in the core dump
> >   is the 'syscall' instruction.
>
> If this is the case, perhaps ktrace will give clues.

Let's give it a try.
 
Marcus, can you run this as root, please?
ktrace /sbin/ping some.domain

Or whatever way you invoked ping that made it crash.

And send us the output of kdump -f ktrace.out?


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
[hidden email] (Stefan Kempf), 2016.01.28 (Thu) 06:48 (CET):

> Stuart Henderson wrote:
> > On 2016/01/27 20:10, Stefan Kempf wrote:
> > > So what I suspect to happen is that:
> > > - userland does a syscall
> > > - something goes wrong in the kernel, causing it to call
> > >   sigexit(SIGILL), terminating the process
> > > - and the offending instruction you see in the core dump
> > >   is the 'syscall' instruction.
> >
> > If this is the case, perhaps ktrace will give clues.
>
> Let's give it a try.
>  
> Marcus, can you run this as root, please?
> ktrace /sbin/ping some.domain
>
> Or whatever way you invoked ping that made it crash.
>
> And send us the output of kdump -f ktrace.out?

Thanks for the advice:

# ktrace /sbin/ping 192.168.188.189
PING 192.168.188.189 (192.168.188.189): 56 data bytes
64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=3.286 ms
Illegal instruction

# kdump -f ./ktrace.out
 31378          EMUL  "native"
 31378 ktrace   RET   ktrace 0
 31378 ktrace   CALL  execve(0x7f7ffffc1fff,0x7f7ffffc1f20,0x7f7ffffc1f38)
 31378 ktrace   NAMI  "/sbin/ping"
 31378 ktrace   ARGS  
        [0] = "/sbin/ping"
        [1] = "192.168.188.189"
 31378 ping     RET   execve 0
 31378 ping     CALL  mprotect(0x15a413e2f000,0x2000,0x1<PROT_READ>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  kbind(0,0x7f7ffffd8a5a,0)
 31378 ping     RET   kbind 0
 31378 ping     CALL  sysctl(6.7<hw.pagesize>,0x15a413f38c60,
                        0x7f7ffffd88c0,0,0)
 31378 ping     RET   sysctl 0
 31378 ping     CALL  mmap(0,0x1000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23805231312896/0x15a6965b3000
 31378 ping     CALL  mprotect(0x15a6965b3000,0x1000,0x1<PROT_READ>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  socket(AF_INET,0x3<SOCK_RAW>,0x1)
 31378 ping     RET   socket 3
 31378 ping     CALL  getuid()
 31378 ping     RET   getuid 0<"root">
 31378 ping     CALL  setresuid(0<"root">,0<"root">,0<"root">)
 31378 ping     RET   setresuid 0
 31378 ping     CALL  getentropy(0x7f7ffffd81d0,40)
 31378 ping     RET   getentropy 0
 31378 ping     CALL  mmap(0,0x450,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23804690755584/0x15a67622f000
 31378 ping     CALL  minherit(0x15a67622f000,0x450,MAP_INHERIT_ZERO)
 31378 ping     RET   minherit 0
 31378 ping     CALL  readlink(0x15a413c2ac78,0x7f7ffffd81a0,63)
 31378 ping     NAMI  "/etc/malloc.conf"
 31378 ping     RET   readlink -1 errno 2 No such file or directory
 31378 ping     CALL  issetugid()
 31378 ping     RET   issetugid 1
 31378 ping     CALL  mmap(0,0x4000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23806166503424/0x15a6ce191000
 31378 ping     CALL  mprotect(0x15a6ce191000,0x1000,0<PROT_NONE>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  mprotect(0x15a6ce194000,0x1000,0<PROT_NONE>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  mmap(0,0x2000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23805153824768/0x15a691bcd000
 31378 ping     CALL  mprotect(0x15a413f35000,0x1000,0x1<PROT_READ>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  mmap(0,0x1000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23805802340352/0x15a6b8646000
 31378 ping     CALL  mmap(0,0x1000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23805649330176/0x15a6af45a000
 31378 ping     CALL  getpid()
 31378 ping     RET   getpid 31378/0x7a92
 31378 ping     CALL  getsockopt(3,SOL_SOCKET,SO_SNDBUF,0x7f7ffffd87c0,
                        0x7f7ffffd87b8)
 31378 ping     RET   getsockopt 0
 31378 ping     CALL  setsockopt(3,SOL_SOCKET,SO_RCVBUF,
                        0x15a413f30238,4)
 31378 ping     RET   setsockopt 0
 31378 ping     CALL  mprotect(0x15a6965b3000,0x1000,0x3<PROT_READ|
                        PROT_WRITE>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  mprotect(0x15a6965b3000,0x1000,0x1<PROT_READ>)
 31378 ping     RET   mprotect 0
 31378 ping     CALL  fstat(1,0x7f7ffffd7640)
 31378 ping     STRU  struct stat { dev=1040, ino=338283,
                        mode=crw--w---- , nlink=1, uid=1000<"asfer">,
                        gid=4<"tty">, rdev=1280, atime=1453968739
                        <"Jan 28 09:12:19 2016">.870031187,
                        mtime=1453968739
                        <"Jan 28 09:12:19 2016">.870031187,
                        ctime=1453968739
                        <"Jan 28 09:12:19 2016">.870031187, size=0,
                        blocks=0, blksize=65536, flags=0x0,
                        gen=0x88d85db9 }
 31378 ping     RET   fstat 0
 31378 ping     CALL  mmap(0,0x10000,0x3<PROT_READ|PROT_WRITE>,0x1002
                        <MAP_PRIVATE|MAP_ANON>,-1,0)
 31378 ping     RET   mmap 23803532505088/0x15a631197000
 31378 ping     CALL  fcntl(1,F_ISATTY)
 31378 ping     RET   fcntl 1
 31378 ping     CALL  write(1,0x15a631197000,0x36)
 31378 ping     GIO   fd 1 wrote 54 bytes
       "PING 192.168.188.189 (192.168.188.189): 56 data bytes
       "
 31378 ping     RET   write 54/0x36
 31378 ping     CALL  pledge(0x15a413c2a1c1,0)
 31378 ping     STRU  pledge request="stdio inet dns"
 31378 ping     RET   pledge 0
 31378 ping     CALL  sigaction(SIGINT,0x7f7ffffd8230,0x7f7ffffd8220)
 31378 ping     STRU  struct sigaction { handler=0x15a413b03050,
                        mask=0<>, flags=0x2<SA_RESTART> }
 31378 ping     STRU  struct sigaction { handler=SIG_DFL, mask=0<>,
                        flags=0<> }
 31378 ping     RET   sigaction 0
 31378 ping     CALL  sigaction(SIGINFO,0x7f7ffffd8230,0x7f7ffffd8220)
 31378 ping     STRU  struct sigaction { handler=0x15a413b03050,
                        mask=0<>, flags=0x2<SA_RESTART> }
 31378 ping     STRU  struct sigaction { handler=SIG_DFL, mask=0<>,
                        flags=0x12<SA_RESTART|SA_NODEFER> }
 31378 ping     RET   sigaction 0
 31378 ping     CALL  sigaction(SIGALRM,0x7f7ffffd8230,0x7f7ffffd8220)
 31378 ping     STRU  struct sigaction { handler=0x15a413b03050,
                        mask=0<>, flags=0x2<SA_RESTART> }
 31378 ping     STRU  struct sigaction { handler=SIG_DFL, mask=0<>,
                        flags=0<> }
 31378 ping     RET   sigaction 0
 31378 ping     CALL  setitimer(ITIMER_REAL,0x7f7ffffd8740,0)
 31378 ping     RET   setitimer 0
 31378 ping     CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffd81c0)
 31378 ping     STRU  struct timespec { 871<"Jan  1 01:14:31
                        1970">.350784414 }
 31378 ping     RET   clock_gettime 0
 31378 ping     CALL  sendto(3,0x15a413f38eb4,0x40,0,0x15a413f4af10,
                        0x10)
 31378 ping     STRU  struct sockaddr { AF_INET, 192.168.188.189:0 }
 31378 ping     GIO   fd 3 wrote 64 bytes
       "\b\0\M^Pc\M^Rz\0\0C\^O{\^W=\M-'N\M-eC\M^OX\^Wf\M^I\M-v\M^]\f4\
        M-*\^X\M-P~\M^XR\^X\^Y\^Z\^[\^\\^]\^^\^_!"#$%&'()*+,-.\
        /01234567"
 31378 ping     RET   sendto 64/0x40
 31378 ping     CALL  poll(0x7f7ffffd8790,1,INFTIM)
 31378 ping     RET   poll 1
 31378 ping     CALL  recvmsg(3,0x7f7ffffd86e0,0)
 31378 ping     GIO   fd 3 read 84 bytes
       "E\0\0T\a\^Z\0\0\M^?\^A\M-:\^Z\M-@\M-(\M-<\M-=\M-@\M-(\M-<e\0\0\
        M^Xc\M^Rz\0\0C\^O{\^W=\M-'N\M-eC\M^OX\^Wf\M^I\M-v\M^]\\f4\M-*\
        ^X\M-P~\M^XR\^X\^Y\^Z\^[\^\\^]\^^\^_!"#$%&'()*+,-./01234567"
 31378 ping     STRU  struct sockaddr { AF_INET, 192.168.188.189:0 }
 31378 ping     STRU  struct msghdr { name=0x7f7ffffd8780, namelen=16,
                        iov=0x7f7ffffd8760, iovlen=1,
                        control=0x7f7ffffd82d0, controllen=0, flags=0 }
 31378 ping     STRU  struct iovec { base=0x15a6b8646f54, len=108 }
 31378 ping     RET   recvmsg 84/0x54
 31378 ping     CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffd8200)
 31378 ping     STRU  struct timespec { 871<"Jan  1 01:14:31
                        1970">.354070662 }
 31378 ping     RET   clock_gettime 0
 31378 ping     CALL  write(1,0x15a631197000,0x40)
 31378 ping     GIO   fd 1 wrote 64 bytes
       "64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=3.286 ms
       "
 31378 ping     RET   write 64/0x40
 31378 ping     CALL  poll(0x7f7ffffd8790,1,INFTIM)
 31378 ping     PSIG  SIGALRM caught handler=0x15a413b03050 mask=0<>



# gdb -q /usr/sbin/sshd /sshd.core  
(no debugging symbols found)
Core was generated by `sshd'.
Program terminated with signal 4, Illegal instruction.
(no debugging symbols found)
Loaded symbols for /usr/sbin/sshd
Reading symbols from /usr/lib/libutil.so.12.1...done.
Loaded symbols for /usr/lib/libutil.so.12.1
Reading symbols from /usr/lib/libcrypto.so.37.0...done.
Loaded symbols for /usr/lib/libcrypto.so.37.0
Reading symbols from /usr/lib/libz.so.5.0...done.
Loaded symbols for /usr/lib/libz.so.5.0
Reading symbols from /usr/lib/libc.so.84.2...done.
Loaded symbols for /usr/lib/libc.so.84.2
Reading symbols from /usr/libexec/ld.so...done.
Loaded symbols for /usr/libexec/ld.so
#0  0x00000d9b0d57d52a in select () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
(gdb) bt
#0  0x00000d9b0d57d52a in select () at <stdin>:2
#1  0x00000d990c00de91 in sshd_hostkey_sign () from /usr/sbin/sshd
#2  0x00000d990c00b4a1 in ?? () from /usr/sbin/sshd
#3  0x0000000000000000 in ?? ()
Current language:  auto; currently asm
(gdb)


Thanks for looking, Marcus


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stefan Kempf-2
Marcus MERIGHI wrote:

> [hidden email] (Stefan Kempf), 2016.01.28 (Thu) 06:48 (CET):
> > Stuart Henderson wrote:
> > > On 2016/01/27 20:10, Stefan Kempf wrote:
> > > > So what I suspect to happen is that:
> > > > - userland does a syscall
> > > > - something goes wrong in the kernel, causing it to call
> > > >   sigexit(SIGILL), terminating the process
> > > > - and the offending instruction you see in the core dump
> > > >   is the 'syscall' instruction.
> > >
> > > If this is the case, perhaps ktrace will give clues.
> >
> > Let's give it a try.
> >  
> > Marcus, can you run this as root, please?
> > ktrace /sbin/ping some.domain
> >
> > Or whatever way you invoked ping that made it crash.
> >
> > And send us the output of kdump -f ktrace.out?
>
> # ktrace /sbin/ping 192.168.188.189
> PING 192.168.188.189 (192.168.188.189): 56 data bytes
> 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=3.286 ms
> Illegal instruction

It's close to my guess. This is how I interpret the end of the output:
 
> # kdump -f ./ktrace.out
> [...]
>  31378 ping     CALL  poll(0x7f7ffffd8790,1,INFTIM)
>  31378 ping     PSIG  SIGALRM caught handler=0x15a413b03050 mask=0<>
 
The process blocks in a system call, then a signal wakes it up. Before
returning to userspace, sendsig() tries to set up a signal context.
Since the ktrace output stops here, sendsig() must have called
sigexit(SIGILL). This happens when the kernel is not able to copy the
signal context onto the stack of the user process.
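
For comparison, here is a rough sketch of what the same sequence looks
like on a healthy system: a timer is armed, the process blocks in poll(),
SIGALRM arrives, and the handler actually runs. It only loosely mirrors
the setitimer()/poll() pattern seen in the ping trace. The function name,
the pipe used to wake poll() up, and the delay values are assumptions
made for this illustration; none of it is taken from ping's source.

```python
import os
import select
import signal


def poll_interrupted_by_alarm(delay=0.05, timeout_ms=5000):
    """Block in poll() until a SIGALRM handler makes the polled fd readable.

    Returns (handler_ran, poll_events). Must be called from the main thread.
    """
    rfd, wfd = os.pipe()
    state = {"handler_ran": False}

    def on_alarm(signum, frame):
        # Reaching this point means signal delivery worked for this process.
        state["handler_ran"] = True
        os.write(wfd, b"x")  # make the polled descriptor readable

    old = signal.signal(signal.SIGALRM, on_alarm)
    try:
        poller = select.poll()
        poller.register(rfd, select.POLLIN)
        signal.setitimer(signal.ITIMER_REAL, delay)  # arm a one-shot timer
        events = poller.poll(timeout_ms)  # blocks; SIGALRM arrives in here
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old)
        os.close(rfd)
        os.close(wfd)
    return state["handler_ran"], events
```

On the affected machines the equivalent of on_alarm would never be
reached, if the interpretation above is right: the process would be gone
before the handler runs.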

Some reasons I can think of: the process is at the very bottom of
the stack; the stack pointer of the user process is trashed; or
the stack pointer is within the stack area of the process, but
it points to a page that was not yet mapped in, and uvm_fault()
fails to fault it in for some reason.

Let's see what the stack pointer looks like when you get the illegal
instruction. Can you try this please:

$ top

In a different shell (as root):
# procmap <pid of top>

We need to see the lines that say [ stack ]

Now, back in top, hit ctrl+c to make it crash. Then run:

$ gdb -q /usr/bin/top top.core
(gdb) info reg

And send us the output of the 'info reg' command.
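
Once both outputs are in hand, the comparison itself is simple: does the
rsp value from 'info reg' fall inside one of the [ stack ] ranges? A
small sketch of that check follows. It takes the addresses as hex strings
copied by hand, so it assumes nothing about the exact layout of the
procmap or gdb output. The helper names are hypothetical, and the range
in the usage example is made up, not taken from a real procmap run.

```python
def parse_hex(text):
    """Parse a hex address written with or without a leading 0x."""
    return int(text, 16)


def sp_in_stack(rsp, stack_start, stack_end):
    """True if rsp lies inside the half-open range [stack_start, stack_end)."""
    return stack_start <= rsp < stack_end


if __name__ == "__main__":
    # Illustrative values only; substitute the ones from your own output.
    rsp = parse_hex("0x7f7ffffd8790")
    start = parse_hex("7f7ffffc0000")
    end = parse_hex("7f7ffffe0000")
    print("rsp inside stack mapping:", sp_in_stack(rsp, start, end))
```

If rsp is outside every stack range, that points to a trashed stack
pointer. If it is inside, the unmapped-page explanation becomes more
likely.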

>
> # gdb -q /usr/sbin/sshd /sshd.core  
> (no debugging symbols found)
> Core was generated by `sshd'.
> Program terminated with signal 4, Illegal instruction.
> (no debugging symbols found)
> Loaded symbols for /usr/sbin/sshd
> Reading symbols from /usr/lib/libutil.so.12.1...done.
> Loaded symbols for /usr/lib/libutil.so.12.1
> Reading symbols from /usr/lib/libcrypto.so.37.0...done.
> Loaded symbols for /usr/lib/libcrypto.so.37.0
> Reading symbols from /usr/lib/libz.so.5.0...done.
> Loaded symbols for /usr/lib/libz.so.5.0
> Reading symbols from /usr/lib/libc.so.84.2...done.
> Loaded symbols for /usr/lib/libc.so.84.2
> Reading symbols from /usr/libexec/ld.so...done.
> Loaded symbols for /usr/libexec/ld.so
> #0  0x00000d9b0d57d52a in select () at <stdin>:2
> 2       <stdin>: No such file or directory.
>         in <stdin>
> (gdb) bt
> #0  0x00000d9b0d57d52a in select () at <stdin>:2
> #1  0x00000d990c00de91 in sshd_hostkey_sign () from /usr/sbin/sshd
> #2  0x00000d990c00b4a1 in ?? () from /usr/sbin/sshd
> #3  0x0000000000000000 in ?? ()
> Current language:  auto; currently asm
> (gdb)
>
>
> Thanks for looking, Marcus

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
[hidden email] (Stefan Kempf), 2016.01.28 (Thu) 20:57 (CET):

> Marcus MERIGHI wrote:
> > [hidden email] (Stefan Kempf), 2016.01.28 (Thu) 06:48 (CET):
> > > Stuart Henderson wrote:
> > > > On 2016/01/27 20:10, Stefan Kempf wrote:
> > > > > So what I suspect to happen is that:
> > > > > - userland does a syscall
> > > > > - something goes wrong in the kernel, causing it to call
> > > > >   sigexit(SIGILL), terminating the process
> > > > > - and the offending instruction you see in the core dump
> > > > >   is the 'syscall' instruction.
> > > >
> > > > If this is the case, perhaps ktrace will give clues.
> > >
> > > Let's give it a try.
> > >  
> > > Marcus, can you run this as root, please?
> > > ktrace /sbin/ping some.domain
> > >
> > > Or whatever way you invoked ping that made it crash.
> > >
> > > And send us the output of kdump -f ktrace.out?
> >
> > # ktrace /sbin/ping 192.168.188.189
> > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=3.286 ms
> > Illegal instruction
>
> It's close to my guess. This is how I interpret the end of the output:
>  
> > # kdump -f ./ktrace.out
> > [...]
> >  31378 ping     CALL  poll(0x7f7ffffd8790,1,INFTIM)
> >  31378 ping     PSIG  SIGALRM caught handler=0x15a413b03050 mask=0<>
>  
> The process blocks in a system call, then a signal wakes it up. Before
> returning to userspace, sendsig() tries to set up a signal context.
> Since the ktrace output stops here, sendsig() must have called
> sigexit(SIGILL). This happens when the kernel is not able to copy the
> signal context onto the stack of the user process.
>
> Some reasons I can think of: the process is at the very bottom of
> the stack, the stack pointer of the user process is trashed, or:
> the stack pointer is within the stack area of the process, but
> it points to a page that was not yet mapped-in, and uvm_fault()
> fails to fault it in for some reason.
>
> Let's see what the stack pointer looks like when you get the illegal
> instruction. Can you try this please:
>
> $ top
>
> In a different shell (as root):
> # procmap <pid of top>
>
> We need to see the lines that say [ stack ]
00007F7FFDFE1000  28672K                     [ stack ]
00007F7FFFBE1000   4028K read/write          [ stack ]
00007F7FFFFD0000     64K read/write          [ stack ]
00007F7FFFFE0000      4K                     [ stack ]

> Now, back in top, hit ctrl+c to make it crash. Then run:
>
> $ gdb -q /usr/bin/top top.core
> (gdb) info reg
>
> And send us the output of the 'info reg' command.

rax            0x4      4
rbx            0x6773930c4a0    7109130372256
rcx            0x679c7cb2dda    7120112791002
rdx            0x1388   5000
rsi            0x1      1
rdi            0x7f7ffffdf858   140187732408408
rbp            0x1e     0x1e
rsp            0x7f7ffffdf848   0x7f7ffffdf848
r8             0x101010101010101        72340172838076673
r9             0x8080808080808080       -9187201950435737472
r10            0x679c7d11c5a    7120113179738
r11            0x246    582
r12            0x6773930bc60    7109130370144
r13            0x6773930c480    7109130372224
r14            0x7f7ffffdf8c0   140187732408512
r15            0x67738f08ae0    7109126163168
rip            0x679c7cb2dda    0x679c7cb2dda <poll+10>
eflags         0x247    583
cs             0x2b     43
ss             0x23     35
ds             0x23     35
es             0x23     35
fs             0x23     35
gs             0x23     35

Full output is in gdb.out and procmap.out, respectively.

Thanks for your instructions and for working on this!

Bye, Marcus

gdb.out (1K) Download Attachment
procmap.out (12K) Download Attachment
Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stefan Kempf-2
Marcus MERIGHI wrote:

> [hidden email] (Stefan Kempf), 2016.01.28 (Thu) 20:57 (CET):
> > Marcus MERIGHI wrote:
> >  
> > Let's see what the stack pointer looks like when you get the illegal
> > instruction. Can you try this please:
> >
> > We need to see the lines that say [ stack ]
>
> 00007F7FFDFE1000  28672K                     [ stack ]
> 00007F7FFFBE1000   4028K read/write          [ stack ]
> 00007F7FFFFD0000     64K read/write          [ stack ]
> 00007F7FFFFE0000      4K                     [ stack ]
>
> > Now, back in top, hit ctrl+c to make it crash. Then run:
> >
> > $ gdb -q /usr/bin/top top.core
> > (gdb) info reg
> >
> > And send us the output of the 'info reg' command.
>
> rsp            0x7f7ffffdf848   0x7f7ffffdf848
 
0x7f7ffffdf848 is within 00007F7FFFFD0000 + 64K, which is mapped
read/write, so the process seems to enter the kernel with a proper
stack pointer.
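
The range check above can be written out as a small sketch. The table holds the four [ stack ] lines from the procmap output; the struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t start;		/* start address from procmap */
	uint64_t size_kb;	/* size in KB from procmap */
	int	 readwrite;	/* 1 if procmap shows read/write */
};

/* The four [ stack ] lines reported for top. */
static const struct region stack_regions[] = {
	{ 0x00007F7FFDFE1000ULL, 28672, 0 },
	{ 0x00007F7FFFBE1000ULL,  4028, 1 },
	{ 0x00007F7FFFFD0000ULL,    64, 1 },
	{ 0x00007F7FFFFE0000ULL,     4, 0 },
};

/* Returns the index of the region containing addr, or -1 if none does. */
int
find_stack_region(uint64_t addr)
{
	size_t i, n = sizeof(stack_regions) / sizeof(stack_regions[0]);

	for (i = 0; i < n; i++) {
		uint64_t end = stack_regions[i].start +
		    stack_regions[i].size_kb * 1024;

		if (addr >= stack_regions[i].start && addr < end)
			return (int)i;
	}
	return -1;
}

/* Returns 1 if the region at idx is mapped read/write, 0 otherwise. */
int
stack_region_is_readwrite(int idx)
{
	if (idx < 0 ||
	    (size_t)idx >= sizeof(stack_regions) / sizeof(stack_regions[0]))
		return 0;
	return stack_regions[idx].readwrite;
}
```

For rsp = 0x7f7ffffdf848 the lookup lands in the third entry (the 64K read/write region), which is what the statement above relies on.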

We need to see what it looks like from within the kernel (and whether
the illegal instruction is really raised from within sendsig()). Can you
try the diff below?

Before booting the new kernel, add to your sysctl.conf:
ddb.panic=1
ddb.console=1

ddb.panic=1 alone should be enough, though.

You should get a kernel panic now instead of an illegal instruction
signal if you try running ping or top. We need the output of the panic
message and the output of the following commands:

ddb> trace
ddb> show proc

This will also print something like vmspace=<address>.
Use this address for the next command:

ddb> show map /f <address>

Thanks for helping remote-debugging :-)

Index: arch/amd64/amd64/machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.217
diff -u -p -r1.217 machdep.c
--- arch/amd64/amd64/machdep.c 21 Oct 2015 07:59:17 -0000 1.217
+++ arch/amd64/amd64/machdep.c 30 Jan 2016 09:39:57 -0000
@@ -527,6 +527,7 @@ sendsig(sig_t catcher, int sig, int mask
  siginfo_t ksi;
  register_t sp, scp, sip;
  u_long sss;
+ int userstack;
 
 #ifdef DEBUG
  if ((sigdebug & SDB_FOLLOW) && (!sigpid || p->p_pid == sigpid))
@@ -540,10 +541,13 @@ sendsig(sig_t catcher, int sig, int mask
 
  /* Allocate space for the signal handler context. */
  if ((p->p_sigstk.ss_flags & SS_DISABLE) == 0 &&
-    !sigonstack(tf->tf_rsp) && (psp->ps_sigonstack & sigmask(sig)))
+    !sigonstack(tf->tf_rsp) && (psp->ps_sigonstack & sigmask(sig))) {
  sp = (register_t)p->p_sigstk.ss_sp + p->p_sigstk.ss_size;
- else
+ userstack = 0;
+ } else {
  sp = tf->tf_rsp - 128;
+ userstack = 1;
+ }
 
  sp &= ~15ULL; /* just in case */
  sss = (sizeof(ksc) + 15) & ~15;
@@ -553,8 +557,18 @@ sendsig(sig_t catcher, int sig, int mask
  sp -= fpu_save_len;
  ksc.sc_fpstate = (struct fxsave64 *)sp;
  if (copyout(&p->p_addr->u_pcb.pcb_savefpu.fp_fxsave,
-    (void *)sp, fpu_save_len))
+    (void *)sp, fpu_save_len)) {
+ panic("sendsig 1: fxsave %p, sp %p, fxave_size %zu, "
+    "savefpu_size %zu, fpu_save_len %zu, tf_rsp %p, "
+    "userstack %d",
+    &p->p_addr->u_pcb.pcb_savefpu.fp_fxsave,
+    (void *)sp,
+    sizeof(p->p_addr->u_pcb.pcb_savefpu.fp_fxsave),
+    sizeof(p->p_addr->u_pcb.pcb_savefpu),
+    fpu_save_len, (void *)tf->tf_rsp,
+    userstack);
  sigexit(p, SIGILL);
+ }
 
  /* Signal handlers get a completely clean FP state */
  p->p_md.md_flags &= ~MDP_USEDFPU;
@@ -566,13 +580,22 @@ sendsig(sig_t catcher, int sig, int mask
  sss += (sizeof(ksi) + 15) & ~15;
 
  initsiginfo(&ksi, sig, code, type, val);
- if (copyout(&ksi, (void *)sip, sizeof(ksi)))
+ if (copyout(&ksi, (void *)sip, sizeof(ksi))) {
+ panic("sendsig 2: sip %p, tf_rsp %p, ksi_size %zu, "
+    "userstack %d",
+    (void *)sip, (void *)tf->tf_rsp, sizeof(ksi),
+    userstack);
  sigexit(p, SIGILL);
+ }
  }
  scp = sp - sss;
 
- if (copyout(&ksc, (void *)scp, sizeof(ksc)))
+ if (copyout(&ksc, (void *)scp, sizeof(ksc))) {
+ panic("sendsig 3: scp %p, tf_rsp %p, ksc_size %zu, "
+    "userstack %d",
+    (void *)scp, (void *)tf->tf_rsp, sizeof(ksc), userstack);
  sigexit(p, SIGILL);
+ }
 
  /*
  * Build context to run handler in.

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
[hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):

> Marcus MERIGHI wrote:
> > [hidden email] (Stefan Kempf), 2016.01.28 (Thu) 20:57 (CET):
> > > Marcus MERIGHI wrote:
> > >  
> > > Let's see what the stack pointer looks like when you get the illegal
> > > instruction. Can you try this please:
> > >
> > > We need to see the lines that say [ stack ]
> >
> > 00007F7FFDFE1000  28672K                     [ stack ]
> > 00007F7FFFBE1000   4028K read/write          [ stack ]
> > 00007F7FFFFD0000     64K read/write          [ stack ]
> > 00007F7FFFFE0000      4K                     [ stack ]
> >
> > > Now, back in top, hit ctrl+c to make it crash. Then run:
> > >
> > > $ gdb -q /usr/bin/top top.core
> > > (gdb) info reg
> > >
> > > And send us the output of the 'info reg' command.
> >
> > rsp            0x7f7ffffdf848   0x7f7ffffdf848
>  
> 0x7f7ffffdf848 is within 00007F7FFFFD0000 + 64K, which is mapped
> read/write, so the process seems to enter the kernel with a proper
> stack pointer.
>
> We need to see what it looks like from within the kernel (and whether
> the illegal instruction is really raised from within sendsig()). Can you
> try the diff below?
>
> Before booting the new kernel, add to your sysctl.conf:
> ddb.panic=1
> ddb.console=1
Done.

> You should get a kernel panic now instead of an illegal instruction
> signal if you try running ping or top. We need the output of the panic
> message and the output of the following commands:

ping(1), top(1) messed up the screen.

> ddb> trace
> ddb> show proc
>
> This will also print something like vmspace=<address>.
> Use this address for the next command:
>
> ddb> show map /f <address>
>
> Thanks for helping remote-debugging :-)

Thanks for looking at it and taking me further down the rabbit hole than
I could have gone myself... And thanks for all the explanations!

Console log below, attached as well for better readability.

Thanks, Marcus

# ping 192.168.188.189                                                  
PING 192.168.188.189 (192.168.188.189): 56 data bytes
64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
0x7f7ffffdd238, userstack 1
Stopped at      Debugger+0x9:   leave
   TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
* 4460   4460      0    0x100033          0    1  ping
Debugger() at Debugger+0x9
panic() at panic+0xfe
sendsig() at sendsig+0x33e
postsig() at postsig+0x24e
userret() at userret+0x4c
syscall() at syscall+0x209
--- syscall (number 4) ---
end of kernel
end trace frame: 0x12f5ee349ee8, count: 9
0x12f5edf06afa:
http://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
sendsig() at sendsig+0x33e
postsig() at postsig+0x24e
userret() at userret+0x4c
syscall() at syscall+0x209
--- syscall (number 4) ---
end of kernel
end trace frame: 0x12f5ee349ee8, count: -6
0x12f5edf06afa:

ddb{1}> show proc
PROC (ping) pid=4460 stat=onproc
    flags process=100033<CONTROLT,EXEC,SUGID,SUGIDEXEC,PLEDGE> proc=0
    pri=24, usrpri=50, nice=20
    forw=0xffffffffffffffff, list=0xffff800032d191d8,0xffffffff81943240
    process=0xffff800032c98a90 user=0xffff800032c8a000,
    vmspace=0xffffff011da41100
    estcpu=0, cpticks=1, pctcpu=0.1
    user=0, sys=1, intr=0

ddb{1}> show map /f 0xffffff011da41100
MAP 0xffffff011da41100: [0x1000->0x7f7fffffc000]
        brk() allocate range: 0x12f5ee100000-0x12f7ee100000
        stack allocate range: 0x7f7ffdfdf000-0x7f7ffffdf000
        sz=33988608, ref=1, version=77, flags=0x41
        pmap=0xffffff0111450638(resident=73)
        vm_refcnt=1 vm_shm=0x0 vm_rssize=0 vm_swrss=0
        vm_tsize=42 vm_dsize=587
        vm_taddr=0x12f5edf00000 vm_daddr=0x12f5ee100000
        vm_maxsaddr=0x7f7ffdfdf000 vm_minsaddr=0x7f7ffffdf000
 - 0xffffff01118ad2c0: 0x1000->0x1000: obj=0x0/0x0, amap=0x0/0
        submap=F, cow=F, nc=F, prot(max)=0/0, inh=0, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x1000-0x12f5edf00000
        fspace_augment=20847468212224
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124bec18: 0x12f5edf00000->0x12f5edf2a000:
   obj=0xffffff0110c13d48/0x0, amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5edf2a000-0x12f5ee029000
        fspace_augment=20847468212224
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124becc0: 0x12f5ee029000->0x12f5ee030000:
   obj=0xffffff0110c13d48/0x29000, amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=1/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee030000-0x12f5ee100000
        fspace_augment=851968
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa6b0: 0x12f5ee100000->0x12f5ee100000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=F, nc=F, prot(max)=0/0, inh=0, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee100000-0x12f5ee12f000
        fspace_augment=192512
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124beb70: 0x12f5ee12f000->0x12f5ee130000: obj=0x0/0x0,
   amap=0xffffff01124b2798/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee130000-0x12f5ee22f000
        fspace_augment=20847468212224
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124beeb8: 0x12f5ee22f000->0x12f5ee230000: obj=0x0/0x0,
   amap=0xffffff01124b24c8/0
        submap=F, cow=T, nc=F, prot(max)=1/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee230000-0x12f5ee32f000
        fspace_augment=1044480
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aa8a8: 0x12f5ee32f000->0x12f5ee330000:
   obj=0xffffff0110c13d48/0x2f000, amap=0xffffff01124b2168/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee330000-0x12f5ee330000
        fspace_augment=8587530240
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aad40: 0x12f5ee330000->0x12f5ee331000: obj=0x0/0x0,
   amap=0xffffff01124b2630/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee331000-0x12f5ee331000
        fspace_augment=0
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aab48: 0x12f5ee331000->0x12f5ee334000: obj=0x0/0x0,
   amap=0xffffff01124b2318/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee334000-0x12f5ee334000
        fspace_augment=8587530240
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aade8: 0x12f5ee334000->0x12f5ee335000: obj=0x0/0x0,
   amap=0xffffff01124b2318/3
        submap=F, cow=T, nc=F, prot(max)=1/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee335000-0x12f5ee335000
        fspace_augment=8587530240
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff0111a51d28: 0x12f5ee335000->0x12f5ee34b000: obj=0x0/0x0,
   amap=0xffffff01124b2318/4
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f5ee34b000-0x12f7ee100000
        fspace_augment=8587530240
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aae90: 0x12f7ee100000->0x12f7ee100000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=F, nc=F, prot(max)=0/0, inh=0, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f7ee100000-0x12f7f6b7e000
        fspace_augment=119327938801664
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa368: 0x12f7f6b7e000->0x12f7f6b80000: obj=0x0/0x0,
   amap=0xffffff01124b2ab0/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f7f6b80000-0x12f80c9f5000
        fspace_augment=367480832
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa560: 0x12f80c9f5000->0x12f80c9f6000: obj=0x0/0x0,
   amap=0xffffff01124b2558/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=3, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f80c9f6000-0x12f823d82000
        fspace_augment=1555902464
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff0111a51bd8: 0x12f823d82000->0x12f823d83000: obj=0x0/0x0,
   amap=0xffffff01124b21f8/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f823d83000-0x12f880956000
        fspace_augment=1555902464
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa2c0: 0x12f880956000->0x12f880957000:
   obj=0xffffff011e2dc240/0x0, amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=1
        hole=F, free=T, guard=0x0, free=0x12f880957000-0x12f8923ca000
        fspace_augment=1555902464
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff0111c41558: 0x12f8923ca000->0x12f8923cb000: obj=0x0/0x0,
   amap=0xffffff01124b2cf0/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f8923cb000-0x12f897e95000
        fspace_augment=95199232
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01125ebcd0: 0x12f897e95000->0x12f897ea5000: obj=0x0/0x0,
   amap=0xffffff01124b2990/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f897ea5000-0x12f8ad773000
        fspace_augment=361553920
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa218: 0x12f8ad773000->0x12f8ad774000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f8ad774000-0x12f8ad774000
        fspace_augment=0
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aa0c8: 0x12f8ad774000->0x12f8ad776000: obj=0x0/0x0,
   amap=0xffffff01124b28b8/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f8ad776000-0x12f8ad776000
        fspace_augment=488570880
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aabf0: 0x12f8ad776000->0x12f8ad777000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f8ad777000-0x12f8ca967000
        fspace_augment=488570880
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aaaa0: 0x12f8ca967000->0x12f8ca968000: obj=0x0/0x0,
   amap=0xffffff01124b2c60/0
        submap=F, cow=T, nc=F, prot(max)=1/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x12f8ca968000-0x7f7ffdfdf000
        fspace_augment=119327938801664
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
 - 0xffffff01124aac98: 0x7f7ffdfdf000->0x7f7fff7df000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x7f7fff7df000-0x7f7fff7df000
        fspace_augment=0
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aa170: 0x7f7fff7df000->0x7f7ffffd0000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x7f7ffffd0000-0x7f7ffffd0000
        fspace_augment=118784
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aa020: 0x7f7ffffd0000->0x7f7ffffde000: obj=0x0/0x0,
   amap=0xffffff01124b2510/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free

[At this point output from ddb via tip(1) '~| cat > ~/ddb.out' stops.
Below is continued via copy/paste, resuming at 0xffffff01124aa020.]

 - 0xffffff01124aa020: 0x7f7ffffd0000->0x7f7ffffde000: obj=0x0/0x0,
   amap=0xffffff01124b2510/0
        submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x7f7ffffde000-0x7f7ffffde000
        fspace_augment=0
        freemapped=T, uaddr=0xffffff011edd02a0
                (0x1000-0x7f7fffffc000 uaddr_stckbrk)
 - 0xffffff01124aa758: 0x7f7ffffde000->0x7f7ffffdf000: obj=0x0/0x0,
   amap=0x0/0
        submap=F, cow=T, nc=T, prot(max)=0/0, inh=1, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x7f7ffffdf000-0x7f7fffffc000
        fspace_augment=118784
        freemapped=T, uaddr=0xffffff011edd12a0
                (0x1000-0x7f7fffffc000 uaddr_rnd)
- uvm_addr exe: NULL
- uvm_addr any[0]: 0xffffff011edd12a0 (uaddr_rnd 0x1000-0x7f7fffffc000)
- uvm_addr any[1]: NULL
- uvm_addr any[2]: NULL
- uvm_addr any[3]: NULL
- uvm_addr brk/stack: 0xffffff011edd02a0 (uaddr_stckbrk
    0x1000-0x7f7fffffc000)

# Additionally, the same commands for cpu0; I was wondering whether this makes sense?

ddb{1}> machine ddbcpu 0
Stopped at      Debugger+0x9:   leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
softintr_dispatch() at softintr_dispatch+0x43
Xsoftclock() at Xsoftclock+0x1f
--- interrupt ---
end of kernel
end trace frame: 0x1388, count: 9
0x8:

ddb{0}> trace
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
softintr_dispatch() at softintr_dispatch+0x43
Xsoftclock() at Xsoftclock+0x1f
--- interrupt ---
end of kernel
end trace frame: 0x1388, count: -6
0x8:
ddb{0}> show proc
PROC (idle0) pid=3293 stat=onproc
    flags process=14000<NOZOMBIE,SYSTEM> proc=40000200<SYSTEM,CPUPEG>
    pri=0, usrpri=86, nice=20
    forw=0x804f638d9d1bbb95, list=0xffff8000ffffe238,0xffff8000ffffe6b8
    process=0xffff8000ffffc540 user=0xffff800032c2a000,
    vmspace=0xffffffff8193f260
    estcpu=36, cpticks=44, pctcpu=20.27
    user=0, sys=0, intr=0

ddb{0}> show map /f 0xffffffff8193f260
MAP 0xffffffff8193f260: [0x1000->0x7fbfdfeff000]
        brk() allocate range: 0x0-0x0
        stack allocate range: 0x0-0x0
        sz=0, ref=1, version=1, flags=0x41
        pmap=0xffffffff819715e0(resident=4385)
        vm_refcnt=20 vm_shm=0x0 vm_rssize=0 vm_swrss=0
        vm_tsize=0 vm_dsize=0
        vm_taddr=0x0 vm_daddr=0x0
        vm_maxsaddr=0x0 vm_minsaddr=0x0
 - 0xffffffff819322c8: 0x1000->0x1000: obj=0x0/0x0, amap=0x0/0
        submap=F, cow=F, nc=F, prot(max)=0/0, inh=0, wc=0, adv=0
        hole=F, free=T, guard=0x0, free=0x1000-0x7fbfdfeff000
        fspace_augment=140462072520704
        freemapped=T, uaddr=0xffffff011edd1000
                (0x1000-0x7fbfdfeff000 uaddr_rnd)
- uvm_addr exe: NULL
- uvm_addr any[0]: 0xffffff011edd1000 (uaddr_rnd 0x1000-0x7fbfdfeff000)
- uvm_addr any[1]: NULL
- uvm_addr any[2]: NULL
- uvm_addr any[3]: NULL
- uvm_addr brk/stack: 0xffffff011edd0000 (uaddr_stckbrk 0x1000-0x7fbfdfeff000)

ddb{0}> boot sync

invinstr.ddb.log (13K) Download Attachment
Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Stefan Kempf-2
Marcus MERIGHI wrote:

> [hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > We need to see what it looks like from within the kernel (and whether
> > the illegal instruction is really raised from within sendsig()). Can you
> > try the diff below?
>
> > You should get a kernel panic now instead of an illegal instruction
> > signal if you try running ping or top. We need the output of the panic
> > message and the output of the following commands:
>
> ping(1), top(1) messed up the screen.
>
> # ping 192.168.188.189                                                  
> PING 192.168.188.189 (192.168.188.189): 56 data bytes
> 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> 0x7f7ffffdd238, userstack 1

fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
most.  And that causes the kernel to attempt writes outside of the
process stack (and/or to read beyond the saved FPU state).
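
The numbers in the panic message can be checked by hand. The sketch below repeats only the stack pointer arithmetic visible in the sendsig() diff (subtract 128, align down to 16, subtract fpu_save_len). The function name is made up, and any code between the diff hunks is ignored.

```c
#include <stdint.h>

/*
 * Mirrors the sp computation from the sendsig() diff for the
 * "userstack" case. Hypothetical helper, not kernel code.
 */
uint64_t
sendsig_fpu_sp(uint64_t tf_rsp, uint64_t fpu_save_len)
{
	uint64_t sp;

	sp = tf_rsp - 128;	/* sp = tf->tf_rsp - 128 */
	sp &= ~15ULL;		/* align down to 16 bytes */
	sp -= fpu_save_len;	/* room for the saved FPU state */
	return sp;
}
```

With tf_rsp = 0x7f7ffffdd238 and fpu_save_len = 15773951 this gives 0x7f7fff0d20b1, exactly the sp in the panic message. That address falls into the 0x7f7ffdfdf000->0x7f7fff7df000 entry, which 'show map' lists with prot 0, so copyout() has to fail. With the expected maximum of 832 the result would be 0x7f7ffffdce70, inside the 0x7f7ffffd0000->0x7f7ffffde000 entry (prot 3).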

Either the value we get from CPUID is strange (or we handle CPUID
wrongly), or something trashes fpu_save_len.

Can you try this diff and paste the "cpuid1:" and "cpuid2:" lines? Please
revert the previous diff. That will show us what CPUID returns.

Index: arch/amd64/amd64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.94
diff -u -p -r1.94 cpu.c
--- arch/amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
+++ arch/amd64/amd64/cpu.c 1 Feb 2016 18:00:02 -0000
@@ -477,6 +477,13 @@ cpu_attach(struct device *parent, struct
  * Initialize the processor appropriately.
  */
 
+__attribute__((noinline)) void
+print_cpuid2(uint32_t ebx)
+{
+ printf("cpuid2: fpu_save_len: 0x%zx, ebx: 0x%x\n",
+    fpu_save_len, ebx);
+}
+
 void
 cpu_init(struct cpu_info *ci)
 {
@@ -510,11 +517,13 @@ cpu_init(struct cpu_info *ci)
 
  xsave_mask = XCR0_X87 | XCR0_SSE;
  CPUID_LEAF(0xd, 0, eax, ebx, ecx, edx);
+ printf("cpuid1: ebx: 0x%x\n", ebx);
  if (eax & XCR0_AVX)
  xsave_mask |= XCR0_AVX;
  xsetbv(0, xsave_mask);
  CPUID_LEAF(0xd, 0, eax, ebx, ecx, edx);
  fpu_save_len = ebx;
+ print_cpuid2(ebx);
  }
 
 #if NVMM > 0

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
[hidden email] (Stefan Kempf), 2016.02.01 (Mon) 19:13 (CET):

> Marcus MERIGHI wrote:
> > [hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > > We need to see what it looks like from within the kernel (and whether
> > > the illegal instruction is really raised from within sendsig()). Can you
> > > try the diff below?
> >
> > > You should get a kernel panic now instead of an illegal instruction
> > > signal if you try running ping or top. We need the output of the panic
> > > message and the output of the following commands:
> >
> > ping(1), top(1) messed up the screen.
> >
> > # ping 192.168.188.189                                                  
> > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> > panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> > fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> > 0x7f7ffffdd238, userstack 1
>
> fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
> most.  And that causes the kernel to attempt writes outside of the
> process stack (and/or to read beyond the saved FPU state).
>
> Either the value we get from CPUID is strange (or we handle CPUID
> wrongly), or something trashes fpu_save_len.
Now that you mention CPUID...
If I switch 'Max CPUID Value Limit' to 'disabled' in the BIOS, the
symptom is gone. It reappears when the option is set to 'enabled'.
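
One plausible reading, not confirmed anywhere in this thread: the BIOS option caps the highest basic CPUID leaf the CPU reports, so leaf 0xd is out of range and the ebx value stored in fpu_save_len is meaningless. The dmesg diff below fits that reading, since other CPUID-derived data (the mwait line, SENSOR, ARAT) is bogus or missing with the option enabled. A minimal sketch of a sanity check follows; the helper name and the choice of bounds are assumptions, and the two sizes are taken from the panic message.

```c
#include <stdint.h>

#define CPUID_LEAF_XSAVE	0xdU	/* leaf queried by cpu_init() */
#define FXSAVE_SIZE		512U	/* fxave_size in the panic message */
#define SAVEFPU_SIZE		832U	/* savefpu_size in the panic message */

/*
 * Hypothetical helper: returns 1 if an XSAVE area size read from
 * CPUID leaf 0xd looks usable, 0 otherwise.
 */
int
xsave_len_plausible(uint32_t max_basic_leaf, uint32_t ebx)
{
	/* Leaf 0xd is beyond what the CPU claims to support. */
	if (max_basic_leaf < CPUID_LEAF_XSAVE)
		return 0;

	/* The save area must at least hold fxsave data and fit savefpu. */
	return ebx >= FXSAVE_SIZE && ebx <= SAVEFPU_SIZE;
}
```

The value from the panic, 15773951 (0xf0b0ff), is rejected by this check no matter what the maximum leaf is; 832 is accepted only when leaf 0xd is actually in range.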

Diff between dmesgs (I did some line wrapping; file attached for better
readability):

--- dmesg.out.enabled Tue Feb  2 09:55:41 2016
+++ dmesg.out.disabled Tue Feb  2 09:55:41 2016
@@ -15,7 +15,7 @@
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.70 MHz
+cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.68 MHz
 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,
       PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,
       PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
       PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,
       PERF,ITSC
 cpu0: 256KB 64b/line 8-way L2 cache
 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
@@ -160,16 +160,18 @@
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.68 MHz
-cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,
       PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,
       PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
       PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,
       PERF,ITSC
+cpu0: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.67 MHz
+cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,
       PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,
       PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
       PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,
       PERF,ITSC,SENSOR,ARAT
 cpu0: 256KB 64b/line 8-way L2 cache
+cpu0: smt 0, core 0, package 0
 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
 cpu0: apic clock running at 99MHz
-cpu0: mwait min=23041, max=45311 (bogus)
+cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
 cpu1 at mainbus0: apid 2 (application processor)
 cpu1: Intel(R) Celeron(R) CPU 847 @ 1.10GHz, 1097.51 MHz
-cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,
       PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,
       PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
       PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,
       PERF,ITSC
+cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,
       PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,
       PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
       PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,NXE,LONG,LAHF,
       PERF,ITSC,SENSOR,ARAT
 cpu1: 256KB 64b/line 8-way L2 cache
+cpu1: smt 0, core 1, package 0
 ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
 acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
 acpihpet0 at acpi0: 14318179 Hz
@@ -188,8 +190,8 @@
 acpiprt12 at acpi0: bus -1 (PEG2)
 acpiprt13 at acpi0: bus -1 (PEG3)
 acpiec0 at acpi0: not present
-acpicpu0 at acpi0: C1(@1 halt!), PSS
-acpicpu1 at acpi0: C1(@1 halt!), PSS
+acpicpu0 at acpi0: C2(350@104 mwait.1@0x20), C1(1000@1 mwait.1), PSS
+acpicpu1 at acpi0: C2(350@104 mwait.1@0x20), C1(1000@1 mwait.1), PSS
 acpipwrres0 at acpi0: FN00, resource for FAN0
 acpipwrres1 at acpi0: FN01, resource for FAN1
 acpipwrres2 at acpi0: FN02, resource for FAN2

I'm now off to work through your instructions below...

Bye+Thanks, Marcus

> Can you try this diff and paste the "cpuid1:" "cpuid2:" lines? Please
> revert the previous diff. That will show us what CPUID returns.
>
> Index: arch/amd64/amd64/cpu.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
> retrieving revision 1.94
> diff -u -p -r1.94 cpu.c
> --- arch/amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
> +++ arch/amd64/amd64/cpu.c 1 Feb 2016 18:00:02 -0000
> @@ -477,6 +477,13 @@ cpu_attach(struct device *parent, struct
>   * Initialize the processor appropriately.
>   */
>  
> +__attribute__((noinline)) void
> +print_cpuid2(uint32_t ebx)
> +{
> + printf("cpuid2: fpu_save_len: 0x%zx, ebx: 0x%x\n",
> +    fpu_save_len, ebx);
> +}
> +
>  void
>  cpu_init(struct cpu_info *ci)
>  {
> @@ -510,11 +517,13 @@ cpu_init(struct cpu_info *ci)
>  
>   xsave_mask = XCR0_X87 | XCR0_SSE;
>   CPUID_LEAF(0xd, 0, eax, ebx, ecx, edx);
> + printf("cpuid1: ebx: 0x%x\n", ebx);
>   if (eax & XCR0_AVX)
>   xsave_mask |= XCR0_AVX;
>   xsetbv(0, xsave_mask);
>   CPUID_LEAF(0xd, 0, eax, ebx, ecx, edx);
>   fpu_save_len = ebx;
> + print_cpuid2(ebx);
>   }
>  
>  #if NVMM > 0
>
>

dmesg.out.diff (3K) Download Attachment

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
In reply to this post by Stefan Kempf-2
[hidden email] (Stefan Kempf), 2016.02.01 (Mon) 19:13 (CET):

> Marcus MERIGHI wrote:
> > [hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > > We need to see how it looks like from within the kernel (and whether
> > > the illegal instruction is really raised from within sendsig()). Can you
> > > try the diff below?
> >
> > > You should get a kernel panic now instead of an illegal instruction
> > > signal if you try running ping or top. We need the output of the panic
> > > message and the output of the following commands:
> >
> > ping(1), top(1) messed up the screen.
> >
> > # ping 192.168.188.189                                                  
> > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> > panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> > fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> > 0x7f7ffffdd238, userstack 1
>
> fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
> most.  And that causes the kernel to attempt writes outside of the
> process stack (and/or to read beyond the saved FPU state).
>
> Either the value we get from CPUID is strange (or we handle CPUID
> wrongly), or something trashes fpu_save_len.
>
> Can you try this diff and paste the "cpuid1:" "cpuid2:" lines? Please
> revert the previous diff. That will show us what CPUID returns.
 
Twice in dmesg:

cpuid1: ebx: 0xf0b0ff
cpuid2: fpu_save_len: 0xf0b0ff, ebx: 0xf0b0ff

cpuid1: ebx: 0xf0b0ff
cpuid2: fpu_save_len: 0xf0b0ff, ebx: 0xf0b0ff
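
A quick cross-check of these numbers, as a minimal sketch rather than kernel code: 0xf0b0ff is the same value as the 15773951 printed for fpu_save_len in the earlier panic, far above the savefpu_size of 832 shown there. The helper name fpu_save_len_plausible() and the SAVEFPU_SIZE macro are hypothetical and exist only for this illustration.

```c
#include <assert.h>	/* only for the checks mentioned in the note below */
#include <stdint.h>

/* Upper bound taken from the panic output (savefpu_size 832). */
#define SAVEFPU_SIZE 832u

/*
 * Hypothetical helper: returns 1 if a save length reported by CPUID
 * fits into the save area, 0 otherwise.
 */
static int
fpu_save_len_plausible(uint32_t len)
{
	return len > 0 && len <= SAVEFPU_SIZE;
}
```

With this, assert(!fpu_save_len_plausible(0xf0b0ff)) holds, while 512 and 832 are accepted.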

Full dmesg attached.

Thanks once more, Marcus


dmesg.out.cpuid (7K) Download Attachment

Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Philip Guenther
In reply to this post by Marcus MERIGHI
On Tue, 2 Feb 2016, Marcus MERIGHI wrote:

> [hidden email] (Stefan Kempf), 2016.02.01 (Mon) 19:13 (CET):
> > Marcus MERIGHI wrote:
> > > [hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > > > We need to see how it looks like from within the kernel (and whether
> > > > the illegal instruction is really raised from within sendsig()). Can you
> > > > try the diff below?
> > >
> > > > You should get a kernel panic now instead of an illegal instruction
> > > > signal if you try running ping or top. We need the output of the panic
> > > > message and the output of the following commands:
> > >
> > > ping(1), top(1) messed up the screen.
> > >
> > > # ping 192.168.188.189                                                  
> > > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> > > panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> > > fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> > > 0x7f7ffffdd238, userstack 1
> >
> > fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
> > most.  And that causes the kernel to attempt writes outside of the
> > process stack (and/or to read beyond the saved FPU state).
> >
> > Either the value we get from CPUID is strange (or we handle CPUID
> > wrongly), or something trashes fpu_save_len.
>
> Now that you mention CPUID...
> If I switch 'Max CPUID Value Limit' to 'disabled' in the BIOS, the
> symptom is gone. It re-appears when setting to 'enabled'.

"Doctor, it hurts when I do this..."

That BIOS option exists to support ancient OSes (Windows NT, etc) and
shouldn't be enabled when using OpenBSD.

Currently we seem to assume that the presence of certain CPU features like
AVX implies that CPUID supports the related leaf; that BIOS option breaks
that assumption, resulting in the bogus fpu_save_len sizing you hit.  
From the dmesg you posted I see it also explains the bogus mwait sizing
that has been reported by some others.  Your machine will perform better
with that option off; I guess we should add a check to the code to catch
this sort of setup by checking the cpuid_level variable before using the
higher CPUID leaves.
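
The guard described above can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the committed code: struct cpu_model, choose_fpu_save_len() and FXSAVE_LEN are hypothetical names, and the 512-byte fallback is assumed from the fxsave size in the panic output.

```c
#include <assert.h>	/* only for the checks mentioned in the note below */
#include <stdint.h>

/* Assumed fallback: the plain FXSAVE area size seen in the panic output. */
#define FXSAVE_LEN 512u

/* Hypothetical model of what the CPU reports. */
struct cpu_model {
	uint32_t max_basic_leaf;	/* eax of CPUID(0), cf. cpuid_level */
	int	 has_xsave;		/* XSAVE feature bit is set */
	uint32_t leaf_d_ebx;		/* ebx that a leaf 0xd query yields */
};

/*
 * Trust leaf 0xd only if the CPU advertises XSAVE *and* reports that
 * leaf 0xd exists. Otherwise fall back to the FXSAVE size.
 */
static uint32_t
choose_fpu_save_len(const struct cpu_model *cpu)
{
	if (cpu->has_xsave && cpu->max_basic_leaf >= 0xd)
		return cpu->leaf_d_ebx;
	return FXSAVE_LEN;
}
```

For a model with has_xsave set, a limited max_basic_leaf (0x3 is used here purely as an example) and leaf_d_ebx = 0xf0b0ff, the function returns 512 instead of the bogus value.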

Can you try applying the diff below, temporarily re-enable that BIOS
option, then report the resulting dmesg and verify that ping works
properly?


Philip Guenther


Index: i386/i386/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/i386/i386/cpu.c,v
retrieving revision 1.70
diff -u -p -r1.70 cpu.c
--- i386/i386/cpu.c 27 Dec 2015 04:31:34 -0000 1.70
+++ i386/i386/cpu.c 2 Feb 2016 16:54:09 -0000
@@ -784,7 +784,7 @@ cpu_init_mwait(struct device *dv)
 {
  unsigned int smallest, largest, extensions, c_substates;
 
- if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+ if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
  return;
 
  /* get the monitor granularity */
Index: amd64/amd64/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.94
diff -u -p -r1.94 cpu.c
--- amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
+++ amd64/amd64/cpu.c 2 Feb 2016 16:54:30 -0000
@@ -282,7 +282,7 @@ cpu_init_mwait(struct cpu_softc *sc)
 {
  unsigned int smallest, largest, extensions, c_substates;
 
- if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+ if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
  return;
 
  /* get the monitor granularity */
@@ -505,7 +505,7 @@ cpu_init(struct cpu_info *ci)
  cr4 |= CR4_OSXSAVE;
  lcr4(cr4);
 
- if (cpu_ecxfeature & CPUIDECX_XSAVE) {
+ if (cpu_ecxfeature & CPUIDECX_XSAVE && cpuid_level >= 0xd) {
  u_int32_t eax, ebx, ecx, edx;
 
  xsave_mask = XCR0_X87 | XCR0_SSE;


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Philip Guenther
On Tue, 2 Feb 2016, Philip Guenther wrote:
...
> Currently we seem to assume that the presence of certain CPU features like
> AVX implies that CPUID supports the related leaf; that BIOS option breaks
> that assumption, resulting in the bogus fpu_save_len sizing you hit.  
> From the dmesg you posted I see it also explains the bogus mwait sizing
> that has been reported by some others.  Your machine will perform better
> with that option off; I guess we should add a check to the code to catch
> this sort of setup by checking the cpuid_level variable before using the
> higher CPUID leaves.

Revised version: a few places now check cpuid_level instead of calling
CPUID(0) again, and similarly use curcpu()->ci_pnfeatset instead of
calling CPUID(0x80000000) once identifycpu() has set it. It also adds a
check of ci->ci_pnfeatset before using CPUID(CPUID_AMD_SVM_CAP) in the
vmm bits.
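
The idea of consulting cached maxima instead of re-running CPUID can be sketched like this. It is a simplified illustration: struct cpuid_limits and leaf_supported() are hypothetical names, with the two fields standing in for cpuid_level and ci_pnfeatset as the diff below uses them.

```c
#include <assert.h>	/* only for the checks mentioned in the note below */
#include <stdint.h>

/* Hypothetical cache of the two maxima, filled once at CPU identification. */
struct cpuid_limits {
	uint32_t max_basic;	/* eax of CPUID(0), cf. cpuid_level */
	uint32_t max_extended;	/* eax of CPUID(0x80000000), cf. ci_pnfeatset */
};

/*
 * Returns 1 if the given leaf may be queried. Basic and extended leaves
 * are compared against their own cached maximum.
 */
static int
leaf_supported(const struct cpuid_limits *lim, uint32_t leaf)
{
	if (leaf >= 0x80000000u)
		return leaf <= lim->max_extended;
	return leaf <= lim->max_basic;
}
```

With max_basic = 0xd and max_extended = 0x80000008, leaves 0x5, 0xd and 0x80000008 are accepted, while 0xe and 0x8000000a are rejected.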

ok?

Philip Guenther


Index: i386/i386/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/i386/i386/cpu.c,v
retrieving revision 1.70
diff -u -p -r1.70 cpu.c
--- i386/i386/cpu.c 27 Dec 2015 04:31:34 -0000 1.70
+++ i386/i386/cpu.c 2 Feb 2016 16:54:09 -0000
@@ -784,7 +784,7 @@ cpu_init_mwait(struct device *dv)
 {
  unsigned int smallest, largest, extensions, c_substates;
 
- if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+ if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
  return;
 
  /* get the monitor granularity */
Index: amd64/amd64/amd64_mem.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/amd64_mem.c,v
retrieving revision 1.11
diff -u -p -r1.11 amd64_mem.c
--- amd64/amd64/amd64_mem.c 14 Mar 2015 03:38:46 -0000 1.11
+++ amd64/amd64/amd64_mem.c 2 Feb 2016 17:37:55 -0000
@@ -583,8 +583,7 @@ mrinit(struct mem_range_softc *sc)
  * If CPUID does not support leaf function 0x80000008, use the
  * default a 36-bit address size.
  */
- CPUID(0x80000000, regs[0], regs[1], regs[2], regs[3]);
- if (regs[0] >= 0x80000008) {
+ if (curcpu()->ci_pnfeatset >= 0x80000008) {
  CPUID(0x80000008, regs[0], regs[1], regs[2], regs[3]);
  if (regs[0] & 0xff) {
  mtrrmask = (1ULL << (regs[0] & 0xff)) - 1;
Index: amd64/amd64/cacheinfo.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/cacheinfo.c,v
retrieving revision 1.7
diff -u -p -r1.7 cacheinfo.c
--- amd64/amd64/cacheinfo.c 13 Nov 2015 07:52:20 -0000 1.7
+++ amd64/amd64/cacheinfo.c 2 Feb 2016 17:36:11 -0000
@@ -159,7 +159,6 @@ amd_cpu_cacheinfo(struct cpu_info *ci)
  struct x86_cache_info *cai;
  int family, model;
  u_int descs[4];
- u_int lfunc;
 
  family = ci->ci_family;
  model = ci->ci_model;
@@ -171,15 +170,9 @@ amd_cpu_cacheinfo(struct cpu_info *ci)
  return;
 
  /*
- * Determine the largest extended function value.
- */
- CPUID(0x80000000, descs[0], descs[1], descs[2], descs[3]);
- lfunc = descs[0];
-
- /*
  * Determine L1 cache/TLB info.
  */
- if (lfunc < 0x80000005) {
+ if (ci->ci_pnfeatset < 0x80000005) {
  /* No L1 cache info available. */
  return;
  }
@@ -228,7 +221,7 @@ amd_cpu_cacheinfo(struct cpu_info *ci)
  /*
  * Determine L2 cache/TLB info.
  */
- if (lfunc < 0x80000006) {
+ if (ci->ci_pnfeatset < 0x80000006) {
  /* No L2 cache info available. */
  return;
  }
Index: amd64/amd64/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.94
diff -u -p -r1.94 cpu.c
--- amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
+++ amd64/amd64/cpu.c 2 Feb 2016 17:03:04 -0000
@@ -282,7 +282,7 @@ cpu_init_mwait(struct cpu_softc *sc)
 {
  unsigned int smallest, largest, extensions, c_substates;
 
- if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+ if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
  return;
 
  /* get the monitor granularity */
@@ -505,7 +505,7 @@ cpu_init(struct cpu_info *ci)
  cr4 |= CR4_OSXSAVE;
  lcr4(cr4);
 
- if (cpu_ecxfeature & CPUIDECX_XSAVE) {
+ if ((cpu_ecxfeature & CPUIDECX_XSAVE) && cpuid_level >= 0xd) {
  u_int32_t eax, ebx, ecx, edx;
 
  xsave_mask = XCR0_X87 | XCR0_SSE;
Index: amd64/amd64/identcpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.71
diff -u -p -r1.71 identcpu.c
--- amd64/amd64/identcpu.c 27 Dec 2015 04:31:34 -0000 1.71
+++ amd64/amd64/identcpu.c 2 Feb 2016 17:35:36 -0000
@@ -700,8 +700,7 @@ cpu_topology(struct cpu_info *ci)
  u_int32_t smt_mask = 0, core_mask, pkg_mask = 0;
 
  /* We need at least apicid at CPUID 1 */
- CPUID(0, eax, ebx, ecx, edx);
- if (eax < 1)
+ if (cpuid_level < 1)
  goto no_topology;
 
  /* Initial apicid */
@@ -710,8 +709,7 @@ cpu_topology(struct cpu_info *ci)
 
  if (strcmp(cpu_vendor, "AuthenticAMD") == 0) {
  /* We need at least apicid at CPUID 0x80000008 */
- CPUID(0x80000000, eax, ebx, ecx, edx);
- if (eax < 0x80000008)
+ if (ci->ci_pnfeatset < 0x80000008)
  goto no_topology;
 
  CPUID(0x80000008, eax, ebx, ecx, edx);
@@ -727,8 +725,7 @@ cpu_topology(struct cpu_info *ci)
  ci->ci_pkg_id >>= core_bits;
  } else if (strcmp(cpu_vendor, "GenuineIntel") == 0) {
  /* We only support leaf 1/4 detection */
- CPUID(0, eax, ebx, ecx, edx);
- if (eax < 4)
+ if (cpuid_level < 4)
  goto no_topology;
  /* Get max_apicid */
  CPUID(1, eax, ebx, ecx, edx);
@@ -858,7 +855,8 @@ cpu_check_vmm_cap(struct cpu_info *ci)
  /*
  * Check for SVM Nested Paging
  */
- if (ci->ci_vmm_flags & CI_VMM_SVM) {
+ if ((ci->ci_vmm_flags & CI_VMM_SVM) &&
+    ci->ci_pnfeatset >= CPUID_AMD_SVM_CAP) {
  CPUID(CPUID_AMD_SVM_CAP, dummy, dummy, dummy, cap);
  if (cap & AMD_SVM_NESTED_PAGING_CAP)
  ci->ci_vmm_flags |= CI_VMM_RVI;


Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Mark Kettenis
> Date: Tue, 2 Feb 2016 09:43:54 -0800
> From: Philip Guenther <[hidden email]>
>
> On Tue, 2 Feb 2016, Philip Guenther wrote:
> ...
> > Currently we seem to assume that the presence of certain CPU features like
> > AVX implies that CPUID supports the related leaf; that BIOS option breaks
> > that assumption, resulting in the bogus fpu_save_len sizing you hit.  
> > From the dmesg you posted I see it also explains the bogus mwait sizing
> > that has been reported by some others.  Your machine will perform better
> > with that option off; I guess we should add a check to the code to catch
> > this sort of setup by checking the cpuid_level variable before using the
> > higher CPUID leaves.
>
> Revised version: a few places now check cpuid_level instead of calling
> CPUID(0) again, and similarly use curcpu()->ci_pnfeatset instead of
> calling CPUID(0x80000000) once identifycpu() has set it. It also adds a
> check of ci->ci_pnfeatset before using CPUID(CPUID_AMD_SVM_CAP) in the
> vmm bits.
>
> ok?

ok kettenis@



Re: installer amd64 'Get/Verify bsd' -> 'Illegal instruction' - shuttle ds47d

Marcus MERIGHI
In reply to this post by Philip Guenther
[hidden email] (Philip Guenther), 2016.02.02 (Tue) 18:00 (CET):

> On Tue, 2 Feb 2016, Marcus MERIGHI wrote:
> > [hidden email] (Stefan Kempf), 2016.02.01 (Mon) 19:13 (CET):
> > > Marcus MERIGHI wrote:
> > > > [hidden email] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > > > > We need to see how it looks like from within the kernel (and whether
> > > > > the illegal instruction is really raised from within sendsig()). Can you
> > > > > try the diff below?
> > > >
> > > > > You should get a kernel panic now instead of an illegal instruction
> > > > > signal if you try running ping or top. We need the output of the panic
> > > > > message and the output of the following commands:
> > > >
> > > > ping(1), top(1) messed up the screen.
> > > > # ping 192.168.188.189                                                  
> > > > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > > > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> > > > panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> > > > fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> > > > 0x7f7ffffdd238, userstack 1
> > >
> > > fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
> > > most.  And that causes the kernel to attempt writes outside of the
> > > process stack (and/or to read beyond the saved FPU state).
> > >
> > > Either the value we get from CPUID is strange (or we handle CPUID
> > > wrongly), or something trashes fpu_save_len.
> >
> > Now that you mention CPUID...
> > If I switch 'Max CPUID Value Limit' to 'disabled' in the BIOS, the
> > symptom is gone. It re-appears when setting to 'enabled'.
>
> "Doctor, it hurts when I do this..."

And Dr. Guenther replies: Then... don't do it!

> That BIOS option exists to support ancient OSes (Windows NT, etc) and
> shouldn't be enabled when using OpenBSD.

I apologise for wasting everybody's time!

To find out whether 'enabled' was the default setting, I used the BIOS
'load defaults settings' -> 'load optimized default? yes'.
It was not the default: Max CPUID Value Limit [Disabled]

To justify why I set this to 'enabled'...

The help on the BIOS option says: 'Disabled for Windows XP'. In my
brain this resolves to: 'This is a setting that must be disabled when
running Windows XP'.
Possibly I did an invalid inversion of the argument: "If it has to be
disabled for Windows XP, I'd better enable it for anything else."

Sorry, Marcus

> Currently we seem to assume that the presence of certain CPU features like
> AVX implies that CPUID supports the related leaf; that BIOS option breaks
> that assumption, resulting in the bogus fpu_save_len sizing you hit.  
> From the dmesg you posted I see it also explains the bogus mwait sizing
> that has been reported by some others.  Your machine will perform better
> with that option off; I guess we should add a check to the code to catch
> this sort of setup by checking the cpuid_level variable before using the
> higher CPUID leaves.
>
> Can you try applying the diff below, temporarily re-enable that BIOS
> option, then report the resulting dmesg and verify that ping works
> properly?
>
>
> Philip Guenther
>
>
> Index: i386/i386/cpu.c
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/i386/i386/cpu.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 cpu.c
> --- i386/i386/cpu.c 27 Dec 2015 04:31:34 -0000 1.70
> +++ i386/i386/cpu.c 2 Feb 2016 16:54:09 -0000
> @@ -784,7 +784,7 @@ cpu_init_mwait(struct device *dv)
>  {
>   unsigned int smallest, largest, extensions, c_substates;
>  
> - if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
> + if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
>   return;
>  
>   /* get the monitor granularity */
> Index: amd64/amd64/cpu.c
> ===================================================================
> RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/cpu.c,v
> retrieving revision 1.94
> diff -u -p -r1.94 cpu.c
> --- amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
> +++ amd64/amd64/cpu.c 2 Feb 2016 16:54:30 -0000
> @@ -282,7 +282,7 @@ cpu_init_mwait(struct cpu_softc *sc)
>  {
>   unsigned int smallest, largest, extensions, c_substates;
>  
> - if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
> + if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
>   return;
>  
>   /* get the monitor granularity */
> @@ -505,7 +505,7 @@ cpu_init(struct cpu_info *ci)
>   cr4 |= CR4_OSXSAVE;
>   lcr4(cr4);
>  
> - if (cpu_ecxfeature & CPUIDECX_XSAVE) {
> + if (cpu_ecxfeature & CPUIDECX_XSAVE && cpuid_level >= 0xd) {
>   u_int32_t eax, ebx, ecx, edx;
>  
>   xsave_mask = XCR0_X87 | XCR0_SSE;
>
>