reboot loop on -current, one machine of several

Nick Holland
Help.

I was upgrading a few very similar machines to -current today.
ONE of the three decided to be unpleasant.  The thing has a
serial console, but it is about 370km from me. :-/

Upgrade from Sep 9 current to today's current via bsd.rd, just
like the other two.

Upon reboot, it does this (from /boot) :

booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536)

And then reboots the system, as if from power-down/power-up.
(already something I haven't seen before)

Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
(Sept 9), same results.  Not a kernel problem, it seems.  About this
point, I'm starting to think how the serial console has let me down.

I remember how to bring up a DRAC remote CD image via ssh tunnels
to the drac and how to run java in a windows browser, and
reboot off the remote CD image, do another upgrade, all goes fine
(again), but upon reboot, same results...  "heap full" and reboot.

Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
and it boots Just Fine from the local hard disk (only boot pulled
from the remote CD).  Boot loader!  Reinstalled boot:

# installboot -v sd0
Using / as root
installing bootstrap on /dev/rsd0c
using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
copying /usr/mdec/boot to /boot
/boot is 3 blocks x 32768 bytes
fs block shift 3; part offset 64; inode block 24, offset 2088
master boot record (MBR) at sector 0
        partition 3: type 0xA6 offset 64 size 2000397671
/usr/mdec/biosboot will be written at sector 64

good, right?

Reboot off local hard disk, boom.  problem is still there.  maybe
not the boot loader. :-/

Verified /boot on trouble system and good system are the same.  

I'm not going to cry "bug", since there are two nearly identical
systems working just fine.  But I can't think of what I did wrong
or what to do to fix it.

Suggestions?

Nick.


$ dmesg
OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17131339776 (16337MB)
avail mem = 16605306880 (15836MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe6680 (57 entries)
bios0: vendor Dell Inc. version "2.8.0" date 06/24/2014
bios0: Dell Inc. PowerEdge R210 II
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP SPMI ASF! HPET APIC MCFG BOOT SSDT ASPT SSDT SSDT SPCR HEST ERST BERT EINJ
acpi0: wakeup devices P0P1(S4) GLAN(S0) EHC1(S4) EHC2(S4) XHC_(S4) PXSX(S4) RP01(S5) PXSX(S4) RP02(S5) PXSX(S4) RP03(S5) PXSX(S4) RP04(S5) PXSX(S4) RP05(S5) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.24 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
acpihpet0: recalibrated TSC frequency 3192748207 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 0, core 2, package 0
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu5: 256KB 64b/line 8-way L2 cache
cpu5: smt 1, core 2, package 0
cpu6 at mainbus0: apid 6 (application processor)
cpu6: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu6: 256KB 64b/line 8-way L2 cache
cpu6: smt 0, core 3, package 0
cpu7 at mainbus0: apid 7 (application processor)
cpu7: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu7: 256KB 64b/line 8-way L2 cache
cpu7: smt 1, core 3, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (P0P1)
acpiprt2 at acpi0: bus 1 (RP01)
acpiprt3 at acpi0: bus -1 (RP02)
acpiprt4 at acpi0: bus -1 (RP03)
acpiprt5 at acpi0: bus -1 (RP04)
acpiprt6 at acpi0: bus -1 (RP05)
acpiprt7 at acpi0: bus -1 (RP06)
acpiprt8 at acpi0: bus -1 (RP07)
acpiprt9 at acpi0: bus -1 (RP08)
acpiprt10 at acpi0: bus -1 (PEG0)
acpiprt11 at acpi0: bus -1 (PEG1)
acpiprt12 at acpi0: bus -1 (PEG2)
acpiprt13 at acpi0: bus -1 (PEG3)
acpicpu0 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu4 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu5 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu6 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu7 at acpi0: C3(350@104 mwait.1@0x20), C2(500@80 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: FN00, resource for FAN0
acpipwrres1 at acpi0: FN01, resource for FAN1
acpipwrres2 at acpi0: FN02, resource for FAN2
acpipwrres3 at acpi0: FN03, resource for FAN3
acpipwrres4 at acpi0: FN04, resource for FAN4
acpitz0 at acpi0: critical temperature is 95 degC
"IPI0001" at acpi0 not configured
"INT3F0D" at acpi0 not configured
"PNP0A05" at acpi0 not configured
"PNP0A05" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep 3193 MHz: speeds: 3201, 3200, 3100, 3000, 2900, 2700, 2600, 2500, 2400, 2300, 2200, 2100, 1900, 1800, 1700, 1600 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Xeon E3-1200 Host" rev 0x09
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x04: apic 0 int 20
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb0 at pci0 dev 28 function 0 "Intel 6 Series PCIE" rev 0xb4: msi
pci1 at ppb0 bus 1
bnx0 at pci1 dev 0 function 0 "Broadcom BCM5716" rev 0x20: apic 0 int 16
bnx1 at pci1 dev 0 function 1 "Broadcom BCM5716" rev 0x20: apic 0 int 17
ehci1 at pci0 dev 29 function 0 "Intel 6 Series USB" rev 0x04: apic 0 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb1 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xa4
pci2 at ppb1 bus 2
2:3:0: mem address conflict 0xffff0000/0x10000
vga1 at pci2 dev 3 function 0 "Matrox MGA G200eW" rev 0x0a
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 31 function 0 "Intel C202 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 3.0Gb/s
ahci0: port 1: 1.5Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400563fe
sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
cd0 at scsibus1 targ 1 lun 0: <PLDS, DVD+-RW DS-8A8SH, KD51> ATAPI 5/cdrom removable
ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int 19
iic0 at ichiic0
sdtemp0 at iic0 addr 0x18: stts2002
sdtemp1 at iic0 addr 0x19: mcp98243
sdtemp2 at iic0 addr 0x1a: mcp98243
sdtemp3 at iic0 addr 0x1b: stts2002
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM ECC PC3-10600 with thermal sensor
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM ECC PC3-10600 with thermal sensor
spdmem2 at iic0 addr 0x52: 4GB DDR3 SDRAM ECC PC3-10600 with thermal sensor
spdmem3 at iic0 addr 0x53: 4GB DDR3 SDRAM ECC PC3-10600 with thermal sensor
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
uhub2 at uhub0 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub2 port 1 configuration 1 interface 0 "Avocent USB Composite Device-0" rev 1.10/0.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub2 port 1 configuration 1 interface 1 "Avocent USB Composite Device-0" rev 1.10/0.00 addr 3
uhidev1: iclass 3/1
ums0 at uhidev1: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
umass0 at uhub2 port 2 configuration 1 interface 0 "Avocent USB Composite Device-1" rev 2.00/0.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <iDRAC, LCDRIVE, 0323> SCSI0 0/direct removable
umass1 at uhub2 port 2 configuration 1 interface 1 "Avocent USB Composite Device-1" rev 2.00/0.00 addr 4
umass1: using SCSI over Bulk-Only
scsibus3 at umass1: 2 targets, initiator 0
cd1 at scsibus3 targ 1 lun 0: <iDRAC, Virtual CD, 0323> SCSI0 5/cdrom removable
sd2 at scsibus3 targ 1 lun 1: <iDRAC, Virtual Floppy, 0323> SCSI0 0/direct removable
uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub4 at uhub3 port 5 configuration 1 interface 0 "Standard Microsystems product 0x2514" rev 2.00/0.00 addr 3
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
root on sd0a (ccde728ba2c9bbe7.a) swap on sd0b dump on sd0b
bnx0: address d4:ae:52:b9:6a:10
brgphy0 at bnx0 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
bnx1: address d4:ae:52:b9:6a:11
brgphy1 at bnx1 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8

# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: Samsung SSD 850
duid: ccde728ba2c9bbe7
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 124519
total sectors: 2000409264
boundstart: 64
boundend: 2000397735
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:          1028096               64  4.2BSD   4096 32768     1 # /
  b:         41929705       1048578560    swap                    # /repo/anoncvs/tmp
  c:       2000409264                0  unused                    
  d:         20980864       1090508288  4.2BSD   2048 16384     1 # /usr
  e:          8385920       1111489152  4.2BSD   2048 16384     1 # /tmp
  f:         20964832       1119875072  4.2BSD   2048 16384     1 # /var
  g:         41929664       1161804704  4.2BSD   2048 16384     1 # /repo
  h:         20964800       1140839904  4.2BSD   2048 16384     1 # /home
  i:            16064       1203734368  4.2BSD   2048 16384     1 # /repo/anoncvs/dev
  j:         83875360       1203750432  4.2BSD   2048 16384     1

# fdisk sd0
Disk: sd0       geometry: 124519/255/63 [2000409264 Sectors]
Offset: 0       Signature: 0xAA55
            Starting         Ending         LBA Info:
 #: id      C   H   S -      C   H   S [       start:        size ]
-------------------------------------------------------------------------------
 0: 00      0   0   0 -      0   0   0 [           0:           0 ] unused      
 1: 00      0   0   0 -      0   0   0 [           0:           0 ] unused      
 2: 00      0   0   0 -      0   0   0 [           0:           0 ] unused      
*3: A6      0   1   2 - 124518 254  63 [          64:  2000397671 ] OpenBSD    
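
A quick sanity check of the partition layout posted above, as a minimal sketch: the sizes and offsets are copied from the disklabel and fdisk output, while the helper names (`find_overlaps`, `out_of_bounds`) are made up for illustration. It only checks that no disklabel partitions overlap, that they all sit inside boundstart/boundend, and that the MBR partition matches those bounds. It says nothing about why /boot fails.

```python
# Sizes and offsets in 512-byte sectors, copied from the disklabel output.
# Partition c (the whole disk) is left out on purpose.
PARTITIONS = {
    "a": (1028096, 64),
    "b": (41929705, 1048578560),
    "d": (20980864, 1090508288),
    "e": (8385920, 1111489152),
    "f": (20964832, 1119875072),
    "g": (41929664, 1161804704),
    "h": (20964800, 1140839904),
    "i": (16064, 1203734368),
    "j": (83875360, 1203750432),
}
BOUNDSTART, BOUNDEND = 64, 2000397735      # from disklabel
MBR_START, MBR_SIZE = 64, 2000397671       # fdisk partition 3 (type A6)


def find_overlaps(parts):
    """Return pairs of partition letters whose sector ranges overlap."""
    spans = sorted((off, off + size, name) for name, (size, off) in parts.items())
    overlaps = []
    prev_end, prev_name = spans[0][1], spans[0][2]
    for start, end, name in spans[1:]:
        if start < prev_end:
            overlaps.append((prev_name, name))
        if end > prev_end:
            prev_end, prev_name = end, name
    return overlaps


def out_of_bounds(parts, start, end):
    """Return letters of partitions that fall outside [start, end)."""
    return sorted(
        name for name, (size, off) in parts.items()
        if off < start or off + size > end
    )


# The OpenBSD MBR partition should cover exactly boundstart..boundend.
mbr_matches_label = (MBR_START == BOUNDSTART
                     and MBR_START + MBR_SIZE == BOUNDEND)

print("overlaps:", find_overlaps(PARTITIONS))
print("out of bounds:", out_of_bounds(PARTITIONS, BOUNDSTART, BOUNDEND))
print("MBR matches disklabel bounds:", mbr_matches_label)
```

With the numbers as posted, the layout comes out consistent: nothing overlaps and the MBR partition lines up with the disklabel bounds.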


Re: reboot loop on -current, one machine of several

Otto Moerbeek
On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:

> Help.
>
> I was upgrading a few very similar machines to -current today.
> ONE of the three decided to be unpleasant.  The thing has a
> serial console, but it is about 370km from me. :-/
>
> Upgrade from Sep 9 current to today's current via bsd.rd, just
> like the other two.
>
> Upon reboot, it does this (from /boot) :
>
> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536)
>
> And then reboots the system, as if from power-down/power-up.
> (already something I haven't seen before)
>
> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
> (Sept 9), same results.  Not a kernel problem, it seems.  About this
> point, I'm starting to think how the serial console has let me down.
>
> I remember how to bring up a DRAC remote CD image via ssh tunnels
> to the drac and how to run java in a windows browser, and
> reboot off the remote CD image, do another upgrade, all goes fine
> (again), but upon reboot, same results...  "heap full" and reboot.
>
> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
> and it boots Just Fine from the local hard disk (only boot pulled
> from the remote CD).  Boot loader!  Reinstalled boot:
>
> # installboot -v sd0
> Using / as root
> installing bootstrap on /dev/rsd0c
> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
> copying /usr/mdec/boot to /boot
> /boot is 3 blocks x 32768 bytes
> fs block shift 3; part offset 64; inode block 24, offset 2088
> master boot record (MBR) at sector 0
>         partition 3: type 0xA6 offset 64 size 2000397671
> /usr/mdec/biosboot will be written at sector 64
>
> good, right?
>
> Reboot off local hard disk, boom.  problem is still there.  maybe
> not the boot loader. :-/
>
> Verified /boot on trouble system and good system are the same.  
>
> I'm not going to cry "bug", since there are two nearly identical
> systems working just fine.  But I can't think of what I did wrong
> or what to do to fix it.
>
> Suggestions?

You are hitting -DHEAP_LIMIT=0xA0000 in /boot. The code is in libsa/alloc.c
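
For reference, the arithmetic behind that panic message can be sketched as follows. This assumes the check simply compares the current heap top plus the requested size against HEAP_LIMIT. That is a guess at the shape of the test, not a copy of the libsa code, and the function name is made up.

```python
HEAP_LIMIT = 0xA0000          # from -DHEAP_LIMIT=0xA0000

# Values taken from the panic text "heap full (0x9d304+65536)".
top = 0x9d304                 # current top of the boot loader heap
size = 65536                  # size of the allocation being requested


def heap_request_fits(top, size, limit=HEAP_LIMIT):
    """Assumed form of the check: the request must end at or below the limit."""
    return top + size <= limit


free_bytes = HEAP_LIMIT - top             # room left below the limit
overshoot = top + size - HEAP_LIMIT       # how far the request runs past it

print("fits:", heap_request_fits(top, size))
print("free below limit:", free_bytes, "bytes")
print("overshoot:", overshoot, "bytes", hex(overshoot))
```

So the heap top is already within about 11 KB of the limit when a 64 KB allocation is requested, which is why the request cannot be satisfied.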

No idea why. But something in that system is different.

You do have one weird line in your disklabel output: a filesystem
mounted on swap?

Can you boot into single user mode?

        -Otto



Re: reboot loop on -current, one machine of several

Nick Holland
On 11/12/17 14:13, Otto Moerbeek wrote:

> On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:
>
>> Help.
>>
>> I was upgrading a few very similar machines to -current today.
>> ONE of the three decided to be unpleasant.  The thing has a
>> serial console, but it is about 370km from me. :-/
>>
>> Upgrade from Sep 9 current to today's current via bsd.rd, just
>> like the other two.
>>
>> Upon reboot, it does this (from /boot) :
>>
>> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536)
>>
>> And then reboots the system, as if from power-down/power-up.
>> (already something I haven't seen before)
>>
>> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
>> (Sept 9), same results.  Not a kernel problem, it seems.  About this
>> point, I'm starting to think how the serial console has let me down.
>>
>> I remember how to bring up a DRAC remote CD image via ssh tunnels
>> to the drac and how to run java in a windows browser, and
>> reboot off the remote CD image, do another upgrade, all goes fine
>> (again), but upon reboot, same results...  "heap full" and reboot.
>>
>> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
>> and it boots Just Fine from the local hard disk (only boot pulled
>> from the remote CD).  Boot loader!  Reinstalled boot:
>>
>> # installboot -v sd0
>> Using / as root
>> installing bootstrap on /dev/rsd0c
>> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
>> copying /usr/mdec/boot to /boot
>> /boot is 3 blocks x 32768 bytes
>> fs block shift 3; part offset 64; inode block 24, offset 2088
>> master boot record (MBR) at sector 0
>>         partition 3: type 0xA6 offset 64 size 2000397671
>> /usr/mdec/biosboot will be written at sector 64
>>
>> good, right?
>>
>> Reboot off local hard disk, boom.  problem is still there.  maybe
>> not the boot loader. :-/
>>
>> Verified /boot on trouble system and good system are the same.  
>>
>> I'm not going to cry "bug", since there are two nearly identical
>> systems working just fine.  But I can't think of what I did wrong
>> or what to do to fix it.
>>
>> Suggestions?
>
> You are hitting -DHEAP_LIMIT=0xA0000 in /boot. The code is in libsa/alloc.c
>
> No idea why. But something in that system is different.
>
> You do have one weird line in your disklabel output: a filesystem
> mounted on swap?

That's an mfs.  This application has one directory that benefits HUGELY
from being an MFS for tmp files.  The reboot happens long before
the mfs is created, though.

$ more /etc/fstab
ccde728ba2c9bbe7.b none swap sw
ccde728ba2c9bbe7.a / ffs rw,noatime 1 1
ccde728ba2c9bbe7.h /home ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.e /tmp ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.d /usr ffs rw,noatime,nodev 1 2
ccde728ba2c9bbe7.f /var ffs rw,noatime,nodev,nosuid 1 2
ccde728ba2c9bbe7.g /repo ffs rw,noatime,nodev 1 2
ccde728ba2c9bbe7.i /repo/anoncvs/dev ffs rw,noatime,nosuid 1 2
/dev/sd0b /repo/anoncvs/tmp mfs rw,nodev,nosuid,-m=1,-s=3072000,-i=2048 0 0

> Can you boot into single user mode?

nope.  Considering how fast the reboot happens, I wouldn't have expected
it to, unless something is very different very early in the boot process.
This is what happened:

On the console:
Using drive 0, partition 3.
Loading...
probing: pc0 com0 com1 mem[631K 3038M 2M 68K 72K 176k 64K 13312M a20=on]
disk: fd0 hd0+
>> OpenBSD/amd64 BOOT 3.33
switching console to com0

and then on the serial console:
>> OpenBSD/amd64 BOOT 3.33                                                      
boot> boot -s                                                                  
booting hd0a:/bsd: 8484304+2429960+244080+0+667648 [643739heap full (0x9d4fc+65536)

(boom. reboot)
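
For what it's worth, the numbers in the two "heap full" messages can be checked against the 0xA0000 limit mentioned above. This is only a rough sketch: it assumes the two values in the message are the current top of the heap and the size of the request that failed, which is my reading of the message format and not something confirmed in this thread. The function name is made up for illustration.

```python
# Sketch only: assumes "heap full (0xTOP+SIZE)" reports the current heap
# top and the size of the allocation that did not fit.
HEAP_LIMIT = 0xA0000  # value quoted earlier in the thread (-DHEAP_LIMIT=0xA0000)


def heap_overshoot(top, size, limit=HEAP_LIMIT):
    """Return how many bytes top+size lands past limit (0 if it fits)."""
    return max(0, top + size - limit)


if __name__ == "__main__":
    # The two heap tops seen on the console, each with a 65536 byte request.
    for top in (0x9D304, 0x9D4FC):
        end = top + 65536
        room = HEAP_LIMIT - top
        print(f"top=0x{top:x} end=0x{end:x} "
              f"room={room} bytes, over by {heap_overshoot(top, 65536)} bytes")
```

Under that assumption, only about 11 KB is left below the limit in either case, so a 64 KB request has no chance of fitting. It does not say why the heap top is that high on this one machine.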

here's a dmesg diff between the "good" and "bad" machines...
$ diff -u dmesg.good dmesg.bad  
--- dmesg.good  Sun Nov 12 14:51:30 2017
+++ dmesg.bad   Sun Nov 12 14:51:21 2017
@@ -1,7 +1,7 @@
 OpenBSD 6.2-current (GENERIC.MP) #203: Sat Nov 11 19:01:19 MST 2017
     [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 17131339776 (16337MB)
-avail mem = 16605302784 (15836MB)
+avail mem = 16605294592 (15836MB)
 mpath0 at root
 scsibus0 at mpath0: 256 targets
 mainbus0 at root
@@ -16,46 +16,46 @@
 acpihpet0 at acpi0: 14318179 Hz
 acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.18 MHz
+cpu0: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3193.22 MHz
 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu0: 256KB 64b/line 8-way L2 cache
-acpihpet0: recalibrated TSC frequency 3192750214 Hz
+acpihpet0: recalibrated TSC frequency 3192750287 Hz
 cpu0: smt 0, core 0, package 0
 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
 cpu0: apic clock running at 99MHz
 cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
 cpu1 at mainbus0: apid 1 (application processor)
-cpu1: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu1: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu1: 256KB 64b/line 8-way L2 cache
 cpu1: smt 1, core 0, package 0
 cpu2 at mainbus0: apid 2 (application processor)
-cpu2: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu2: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu2: 256KB 64b/line 8-way L2 cache
 cpu2: smt 0, core 1, package 0
 cpu3 at mainbus0: apid 3 (application processor)
-cpu3: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu3: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu3: 256KB 64b/line 8-way L2 cache
 cpu3: smt 1, core 1, package 0
 cpu4 at mainbus0: apid 4 (application processor)
-cpu4: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu4: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu4: 256KB 64b/line 8-way L2 cache
 cpu4: smt 0, core 2, package 0
 cpu5 at mainbus0: apid 5 (application processor)
-cpu5: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu5: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu5: 256KB 64b/line 8-way L2 cache
 cpu5: smt 1, core 2, package 0
 cpu6 at mainbus0: apid 6 (application processor)
-cpu6: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu6: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu6: 256KB 64b/line 8-way L2 cache
 cpu6: smt 0, core 3, package 0
 cpu7 at mainbus0: apid 7 (application processor)
-cpu7: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.75 MHz
+cpu7: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz, 3192.76 MHz
 cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
 cpu7: 256KB 64b/line 8-way L2 cache
 cpu7: smt 1, core 3, package 0
@@ -121,16 +121,15 @@
 wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
 pcib0 at pci0 dev 31 function 0 "Intel C202 LPC" rev 0x04
 ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x04: msi, AHCI 1.3
-ahci0: port 2: 3.0Gb/s
-ahci0: port 3: 3.0Gb/s
+ahci0: port 0: 3.0Gb/s
+ahci0: port 1: 1.5Gb/s
 scsibus1 at ahci0: 32 targets
-sd0 at scsibus1 targ 2 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400562d4
+sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400563fe
 sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
-sd1 at scsibus1 targ 3 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.5002538c70007b02
-sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
+cd0 at scsibus1 targ 1 lun 0: <PLDS, DVD+-RW DS-8A8SH, KD51> ATAPI 5/cdrom removable
 ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int 19
 iic0 at ichiic0
-sdtemp0 at iic0 addr 0x18: mcp98243
+sdtemp0 at iic0 addr 0x18: stts2002
 sdtemp1 at iic0 addr 0x19: mcp98243
 sdtemp2 at iic0 addr 0x1a: mcp98243
 sdtemp3 at iic0 addr 0x1b: stts2002
@@ -141,7 +140,6 @@
 isa0 at pcib0
 isadma0 at isa0
 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
-com0: console
 com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
 pckbc0 at isa0 port 0x60/5 irq 1 irq 12
 pcppi0 at isa0 port 0x61
@@ -155,14 +153,23 @@
 uhidev1: iclass 3/1
 ums0 at uhidev1: 3 buttons, Z dir
 wsmouse0 at ums0 mux 0
+umass0 at uhub2 port 2 configuration 1 interface 0 "Avocent USB Composite Device-1" rev 2.00/0.00 addr 4
+umass0: using SCSI over Bulk-Only
+scsibus2 at umass0: 2 targets, initiator 0
+sd1 at scsibus2 targ 1 lun 0: <iDRAC, LCDRIVE, 0323> SCSI0 0/direct removable
+umass1 at uhub2 port 2 configuration 1 interface 1 "Avocent USB Composite Device-1" rev 2.00/0.00 addr 4
+umass1: using SCSI over Bulk-Only
+scsibus3 at umass1: 2 targets, initiator 0
+cd1 at scsibus3 targ 1 lun 0: <iDRAC, Virtual CD, 0323> SCSI0 5/cdrom removable
+sd2 at scsibus3 targ 1 lun 1: <iDRAC, Virtual Floppy, 0323> SCSI0 0/direct removable
 uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
 uhub4 at uhub3 port 5 configuration 1 interface 0 "Standard Microsystems product 0x2514" rev 2.00/0.00 addr 3
 vscsi0 at root
-scsibus2 at vscsi0: 256 targets
+scsibus4 at vscsi0: 256 targets
 softraid0 at root
-scsibus3 at softraid0: 256 targets
-root on sd0a (ff6add5e908e72c7.a) swap on sd0b dump on sd0b
-bnx0: address d4:ae:52:b9:6a:80
+scsibus5 at softraid0: 256 targets
+root on sd0a (ccde728ba2c9bbe7.a) swap on sd0b dump on sd0b
+bnx0: address d4:ae:52:b9:6a:10
 brgphy0 at bnx0 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
-bnx1: address d4:ae:52:b9:6a:81
+bnx1: address d4:ae:52:b9:6a:11
 brgphy1 at bnx1 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
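
Most of the diff above is clock calibration noise. A small filter along these lines could cut it down to the lines that actually differ. This is a sketch, not something that was run on these machines: the noise patterns are guesses based on the lines shown above, and `significant_diff` is a made-up name.

```python
import difflib
import re

# Patterns for values that vary from boot to boot (assumed from the diff
# above; extend as needed). Each match is replaced by a fixed placeholder.
_NOISE = [
    (re.compile(r"\d+\.\d+ MHz"), "<freq> MHz"),
    (re.compile(r"TSC frequency \d+ Hz"), "TSC frequency <n> Hz"),
    (re.compile(r"avail mem = \d+ \(\d+MB\)"), "avail mem = <n>"),
]


def normalize(line):
    """Replace boot-to-boot noise in one dmesg line with placeholders."""
    for pattern, repl in _NOISE:
        line = pattern.sub(repl, line)
    return line


def significant_diff(good_lines, bad_lines):
    """Return only the +/- lines that still differ after normalization."""
    a = [normalize(line) for line in good_lines]
    b = [normalize(line) for line in bad_lines]
    diff = difflib.unified_diff(a, b, "dmesg.good", "dmesg.bad",
                                lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]
```

Feeding it the two files as lists of lines (for example `open("dmesg.good").read().splitlines()`) should leave the disk, sdtemp, console and umass differences and drop the MHz lines.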

Nick.

Reply | Threaded
Open this post in threaded view
|

Re: reboot loop on -current, one machine of several

Mihai Popescu-3
In reply to this post by Nick Holland
> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536)

Maybe a corrupted or too big /bsd file?

I am curious about the cause.

Reply | Threaded
Open this post in threaded view
|

Re: reboot loop on -current, one machine of several

Gregory Edigarov-5
In reply to this post by Nick Holland


On 12.11.17 21:59, Nick Holland wrote:

> On 11/12/17 14:13, Otto Moerbeek wrote:
>> On Sun, Nov 12, 2017 at 01:28:39PM -0500, Nick Holland wrote:
>>
>>> Help.
>>>
>>> I was upgrading a few very similar machines to -current today.
>>> ONE of the three decided to be unpleasant.  The thing has a
>>> serial console, and but it is about 370km from me. :-/
>>>
>>> Upgrade from Sep 9 current to today's current via bsd.rd, just
>>> like the other two.
>>>
>>> Upon reboot, it does this (from /boot) :
>>>
>>> booting hd0a:/bsd: 8484712+2429968+244048+0+667648 [636809heap full (0x9d304+65536)
>>>
>>> And then reboots the system, as if from power-down/power-up.
>>> (already something I haven't seen before)
>>>
>>> Reboot from "bsd.rd" and "bsd.sp", same results.  reboot from "obsd"
>>> (Sept 9), same results.  Not a kernel problem, it seems.  About this
>>> point, I'm starting to think how the serial console has let me down.
>>>
>>> I remember how to bring up a DRAC remote CD image via ssh tunnels
>>> to the drac and how to run java in a windows browser, and
>>> reboot off the remote CD image, do another upgrade, all goes fine
>>> (again), but upon reboot, same results...  "heap full" and reboot.
>>>
>>> Boot from remote CD, at the boot> prompt, enter "boot hd0a:/bsd",
>>> and it boots Just Fine from the local hard disk (only boot pulled
>>> from the remote CD).  Boot loader!  Reinstalled boot:
>>>
>>> # installboot -v sd0
>>> Using / as root
>>> installing bootstrap on /dev/rsd0c
>>> using first-stage /usr/mdec/biosboot, second-stage /usr/mdec/boot
>>> copying /usr/mdec/boot to /boot
>>> /boot is 3 blocks x 32768 bytes
>>> fs block shift 3; part offset 64; inode block 24, offset 2088
>>> master boot record (MBR) at sector 0
>>>          partition 3: type 0xA6 offset 64 size 2000397671
>>> /usr/mdec/biosboot will be written at sector 64
>>>
>>> good, right?
>>>
>>> Reboot off local hard disk, boom.  problem is still there.  maybe
>>> not the boot loader. :-/
>>>
>>> Verified /boot on trouble system and good system are the same.
>>>
>>> I'm not going to cry "bug", since there are two nearly identical
>>> systems working just fine.  But I can't think of what I did wrong
>>> or what to do to fix it.
>>>
>>> Suggestions?
>> You are hitting -DHEAP_LIMIT=0xA0000 in /boot. The code is in libsa/alloc.c
>>
>> No idea why. But something in that system is different.
>>
>> You do have one weird line in your disklabel output: a filesystem
>> mounted on swap?
> that's an mfs.  This application has one directory which has a HUGE
> benefit to an MFS for tmp files.  Though the reboot happens long before
> the mfs is created.
>
>
>   scsibus1 at ahci0: 32 targets
> -sd0 at scsibus1 targ 2 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400562d4
> +sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400563fe
>   sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
> -sd1 at scsibus1 targ 3 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.5002538c70007b02
> -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
> +cd0 at scsibus1 targ 1 lun 0: <PLDS, DVD+-RW DS-8A8SH, KD51> ATAPI 5/cdrom removable
>   ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int 19
>   iic0 at ichiic0
My suspicion goes to the SSDs. One of them has somehow gone bad.
>
> Nick.
>

Reply | Threaded
Open this post in threaded view
|

Re: reboot loop on -current, one machine of several

Nick Holland
On 11/13/17 14:24, Gregory Edigarov wrote:
...
>>   scsibus1 at ahci0: 32 targets
>> -sd0 at scsibus1 targ 2 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400562d4
>> +sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.50025388400563fe
>>   sd0: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
>> -sd1 at scsibus1 targ 3 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.5002538c70007b02
>> -sd1: 1953514MB, 512 bytes/sector, 4000797360 sectors, thin
>> +cd0 at scsibus1 targ 1 lun 0: <PLDS, DVD+-RW DS-8A8SH, KD51> ATAPI 5/cdrom removable
>>   ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 0 int 19
>>   iic0 at ichiic0

> My suspicion goes to the SSDs. One of them has somehow gone bad.

I'm not able to say "no" to that.  Been kinda leaning that direction
myself.
These have been troublesome little beasts.  Got several of the Samsung
850 series in this project, and I haven't had so many problems with
storage since I tried some off-brand (JTI?) disks around 20 years ago.
Yes, I know, lots of people think these are the best around (Samsung,
not JTI).
*shrug*

However, I did do a dd read over the first few GB (entire 'a' partition,
partition table, mbr, etc.) of the disk to see if there were any read
errors -- none.  Whatever that's worth.
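
The same kind of sequential read check can be sketched in Python, for anyone who wants the offsets of failing chunks reported and not just a stop at the first error. This is not what was run here (that was plain dd). The function name, chunk size and return shape are all made up for illustration.

```python
import os


def scan_for_read_errors(path, limit_bytes, chunk_size=1024 * 1024):
    """Sequentially read up to limit_bytes from path.

    Returns (bytes_read, errors), where errors is a list of
    (offset, errno) pairs for chunks that could not be read.
    """
    errors = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while offset < limit_bytes:
            want = min(chunk_size, limit_bytes - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                # Note the failing chunk and skip past it.
                errors.append((offset, exc.errno))
                offset += want
                continue
            if not data:
                break  # end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, errors
```

Something like `scan_for_read_errors("/dev/rsd0c", 4 * 1024**3)` as root would cover roughly the first 4 GB. A clean pass only shows the sectors are readable, not that what comes back is correct.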

If all else fails, I'll move the function to spare hw, totally
rebuild this machine, and see if that fixes it.

Nick.