Kernel panics after some hours of use (likely related to modeset)

azarus
To: [hidden email]
Subject: Kernel panics after some hours of use (likely related to modeset)
From: [hidden email]
Cc: [hidden email]
Reply-To: [hidden email]

>Synopsis: The kernel panics reproducibly after a couple of hours of use (2-4 hours)
>Category: system amd64 kernel
>Environment:
        System      : OpenBSD 6.2
        Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
                         [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
In snapshots #320-#333 (every second snapshot or so tested) the kernel
hangs reproducibly after some hours of use. During use I have a pdf
viewer (mupdf), a browser (Firefox), tmux, an editor (nvim), a music
player (mpd) and some shells open (zsh).

This issue happens often when I leave the computer for some minutes, so
it might be something related to the screen turning off (modeset).

This might not be relevant, but I tried both with softdep enabled and
disabled, to the same result.

The machine is a ThinkPad X230, with Coreboot. (But I doubt it's
coreboot causing the issue, as the computer's not going to sleep)

I cannot provide a dmesg of the crashed system, as "boot dump" fails.

For the complete kernel error message, trace output, show registers
output and ps output, please see the attached pictures.

>How-To-Repeat:
    1. Use machine for a couple of hours
    2. Leave machine for some time (5-15 minutes)
    3. Kernel panics with "uvm_fault(0xfffffff81b4b158, 0x0, 0, 1) -> e"
>Fix:
unknown

dmesg:
OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8494600192 (8101MB)
avail mem = 8230227968 (7848MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbff28020 (10 entries)
bios0: vendor coreboot version "CBET4000 4.6-196-g0fb6568" date 05/22/2017
bios0: LENOVO 2325YBN
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3) LID_(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.53 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
acpitimer0: recalibrated TSC frequency 2594107462 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpihpet0: recalibrated TSC frequency 2594105458 Hz
acpiprt0 at acpi0: bus 1 (RP01)
acpiprt1 at acpi0: bus 2 (RP02)
acpiprt2 at acpi0: bus 3 (RP03)
acpiprt3 at acpi0: bus -1 (RP04)
acpiprt4 at acpi0: bus -1 (RP05)
acpiprt5 at acpi0: bus -1 (RP06)
acpiprt6 at acpi0: bus -1 (RP07)
acpiprt7 at acpi0: bus -1 (RP08)
acpiprt8 at acpi0: bus 0 (PCI0)
acpiec0 at acpi0
acpicpu0 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpitz0 at acpi0: critical temperature is 127 degC
acpitz1 at acpi0: critical temperature is 99 degC
acpithinkpad0 at acpi0
acpiac0 at acpi0: AC unit online
acpibat0 at acpi0: BAT0 model "45N1710" serial  1024 type LION oem "Panasonic"
acpibat1 at acpi0: BAT1 not present
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: LID_
"BOOT0000" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: LCD0
cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2400, 2200, 2000, 1800, 1600, 1400, 1200 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 3G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1366x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel Core 3G Thermal" rev 0x09 at pci0 dev 4 function 0 not configured
xhci0 at pci0 dev 20 function 0 "Intel 7 Series xHCI" rev 0x04: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 3c:97:0e:d0:90:18
ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 21
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 7 Series HD Audio" rev 0x04: msi
azalia0: codecs: Realtek ALC269, Intel/0x2806, using Realtek ALC269
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4
pci1 at ppb0 bus 1
sdhc0 at pci1 dev 0 function 0 "Ricoh 5U823 SD/MMC" rev 0x04: apic 2 int 16
sdhc0: SDHC 3.0, 50 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
ppb1 at pci0 dev 28 function 1 "Intel 7 Series PCIE" rev 0xc4
pci2 at ppb1 bus 2
athn0 at pci2 dev 0 function 0 "Atheros AR9285" rev 0x01: apic 2 int 17
athn0: AR9285 rev 2 (1T1R), ROM rev 14, address 64:27:37:37:ab:e4
ppb2 at pci0 dev 28 function 2 "Intel 7 Series PCIE" rev 0xc4: msi
pci3 at ppb2 bus 3
ehci1 at pci0 dev 29 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 19
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel QM77 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 7 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, ADATA SP550, O123> SCSI3 0/direct fixed t10.ATA_ADATA_SP550_2G3020035950_
sd0: 114473MB, 512 bytes/sector, 234441648 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 7 Series SMBus" rev 0x04: apic 2 int 23
iic0 at ichiic0
iic0: addr 0x24 03=04 09=39 0a=09 0b=23 0c=02 0d=08 0e=01 0f=18 words 00=00ff 01=00ff 02=00ff 03=04ff 04=00ff 05=00ff 06=00ff 07=00ff
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800 SO-DIMM
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM PC3-12800 SO-DIMM
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.1, 0x1e2b1 0x940300
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
error: [drm:pid0:cpt_set_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
error: [drm:pid0:intel_pch_fifo_underrun_irq_handler] *ERROR* PCH transcoder A FIFO underrun
sdmmc0: can't enable card
uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
ugen0 at uhub3 port 4 "Broadcom Corp BCM20702A0" rev 2.00/1.12 addr 3
uvideo0 at uhub3 port 6 configuration 1 interface 0 "Ricoh Company Ltd. Integrated Camera" rev 2.00/0.11 addr 4
video0 at uvideo0
uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub5 at uhub4 port 8 configuration 1 interface 0 "Lenovo product 0x100a" rev 2.00/0.00 addr 3
uhidev0 at uhub5 port 1 configuration 1 interface 0 "Logitech Gaming Mouse G402" rev 2.00/90.02 addr 4
uhidev0: iclass 3/1
ums0 at uhidev0: 16 buttons, Z and W dir
wsmouse2 at ums0 mux 0
uhidev1 at uhub5 port 1 configuration 1 interface 1 "Logitech Gaming Mouse G402" rev 2.00/90.02 addr 4
uhidev1: iclass 3/0, 17 report ids
ukbd0 at uhidev1 reportid 1: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhid0 at uhidev1 reportid 3: input=4, output=0, feature=0
uhid1 at uhidev1 reportid 4: input=1, output=0, feature=0
uhid2 at uhidev1 reportid 16: input=6, output=6, feature=0
uhid3 at uhidev1 reportid 17: input=19, output=19, feature=0
uhidev2 at uhub5 port 2 configuration 1 interface 0 "Dell Dell USB Keyboard" rev 1.10/3.06 addr 5
uhidev2: iclass 3/1
ukbd1 at uhidev2: 8 variable keys, 6 key codes
wskbd2 at ukbd1 mux 1
wskbd2: connecting to wsdisplay0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
softraid0: sd1 was not shutdown properly
sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
sd1: 114470MB, 512 bytes/sector, 234435953 sectors
root on sd1a (0f5c1af7ffad77c3.a) swap on sd1b dump on sd1b
WARNING: / was not properly unmounted

usbdevs:
Controller /dev/usb0:
addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 disabled
 port 2 disabled
 port 3 disabled
 port 4 disabled
 port 5 disabled
 port 6 disabled
 port 7 disabled
 port 8 disabled
Controller /dev/usb1:
addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 addr 2: high speed, self powered, config 1, Rate Matching Hub(0x0024), Intel(0x8087), rev 0.00
  port 1 powered
  port 2 powered
  port 3 powered
  port 4 addr 3: full speed, self powered, config 1, BCM20702A0(0x21e6), Broadcom Corp(0x0a5c), rev 1.12, iSerialNumber 3C77E6EDD09A
  port 5 powered
  port 6 addr 4: high speed, power 200 mA, config 1, Integrated Camera(0x02d2), Ricoh Company Ltd.(0x5986), rev 0.11
 port 2 powered
 port 3 powered
Controller /dev/usb2:
addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 addr 2: high speed, self powered, config 1, Rate Matching Hub(0x0024), Intel(0x8087), rev 0.00
  port 1 powered
  port 2 powered
  port 3 powered
  port 4 powered
  port 5 powered
  port 6 powered
  port 7 powered
  port 8 addr 3: high speed, self powered, config 1, product 0x100a(0x100a), Lenovo(0x17ef), rev 0.00
   port 1 addr 4: full speed, power 300 mA, config 1, Gaming Mouse G402(0xc07e), Logitech(0x046d), rev 90.02, iSerialNumber 6D901F985253
   port 2 addr 5: low speed, power 70 mA, config 1, Dell USB Keyboard(0x2003), Dell(0x413c), rev 3.06
   port 3 powered
   port 4 powered
   port 5 powered
   port 6 powered
 port 2 powered
 port 3 powered

Attachments:
panicmessage.jpg (99K)
trace.jpg (117K)
registers.jpg (193K)
ps1.jpg (354K)
ps2.jpg (303K)
ps3.jpg (219K)
ps4.jpg (276K)
ps5.jpg (241K)

Re: Kernel panics after some hours of use (likely related to modeset)

Mike Larkin
On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:

> To: [hidden email]
> Subject: Kernel panics after some hours of use (likely related to modeset)
> From: [hidden email]
> Cc: [hidden email]
> Reply-To: [hidden email]
>
> >Synopsis: The kernel panics reproducibly after a couple of hours of use (2-4 hours)
> >Category: system amd64 kernel
> >Environment:
> System      : OpenBSD 6.2
> Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
> In snapshots #320-#333 (every second snapshot or so tested) the kernel
> hangs reproducibly after some hours of use. During use I have a pdf
> viewer (mupdf), a browser (Firefox), tmux, an editor (nvim), a music
> player (mpd) and some shells open (zsh).
>
> This issue happens often when I leave the computer for some minutes, so
> it might be something related to the screen turning off (modeset).
>
> This might not be relevant, but I tried both with softdep enabled and
> disabled, to the same result.
>
> The machine is a ThinkPad X230, with Coreboot. (But I doubt it's
> coreboot causing the issue, as the computer's not going to sleep)
>
> I cannot provide a dmesg of the crashed system, as "boot dump" fails.
>
> For the complete kernel error message, trace output, show registers
> output and ps output, please see the attached pictures.
>
> >How-To-Repeat:
>     1. Use machine for a couple of hours
>     2. Leave machine for some time (5-15 minutes)
>     3. Kernel panics with "uvm_fault(0xfffffff81b4b158, 0x0, 0, 1) -> e"
> >Fix:
> unknown
>

A few of us have been seeing this, so we know about the issue. There is
no fix at this time however. Thanks for reporting it though.

-ml


Re: Kernel panics after some hours of use (likely related to modeset)

Jonathan Gray-11
On Mon, Jan 08, 2018 at 05:20:39PM -0800, Mike Larkin wrote:

> On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:
> A few of us have been seeing this, so we know about the issue. There is
> no fix at this time however. Thanks for reporting it though.

This is the workaround I have in my tree to avoid the NULL deref.

Index: sys/dev/pci/drm/linux_ww_mutex.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/linux_ww_mutex.h,v
retrieving revision 1.1
diff -u -p -r1.1 linux_ww_mutex.h
--- sys/dev/pci/drm/linux_ww_mutex.h 1 Jul 2017 16:14:10 -0000 1.1
+++ sys/dev/pci/drm/linux_ww_mutex.h 13 Aug 2017 06:40:35 -0000
@@ -163,7 +163,8 @@ __ww_mutex_lock(struct ww_mutex *lock, s
                          *   the `younger` process gives up all it's
                          *   resources.
  */
- if (slow || ctx == NULL || ctx->stamp < lock->ctx->stamp) {
+ if (slow || ctx == NULL ||
+    (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp)) {
  int s = msleep(lock, &lock->lock,
        intr ? PCATCH : 0,
        ctx ? ctx->ww_class->name : "ww_mutex_lock", 0);
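
For readers following along, the guard in the diff above can be modelled in userspace as a minimal sketch. The names toy_ctx, toy_lock and toy_should_sleep are hypothetical stand-ins, not the kernel's structures, and the sketch only assumes what the diff itself shows: the unpatched test read lock->ctx->stamp whenever slow was false and ctx was non-NULL, even when the lock had been taken without a context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct toy_ctx {
	unsigned long stamp;	/* lower stamp = older acquire context */
};

struct toy_lock {
	struct toy_ctx *ctx;	/* NULL when the holder locked without a context */
};

/*
 * Patched test from the diff: true means "sleep and retry",
 * false means "do not sleep" (the caller gives up instead).
 * lock->ctx is checked before lock->ctx->stamp is read, so a lock
 * held without a context no longer leads to a NULL dereference.
 */
static bool
toy_should_sleep(const struct toy_lock *lock, const struct toy_ctx *ctx,
    bool slow)
{
	assert(lock != NULL);
	return slow || ctx == NULL ||
	    (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp);
}
```

With a lock whose ctx is NULL and a caller that does have a context, the function returns false without touching lock->ctx->stamp; on the slow path, or for a caller without a context, it still returns true as before.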


Re: Kernel panics after some hours of use (likely related to modeset)

Mark Kettenis
> Date: Tue, 9 Jan 2018 12:32:49 +1100
> From: Jonathan Gray <[hidden email]>
>
> On Mon, Jan 08, 2018 at 05:20:39PM -0800, Mike Larkin wrote:
> > On Tue, Jan 09, 2018 at 12:44:04AM +0100, azarus wrote:
> > A few of us have been seeing this, so we know about the issue. There is
> > no fix at this time however. Thanks for reporting it though.
>
> This is the workaround I have in my tree to avoid the NULL deref.

Sorry for ignoring this until now.  I never found the time to actually
look into this.  Now that I have re-familiarized myself with the
code, I think the fix is right.  If somebody already locked the lock
without a context, we can't establish whether we are the 'older'
process or not.  So returning -EDEADLK would indeed be correct.  And
it looks as if the kms locking code is prepared to handle that case.

One request: could you replace "lock->ctx != NULL" with simply "lock->ctx"?

ok kettenis@
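
As an illustration of what "prepared to handle that case" could look like from the caller's side, here is a toy, single-threaded sketch. Everything in it is an assumption made for illustration: the demo_* names are hypothetical, sleeping is simulated by a helper that pretends the other holder released the lock, and the back-off sequence (drop what is held, wait for the contended lock on the slow path, then retake the rest) is only a rough model, not the actual kms locking code. It uses the short "lock->ctx" form requested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ctx { unsigned long stamp; };
struct demo_lock { bool held; struct demo_ctx *ctx; };

/* Hypothetical stand-in for sleeping until the current holder lets go. */
static void
demo_wait_for_release(struct demo_lock *l)
{
	l->held = false;
	l->ctx = NULL;
}

/* Returns 0 once the lock is taken, or -EDEADLK if the caller must back off. */
static int
demo_lock_acquire(struct demo_lock *l, struct demo_ctx *ctx, bool slow)
{
	assert(l != NULL && ctx != NULL);
	while (l->held) {
		/* Wait only on the slow path or when provably older. */
		if (slow || (l->ctx && ctx->stamp < l->ctx->stamp))
			demo_wait_for_release(l);
		else
			return -EDEADLK;
	}
	l->held = true;
	l->ctx = ctx;
	return 0;
}

static void
demo_unlock(struct demo_lock *l)
{
	l->held = false;
	l->ctx = NULL;
}

/*
 * Take a, then b.  If b reports -EDEADLK, release a, wait for b on the
 * slow path, then retake a.  Returns the number of back-offs performed,
 * or a negative errno if a itself could not be taken (not exercised here).
 */
static int
demo_lock_both(struct demo_lock *a, struct demo_lock *b, struct demo_ctx *ctx)
{
	int backoffs = 0;
	int err = demo_lock_acquire(a, ctx, false);

	if (err != 0)
		return err;
	if (demo_lock_acquire(b, ctx, false) == -EDEADLK) {
		demo_unlock(a);
		backoffs++;
		(void)demo_lock_acquire(b, ctx, true);
		(void)demo_lock_acquire(a, ctx, false);
	}
	return backoffs;
}
```

If b starts out held with a NULL ctx, demo_lock_both() backs off once and ends up holding both locks; if both locks start out free, it takes them without backing off.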

> Index: sys/dev/pci/drm/linux_ww_mutex.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/linux_ww_mutex.h,v
> retrieving revision 1.1
> diff -u -p -r1.1 linux_ww_mutex.h
> --- sys/dev/pci/drm/linux_ww_mutex.h 1 Jul 2017 16:14:10 -0000 1.1
> +++ sys/dev/pci/drm/linux_ww_mutex.h 13 Aug 2017 06:40:35 -0000
> @@ -163,7 +163,8 @@ __ww_mutex_lock(struct ww_mutex *lock, s
>                           *   the `younger` process gives up all it's
>                           *   resources.
>   */
> - if (slow || ctx == NULL || ctx->stamp < lock->ctx->stamp) {
> + if (slow || ctx == NULL ||
> +    (lock->ctx != NULL && ctx->stamp < lock->ctx->stamp)) {
>   int s = msleep(lock, &lock->lock,
>         intr ? PCATCH : 0,
>         ctx ? ctx->ww_class->name : "ww_mutex_lock", 0);