mp deadlock on 6.2 running on kvm

Landry Breuil
Hi,

I've had KVM VMs running 6.2 hardlocking/deadlocking for a while now, all
of them running on proxmox 5.1 using linux 4.13.8 & qemu-kvm 2.9.1.
There were hardlocks upon reboot which were 'solved' by disabling
x2apic emulation in kvm (args: -cpu=kvm64,-x2apic) or by giving the host
cpu flags to the vm (args: -cpu host), but deadlocks still occur
during normal operation.

I'm now running a kernel with MP_LOCKDEBUG, so I'm collecting traces in
the vain hope that they might help someone interested in locking issues.
Here's the latest one:

login: __mp_lock(0xffffffff81b23ed0): lock spun outStopped at  db_enter+0x5:   popq    %rbp
ddb{2}> trace
db_enter() at db_enter+0x5
___mp_lock(40) at ___mp_lock+0x66
syscall() at syscall+0x1ff
--- syscall (number 4) ---
end of kernel
end trace frame: 0xb3b20d05fe0, count: -3
0xb3ba081bbfa:
ddb{2}> mach ddbcpu 1
Stopped at      x86_ipi_db+0x5: popq    %rbp
ddb{1}> tr
x86_ipi_db(ffff800022136530) at x86_ipi_db+0x5
x86_ipi_handler() at x86_ipi_handler+0x6b
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1f
--- interrupt ---
Bad frame pointer: 0xfffffe4fe85250cc
end trace frame: 0xfffffe4fe85250cc, count: -3
0x41cb8c419c524153:
ddb{1}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 216909  39313    726           0  0x4000000    2  rrdcached
*366090  74010      0     0x14000      0x200    1  systq
ddb{1}> tr /p 0t366090
lapic_delay(ffffffff81a97c88) at lapic_delay+0x5c
rtcget(ffffffff81a97c88) at rtcget+0x1a
resettodr() at resettodr+0x3a
perform_resettodr(ffffffff8150a7d9) at perform_resettodr+0x9
taskq_thread(0) at taskq_thread+0x67
end trace frame: 0x0, count: -5
ddb{1}> tr /p 0t216909
uvm_fault(0xffffffff81b526f8, 0x0, 0, 1) -> e
kernel: double fault trap, code=0
Faulted in DDB; continuing...

If I continue at that point or try boot sync, ddb is kaput and I can only stop
the VM. Of course an option would be to run bsd.sp, but that's a bit sad.

dmesg attached.

Landry

Re: mp deadlock on 6.2 running on kvm

Martin Pieuchot
On 07/12/17(Thu) 08:34, Landry Breuil wrote:

> Hi,
>
> i've been having kvm VMs running 6.2 hardlocking/deadlocking since a
> while, all those running on proxmox 5.1 using linux 4.13.8 & qemu-kvm
> 2.9.1. There were hardlocks upon reboot which were 'solved' by disabling
> x2apic emulation in kvm (args: -cpu=kvm64,-x2apic) or giving the host
> cpu flags to the vm (args: -cpu host) but there still remains deadlocks
> during normal operation.
>
> I'm now running a kernel with MP_LOCKDEBUG, so i'm collecting traces in
> the vain hope that it might help someone interested in locking issues.
> Here's the latest one:

Let me add that when you had x2apic enabled the kernel 'froze' inside
x2apic_readreg, trace below:

  ddb{0}> tr
  x2apic_readreg(10) at x2apic_readreg+0xf
  lapic_delay(ffff800022136900) at lapic_delay+0x5c
  rtcput(ffff800022136960) at rtcput+0x65
  resettodr() at resettodr+0x1d6
  perform_resettodr(ffffffff81769b29) at perform_resettodr+0x9
  taskq_thread(0) at taskq_thread+0x67
  end trace frame: 0x0, count: -6

What you're seeing with an MP_LOCKDEBUG kernel is just a symptom.  A CPU
enters DDB because another one is 'frozen' while holding the
KERNEL_LOCK().  What's interesting is that in both cases the frozen CPU
is trying to execute apic-related code:
  - x2apic_readreg
  - lapic_delay

I believe this issue should be reported to KVM developers as well.
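
For reference, the MP_LOCKDEBUG check that prints "lock spun out" works
roughly like the sketch below: the CPU waiting for the kernel lock burns
through a spin budget and drops into ddb when the budget runs out.  This is
a simplified, self-contained illustration, not the actual kern_lock.c code;
SPINOUT_BUDGET and the fake lock variable are made up for the example.

/*
 * Toy illustration of an MP_LOCKDEBUG-style spin-out check.
 * Builds and runs as a normal userland program.
 */
#include <stdio.h>

#define SPINOUT_BUDGET	10000000L	/* stand-in for __mp_lock_spinout */

static int lock_held_elsewhere = 1;	/* pretend another CPU never releases it */

static void
wait_for_kernel_lock(int mycpu)
{
	long nticks = SPINOUT_BUDGET;

	while (lock_held_elsewhere) {
		if (--nticks <= 0) {
			/* the kernel prints "__mp_lock(...): lock spun out" and calls db_enter() here */
			printf("cpu%d: lock spun out, entering debugger\n", mycpu);
			return;
		}
	}
}

int
main(void)
{
	wait_for_kernel_lock(2);
	return 0;
}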

> login: __mp_lock(0xffffffff81b23ed0): lock spun outStopped at  db_enter+0x5:   popq    %rbp
> ddb{2}> trace
> db_enter() at db_enter+0x5
> ___mp_lock(40) at ___mp_lock+0x66
> syscall() at syscall+0x1ff
> --- syscall (number 4) ---
> end of kernel
> end trace frame: 0xb3b20d05fe0, count: -3
> 0xb3ba081bbfa:
> ddb{2}> mach ddbcpu 1
> Stopped at      x86_ipi_db+0x5: popq    %rbp
> ddb{1}> tr
> x86_ipi_db(ffff800022136530) at x86_ipi_db+0x5
> x86_ipi_handler() at x86_ipi_handler+0x6b
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1f
> --- interrupt ---
> Bad frame pointer: 0xfffffe4fe85250cc
> end trace frame: 0xfffffe4fe85250cc, count: -3
> 0x41cb8c419c524153:
> ddb{1}> ps /o
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  216909  39313    726           0  0x4000000    2  rrdcached
> *366090  74010      0     0x14000      0x200    1  systq
> ddb{1}> tr /p 0t366090
> lapic_delay(ffffffff81a97c88) at lapic_delay+0x5c
> rtcget(ffffffff81a97c88) at rtcget+0x1a
> resettodr() at resettodr+0x3a
> perform_resettodr(ffffffff8150a7d9) at perform_resettodr+0x9
> taskq_thread(0) at taskq_thread+0x67
> end trace frame: 0x0, count: -5
> ddb{1}> tr /p 0t216909
> uvm_fault(0xffffffff81b526f8, 0x0, 0, 1) -> e
> kernel: double fault trap, code=0
> Faulted in DDB; continuing...
>
> if i continue at that point or try bo sync, ddb is kaput and i can only stop
> the vm. Of course an option would be to run bsd.sp, but that's a bit .. sad.
>
> dmesg attached.
>
> Landry

> OpenBSD 6.2-stable (GENERIC.MP) #0: Fri Dec  1 11:27:09 CET 2017
>     landry@s64.proxmox2:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17162944512 (16367MB)
> avail mem = 16635760640 (15865MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf68d0 (10 entries)
> bios0: vendor SeaBIOS version "rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: rev 0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC HPET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 286.92 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
> cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 999MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 472.57 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
> cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 465.40 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
> cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 467.31 MHz
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
> cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
> acpihpet0 at acpi0: 100000000 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicpu2 at acpi0: C1(@1 halt!)
> acpicpu3 at acpi0: C1(@1 halt!)
> "ACPI0006" at acpi0 not configured
> "PNP0F13" at acpi0 not configured
> "PNP0700" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "QEMU0002" at acpi0 not configured
> "ACPI0010" at acpi0 not configured
> pvbus0 at mainbus0: KVM
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
> pciide0: channel 0 disabled (no drives)
> pciide0: channel 1 disabled (no drives)
> uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
> piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
> iic0 at piixpm0
> vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Memory" rev 0x00
> viomb0 at virtio0
> virtio0: apic 0 int 11
> virtio1 at pci0 dev 10 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio1
> scsibus1 at vioblk0: 2 targets
> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
> sd0: 153600MB, 512 bytes/sector, 314572800 sectors
> virtio1: msix shared
> virtio2 at pci0 dev 18 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio2: address 46:ae:54:5a:0d:80
> virtio2: msix shared
> ppb0 at pci0 dev 30 function 0 "Red Hat Qemu PCI-PCI" rev 0x00
> pci1 at ppb0 bus 1
> ppb1 at pci0 dev 31 function 0 "Red Hat Qemu PCI-PCI" rev 0x00
> pci2 at ppb1 bus 2
> isa0 at pcib0
> isadma0 at isa0
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: console
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB Tablet" rev 2.00/0.00 addr 2
> uhidev0: iclass 3/0
> ums0 at uhidev0: 3 buttons, Z dir
> wsmouse1 at ums0 mux 0
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on sd0a (0a797a44a76827db.a) swap on sd0b dump on sd0b
> fd0 at fdc0 drive 1: density unknown

Re: mp deadlock on 6.2 running on kvm

Landry Breuil
On Thu, Dec 07, 2017 at 11:52:46AM +0100, Martin Pieuchot wrote:

> [...]
>
> Let me add that when you had x2apic enabled the kernel 'froze' inside
> x2apic_readreg, trace below:
>
>   ddb{0}> tr
>   x2apic_readreg(10) at x2apic_readreg+0xf
>   lapic_delay(ffff800022136900) at lapic_delay+0x5c
>   rtcput(ffff800022136960) at rtcput+0x65
>   resettodr() at resettodr+0x1d6
>   perform_resettodr(ffffffff81769b29) at perform_resettodr+0x9
>   taskq_thread(0) at taskq_thread+0x67
>   end trace frame: 0x0, count: -6
>
> What you're seeing with a MP_LOCKDEBUG kernel is just a symptom.  A CPU
> enters DDB because another one is 'frozen' while holding the
> KERNEL_LOCK().  What's interesting is that in both case the frozen CPU
> is trying to execute apic related code:
>   - x2apic_readreg
>   - lapic_delay
>
> I believe this issue should be reported to KVM developers as well.

*Very* interestingly, I had a new lockup while running bsd.sp, so I think that
rules out OpenBSD MP.

ddb> tr
i82489_readreg(0) at i82489_readreg+0xd
lapic_delay(ffffffff81a84090) at lapic_delay+0x5c
rtcget(ffffffff81a84090) at rtcget+0x1a
resettodr() at resettodr+0x3a
perform_resettodr(ffffffff81659e99) at perform_resettodr+0x9
taskq_thread(0) at taskq_thread+0x57
end trace frame: 0x0, count: -6

Landry

Re: mp deadlock on 6.2 running on kvm

Juan Francisco Cantero Hurtado
On Thu, Dec 07, 2017 at 02:27:29PM +0100, Landry Breuil wrote:

> [...]
>
> *very* interestingly, i had a new lock, running bsd.sp.. So i think that
> rules out openbsd mp.
>
> ddb> tr
> i82489_readreg(0) at i82489_readreg+0xd
> lapic_delay(ffffffff81a84090) at lapic_delay+0x5c
> rtcget(ffffffff81a84090) at rtcget+0x1a
> resettodr() at resettodr+0x3a
> perform_resettodr(ffffffff81659e99) at perform_resettodr+0x9
> taskq_thread(0) at taskq_thread+0x57
> end trace frame: 0x0, count: -6

Try running with "-machine q35". It changes the emulated machine to
a modern platform.


--
Juan Francisco Cantero Hurtado http://juanfra.info

Re: mp deadlock on 6.2 running on kvm

Landry Breuil
On Sat, Dec 09, 2017 at 04:33:28PM +0100, Juan Francisco Cantero Hurtado wrote:

> [...]
>
> Try running with "-machine q35". It changes the emulated machine to
> a modern platform.

Right, I suppose that matches https://wiki.qemu.org/Features/Q35,
interesting. Will definitely try; I mailed the kvm mailing list but got no
feedback so far.

Thanks!
Landry

Re: mp deadlock on 6.2 running on kvm

Jonathan Matthew
On 10/12/17 03:26, Landry Breuil wrote:

> [...]
>
> Right, i suppose that matches https://wiki.qemu.org/Features/Q35,
> interesting. Will definitely try, mailed the kvm mailing list but got no
> feedback so far.

I've been seeing this for a while too, on VMs that are already run with
-machine q35 and -cpu host.  I was blaming my (still unfinished) pvclock
code, but now I can fairly easily trigger it on single-cpu VMs without
that, mostly by running kernel compiles in a loop in a couple of
different guests.  The host is running a Fedora 25 (4.10.15-200.fc25.x86_64) kernel.

Adding some debug output to lapic_delay shows that the KVM-virtualized
lapic counter hits zero and doesn't reset, so the lapic_delay loop in
the guest never terminates.  KVM has several different ways it can
provide the lapic counter and I'm not sure which one I'm using yet.

I just tried making lapic_delay give up after a million zero reads, and
it seems to recover after a minute or so.  I'll leave it running to see
if it happens again.

Re: mp deadlock on 6.2 running on kvm

Landry Breuil
On Mon, Dec 11, 2017 at 06:21:01PM +1000, Jonathan Matthew wrote:

> [...]
>
> I've been seeing this for a while too, on VMs that are already run with
> -machine q35 and -cpu host.  I was blaming my (still unfinished) pvclock
> code, but now I can fairly easily trigger it on single cpu VMs without that,
> mostly by running kernel compiles in a loop in a couple of different guests.
> I'm using a Fedora 25 (4.10.15-200.fc25.x86_64) kernel.
>
> Adding some debug output to lapic_delay, it appears the KVM virtualized
> lapic counter hits zero and doesn't reset, so the lapic_delay loop in the
> guest never terminates.  KVM has several different ways it can provide the
> lapic counter and I'm not sure which one I'm using yet.
>
> I just tried making lapic_delay give up after a million zero reads, and it
> seems to recover after a minute or so.  I'll leave it running to see if it
> happens again.

Hah, interesting. So I know I can try q35 just for the sake of emulating
a newer hw platform, but that won't fix this issue.

I see this thread/patchset upstream: https://lkml.org/lkml/2017/9/28/773
but I'm not sure it's the same issue.
https://patchwork.kernel.org/patch/9036161/ might be related too?

When you say you modified lapic_delay, I guess that's on the guest side?

Landry

Re: mp deadlock on 6.2 running on kvm

Jonathan Matthew
On Mon, Dec 11, 2017 at 09:34:00AM +0100, Landry Breuil wrote:

> [...]
>
> Hah, interesting. So i know i can try q35 just for the sake of emulating
> a newer hw platform, but that wont fix this issue.
>
> I see upstream this thread/patchset: https://lkml.org/lkml/2017/9/28/773
> But i'm not sure that's the same issue.
> https://patchwork.kernel.org/patch/9036161/ might be too ?
>
> When you say you modified lapic_delay, i guess that's on the guest side ?

Yeah, I haven't changed anything on the host side yet.

I have a sort of working theory now.  It looks like polling the lapic counter
in a tight loop starves whatever bit of kvm is responsible for resetting the
counter.  If the counter hits 0 and then we enter lapic_delay before kvm gets
a chance to reset it, it'll never get reset and lapic_delay will never finish.

The gross hack below makes lapic_delay issue a hlt instruction when it looks
like the lapic counter is stuck, which seems to be enough for kvm to reset it.
To make it easier to hit, I also reduced RESETTODR_PERIOD (in sys/kern/kern_time.c)
to 30, since it always seems to be the RTC read/write operations that get stuck.
Maybe there aren't any other delay()s happening.

Next I guess I'm going to dig through the kvm code to see how the lapic counter
reset works.


Index: lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.47
diff -u -p -r1.47 lapic.c
--- lapic.c 29 May 2017 14:19:49 -0000 1.47
+++ lapic.c 12 Dec 2017 10:46:16 -0000
@@ -553,6 +553,7 @@ lapic_delay(int usec)
 {
 	int32_t tick, otick;
 	int64_t deltat; /* XXX may want to be 64bit */
+	int tries = 0;
 
 	otick = lapic_gettick();
 
@@ -570,6 +571,12 @@ lapic_delay(int usec)
 		else
 			deltat -= otick - tick;
 		otick = tick;
+
+		tries++;
+		if (tries >= 1000000) {
+			printf("lapic_delay stuck\n");
+			__asm volatile("hlt");
+		}
 
 		CPU_BUSY_CYCLE();
 	}
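
(Presumably the hlt helps because it forces a VM exit, which gives the host a
chance to deliver the pending lapic timer event and reload the counter - that
would fit the starvation theory above, though I haven't confirmed it in the
kvm code yet.)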

Re: mp deadlock on 6.2 running on kvm

Jonathan Matthew
On Tue, Dec 12, 2017 at 09:11:40PM +1000, Jonathan Matthew wrote:

> > [...]
> > When you say you modified lapic_delay, i guess that's on the guest side ?
>
> yeah, I haven't changed anything on the host side yet.
>
> I have a sort of working theory now.  It looks like polling the lapic counter
> in a tight loop starves whatever bit of kvm is responsible for resetting the
> counter.  If the counter hits 0 and then we enter lapic_delay before kvm gets
> a chance to reset it, it'll never get reset and lapic_delay will never finish.
>
> The gross hack below makes lapic_delay issue a hlt instruction when it looks
> like the lapic counter is stuck, which seems to be enough for kvm to reset it.
> To make it easier to hit, I also reduced RESETTODR_PERIOD (in sys/kern/kern_time.c)
> to 30, since it always seems to be the RTC read/write operations that get stuck.
> Maybe there aren't any other delay()s happening.
>
> Next I guess I'm going to dig through the kvm code to see how the lapic counter
> reset works.

So, for me at least, adding 'kvm-intel.preemption_timer=0' to the linux kernel
command line (on the host) fixes this.  This option disables kvm's use of the
vmx preemption timer to schedule lapic counter events.  I think there's some
problem in how kvm calculates the tsc deadline, or in the relationship between
that and the time value it uses for lapic counter reads, but I'm not sure what
exactly.
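
(If you want that to persist without touching the kernel command line, the
same knob should be settable as a module option on the host, i.e. an
"options kvm_intel preemption_timer=0" line under /etc/modprobe.d/ - assuming
kvm_intel is built as a module and not into the kernel.)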

Re: mp deadlock on 6.2 running on kvm

Landry Breuil
On Fri, Dec 15, 2017 at 06:21:08PM +1000, Jonathan Matthew wrote:

> [...]
>
> So, for me at least, adding 'kvm-intel.preemption_timer=0' to the linux kernel
> commandline (on the host) fixes this.  This option disables kvm's use of the
> vmx preemption timer to schedule lapic counter events.  I think there's some
> problem in how kvm calculates the tsc deadline, or the relationship between
> that and the time value it uses for lapic counter reads, but I'm not sure what
> exactly.
>

I'm also running with this option now; I'll bang the VMs with some heavy
workload and keep an eye on it. Thanks for the time you spent on this! :)