vmd linux guests unexpectedly hangs and time skew issue


Martin Got
First of all, I have to mention that OpenBSD guests work perfectly on a 6.7 amd64 host, without the previously discovered clock skew issue. I'm using tsc and ntpd to keep the guests' clocks accurate, and no large clock skew is detected. Good job!

It seems tsc does not help on Linux guests: the guest clock runs at half the speed of the host's clock. I haven't found a way to fix it. Any suggestions?
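
A rough way to put a number on the skew is to sample the host and guest clocks at the same two moments and compare the elapsed times. The sketch below is only an illustration: the function name `skew_ratio` and the sample values are hypothetical, and the timestamps are assumed to have been collected by hand (for example with `date +%s` on both sides).

```python
def skew_ratio(host_start, host_end, guest_start, guest_end):
    """Return guest elapsed time divided by host elapsed time.

    1.0 means the guest clock keeps pace with the host; 0.5 means the
    guest clock advances at half the host's rate.
    """
    host_elapsed = host_end - host_start
    if host_elapsed <= 0:
        raise ValueError("host_end must be later than host_start")
    return (guest_end - guest_start) / host_elapsed


# Hypothetical samples (seconds since the epoch): 600 s pass on the
# host while only 300 s pass in the guest.
ratio = skew_ratio(1593172800, 1593173400, 1593172800, 1593173100)
print(f"guest clock runs at {ratio:.2f}x host speed")
```

A ratio close to 0.5 would match the "two times slower" behaviour described above.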

Right after the amd64 6.6 -> 6.7 upgrade, vmd machines with Linux guests started to hang unexpectedly. Tested with the Debian 5.2.0 kernel and the Alpine 5.4.43-1-virt kernel. The machines were run one at a time (only one VM was running when a hang occurred).

According to top on the host, CPU utilization climbs steadily from 20-25% during normal operation to 160-188% when the guest hangs.

I tried to stop the guest with 'vmctl stop linux'. CPU utilization dropped to ~99-100%, but the machine did not actually stop until I also sent 'kill -HUP PID' to the vmd process. I did not see this on 6.6 with the same Linux guests.

Some errors from the guests' boot process that may be relevant:

Loading Linux 5.2.0-debian-amd64
Loading initial ramdisk ...
ACPI BIOS Error (bug): A valid RSDP was not found (20190509/tbxfroot-210)
[Firmware Bug]: TSC doesn't count with P0 frequency!
[Firmware Bug]: cpu 0, invalid IBS interrupt offset 0 (MSRC001103A=0x0000000000000000)
mce: Unable to init MCE device (rc:-5)
tsc: Fast TSC calibration failed
...

Loading Linux 5.4.43-1-virt #Alpine SMP
[Firmware Bug]: TSC doesn't count with P0 frequency!
tsc: Fast TSC calibration failed
tsc: Unable to calibrate against PIT
tsc: No reference (HPET/PMTIMER) available
tsc: Marking TSC unstable due to could not calculate TSC khz
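
The timekeeping-related messages can be picked out of a longer boot log mechanically. This is a minimal sketch, assuming the guest's boot log has been saved as text; the helper name `tsc_messages` is hypothetical, and the match rule (lines starting with `tsc:`, or firmware-bug lines that mention TSC) is only a heuristic.

```python
import re

# Optional dmesg-style timestamp prefix such as "[    0.000000] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def tsc_messages(log_text):
    """Return the lines of a Linux boot log that concern the TSC."""
    hits = []
    for line in log_text.splitlines():
        message = TIMESTAMP.sub("", line.strip())
        is_tsc_line = message.startswith("tsc:")
        is_firmware_tsc = "[Firmware Bug]" in message and "TSC" in message
        if is_tsc_line or is_firmware_tsc:
            hits.append(message)
    return hits


sample = """\
[Firmware Bug]: TSC doesn't count with P0 frequency!
mce: Unable to init MCE device (rc:-5)
tsc: Fast TSC calibration failed
tsc: Unable to calibrate against PIT
tsc: Marking TSC unstable due to could not calculate TSC khz
"""
for message in tsc_messages(sample):
    print(message)
```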

Martin

Re: vmd linux guests unexpectedly hangs and time skew issue

Pratik Vyas
* Martin <[hidden email]> [2020-06-26 12:04:33 +0000]:

>First of all, I have to mention that OpenBSD guests work perfectly on a 6.7 amd64 host, without the previously discovered clock skew issue. I'm using tsc and ntpd to keep the guests' clocks accurate, and no large clock skew is detected. Good job!
>
>It seems tsc does not help on Linux guests: the guest clock runs at half the speed of the host's clock. I haven't found a way to fix it. Any suggestions?
>
>Right after the amd64 6.6 -> 6.7 upgrade, vmd machines with Linux guests started to hang unexpectedly. Tested with the Debian 5.2.0 kernel and the Alpine 5.4.43-1-virt kernel. The machines were run one at a time (only one VM was running when a hang occurred).
>
>According to top on the host, CPU utilization climbs steadily from 20-25% during normal operation to 160-188% when the guest hangs.
>
>I tried to stop the guest with 'vmctl stop linux'. CPU utilization dropped to ~99-100%, but the machine did not actually stop until I also sent 'kill -HUP PID' to the vmd process. I did not see this on 6.6 with the same Linux guests.
>
>Some errors from the guests' boot process that may be relevant:
>
>Loading Linux 5.2.0-debian-amd64
>Loading initial ramdisk ...
>ACPI BIOS Error (bug): A valid RSDP was not found (20190509/tbxfroot-210)
>[Firmware Bug]: TSC doesn't count with P0 frequency!
>[Firmware Bug]: cpu 0, invalid IBS interrupt offset 0 (MSRC001103A=0x0000000000000000)
>mce: Unable to init MCE device (rc:-5)
>tsc: Fast TSC calibration failed
>...
>
>Loading Linux 5.4.43-1-virt #Alpine SMP
>[Firmware Bug]: TSC doesn't count with P0 frequency!
>tsc: Fast TSC calibration failed
>tsc: Unable to calibrate against PIT
>tsc: No reference (HPET/PMTIMER) available
>tsc: Marking TSC unstable due to could not calculate TSC khz
>
>Martin

Hi Martin,

Without a dmesg I am not sure what your hardware is, but on most
machines I have seen (Intel newer than Broadwell, and Ryzen), Linux >5.4
is able to use tsc. Please try -current, as it has received fixes for
these issues (crashes and lockups with 100% CPU utilization).

--
Pratik


Re: vmd linux guests unexpectedly hangs and time skew issue

Martin Got
Hi Pratik,

Could you recommend a way to enable tsc in Linux explicitly? I use the stock Linux 5.4.43-1-virt #Alpine SMP minimal build for virtual machines, and I get the tsc error on every boot, along with the clock skew.
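
As a starting point, it helps to see which clocksource the guest kernel actually selected and which ones it offers. The sketch below reads the standard Linux sysfs files for that; the function name `read_clocksources` is hypothetical. The Linux kernel documents the boot parameters `clocksource=tsc` and `tsc=reliable` for requesting the TSC, but whether they help under vmd on this hardware is not established here, since the boot log suggests the kernel could not calculate the TSC frequency at all.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.is_dir():
        current, available = read_clocksources()
        print("current:  ", current)
        print("available:", " ".join(available))
        if "tsc" not in available:
            print("tsc is not listed as an available clocksource")
    else:
        print("no clocksource sysfs directory; is this a Linux guest?")
```

Run inside the guest, this shows whether `tsc` is even on offer before any boot parameter is tried.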

I can't move to -current on a production machine. Are there any patches available for 6.7 vmd that avoid the hangs at 80-100% load?

OpenBSD 6.7 (GENERIC.MP) #182: Thu May  7 11:11:58
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16554262528 (15787MB)
avail mem = 16039927808 (15296MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe0840 (39 entries)
bios0: vendor Phoenix Technologies Ltd. version "FP4.3.0.0.312.13 X64" date 03/02/2018
bios0: CompuLab fit-PC4
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC MCFG FPDT UEFI POAT BATB SSDT SSDT UEFI
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP2(S4) GPP3(S4) GFX_(S4) XHC0(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) SBAZ(S4) UAR1(S3)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD GX-420CA SOC with Radeon(tm) HD Graphics, 1996.50 MHz, 16-00-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT
cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD GX-420CA SOC with Radeon(tm) HD Graphics, 1996.26 MHz, 16-00-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT
cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD GX-420CA SOC with Radeon(tm) HD Graphics, 1996.27 MHz, 16-00-01
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT
cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD GX-420CA SOC with Radeon(tm) HD Graphics, 1996.26 MHz, 16-00-01
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT
cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins, remapped
ioapic1 at mainbus0: apid 5 pa 0xfec01000, version 21, 32 pins, remapped
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (GPP0)
acpiprt2 at acpi0: bus -1 (GPP1)
acpiprt3 at acpi0: bus 4 (GPP2)
acpiprt4 at acpi0: bus 5 (GPP3)
acpiprt5 at acpi0: bus 1 (GFX_)
acpicpu0 at acpi0: C2(0@400 io@0x841), C1(@1 halt!), PSS
acpicpu1 at acpi0: C2(0@400 io@0x841), C1(@1 halt!), PSS
acpicpu2 at acpi0: C2(0@400 io@0x841), C1(@1 halt!), PSS
acpicpu3 at acpi0: C2(0@400 io@0x841), C1(@1 halt!), PSS
acpibtn0 at acpi0: PWRB
acpipci0 at acpi0 PCI0: 0x00000010 0x00000011 0x00000000
acpicmos0 at acpi0
"PNP0A05" at acpi0 not configured
acpivideo0 at acpi0: VGA_
acpivout0 at acpivideo0: LCD_
cpu0: 1996 MHz: speeds: 2000 1800 1600 1200 1000 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD 16h Host" rev 0x00
radeondrm0 at pci0 dev 1 function 0 "ATI Kabini" rev 0x00
drm0 at radeondrm0
radeondrm0: msi
azalia0 at pci0 dev 1 function 1 "ATI Radeon HD Audio" rev 0x00: msi
azalia0: no supported codecs
pchb1 at pci0 dev 2 function 0 vendor "AMD", unknown product 0x1538 rev 0x00
ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
pci2 at ppb1 bus 2
athn0 at pci2 dev 0 function 0 "Atheros AR9281" rev 0x01: apic 5 int 4
athn0: AR9280 rev 2 (2T2R), ROM rev 16, address 00:23:4d:12:08:01
ppb2 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
pci3 at ppb2 bus 4
em0 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:01:c0:54:0A:63
ppb3 at pci0 dev 2 function 5 "AMD 16h PCIE" rev 0x00: msi
pci4 at ppb3 bus 5
em1 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:01:c0:54:0A:64
xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x01: msi, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
ohci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB" rev 0x39: apic 4 int 18, version 1.0, legacy support
ehci0 at pci0 dev 18 function 2 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 17
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB" rev 0x39: apic 4 int 18, version 1.0, legacy support
ehci1 at pci0 dev 19 function 2 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 17
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x3a: SMBus disabled
azalia1 at pci0 dev 20 function 2 "AMD Hudson-2 HD Audio" rev 0x02: apic 4 int 16
azalia1: codecs: Realtek ALC888
audio0 at azalia1
pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
sdhc0: SDHC 2.0, 50 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
pchb6 at pci0 dev 24 function 5 vendor "AMD", unknown product 0x1535 rev 0x00
usb3 at ohci0: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "AMD OHCI root hub" rev 1.00/1.00 addr 1
usb4 at ohci1: USB revision 1.0
uhub4 at usb4 configuration 1 interface 0 "AMD OHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: SVM/RVI
sdmmc0: can't enable card

Martin

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, June 28, 2020 5:54 PM, Pratik Vyas <[hidden email]> wrote:

> Hi Martin,
>
> Without a dmesg I am not sure what your hardware is, but on most
> machines I have seen (Intel newer than Broadwell, and Ryzen), Linux >5.4
> is able to use tsc. Please try -current, as it has received fixes for
> these issues (crashes and lockups with 100% CPU utilization).
>
> --
> Pratik