VMD consumes 100% cpu after unpausing guest


VMD consumes 100% cpu after unpausing guest

Dave Voutila-2
>Synopsis: VMD consumes 100% cpu after unpausing guest
>Category: amd64
>Environment:
        System      : OpenBSD 6.2
        Details     : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
                         [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64

>Description:

Not sure if this is a known issue, but I couldn't find anything
searching the lists.

Using an Alpine Linux guest vm, I can successfully pause the guest using
`vmctl pause 1` and some time later resume it using `vmctl unpause 1`.

Unpausing works: the guest comes back to life and I can SSH back in
without trouble. However, on the host the vmd process representing that
guest sits at 100% CPU utilization, with 1 thread constantly queueing
onto a cpu and running. The guest reports normal load, so it must be one
of the 2 threads.

A ktrace of that particular thread, trimmed for the sake of email, shows
it constantly calling clock_gettime and kevent:

CALL    futex(0x7361d183cd0,0x2<FUTEX_WAKE>,1,0,0)      
RET     futex   0
CALL    kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)    
STRU    struct  timespec
RET     kevent  0
CALL    clock_gettime(CLOCK_MONOTONIC,0x735f272b860)    
STRU    struct  timespec
RET     clock_gettime   0
CALL    kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)    
STRU    struct  timespec
RET     kevent  0
CALL    clock_gettime(CLOCK_MONOTONIC,0x735f272b860)    
STRU    struct  timespec
RET     clock_gettime   0
CALL    kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)    
STRU    struct  timespec
RET     kevent  0
CALL    clock_gettime(CLOCK_MONOTONIC,0x735f272b860)    
STRU    struct  timespec
RET     clock_gettime   0
CALL    kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)    
STRU    struct  timespec
RET     kevent  0
CALL    clock_gettime(CLOCK_MONOTONIC,0x735f272b860)    
STRU    struct  timespec
RET     clock_gettime   0
CALL    kevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)    
STRU    struct  timespec
RET     kevent  0
...etc.

VMD reports nothing strange, which I'd expect as the guest vm is
perfectly functional during this period even while that thread
burns up the CPU:

startup
/etc/vm.conf:3: switch "uplink" registered
vm_register: registering vm 1  
/etc/vm.conf:12: vm "alpine" registered (disabled)
vm_priv_brconfig: interface bridge0 description switch1-uplink
vmd_configure: not creating vm alpine (disabled)
config_setconfig: setting config
config_getconfig: retrieving config
config_getconfig: retrieving config
config_getconfig: retrieving config
vm_opentty: vm alpine tty /dev/ttyp5 uid 1000 gid 4 mode 620
vm_register: registering vm 1
vm_priv_ifconfig: interface tap0 description vm1-if0-alpine
vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
alpine: started vm 1 successfully, tty /dev/ttyp5
loadfile_bios: loaded BIOS image
run_vm: initializing hardware for vm alpine
virtio_init: vm "alpine" vio0 lladdr fe:e1:bb:d1:1b:bd
run_vm: starting vcpu threads for vm alpine
vcpu_reset: resetting vcpu 0 for vm 3
run_vm: waiting on events for VM alpine
i8259_write_datareg: master pic, reset IRQ vector to 0x8
i8259_write_datareg: slave pic, reset IRQ vector to 0x70
vcpu_exit_i8253: channel 0 reset, mode=0, start=65535
virtio_blk_io: device reset
virtio_blk_io: device reset
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_lcr: set baudrate = 115200
i8259_write_datareg: master pic, reset IRQ vector to 0x30
i8259_write_datareg: slave pic, reset IRQ vector to 0x38
vcpu_process_com_lcr: set baudrate = 115200
vcpu_exit_i8253: channel 0 reset, mode=7, start=3977
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_lcr: set baudrate = 115200
virtio_blk_io: device reset
virtio_blk_io: device reset
virtio_net_io: device reset
alpine: paused vm 1 successfully
alpine: unpaused vm 1 successfully.
rtc_update_rega: set non-32KHz timebase not supported
rtc_fire1: RTC clock drift (44s), requesting guest resync
rtc_update_rega: set non-32KHz timebase not supported

>How-To-Repeat:
        Pause an actively running Linux guest: `vmctl pause 1`
        After some time, resume the guest: `vmctl unpause 1`
        Observe CPU utilization of the matching vmd process.

>Fix:
        Unknown. Stopping the guest, either by having it halt or with
`vmctl stop <id>`, ends the CPU consumption.

dmesg:
OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17053851648 (16263MB)
avail mem = 16529985536 (15764MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xba6ce000 (62 entries)
bios0: vendor LENOVO version "R0IET50W (1.28 )" date 01/29/2018
bios0: LENOVO 20HNCTO1WW
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT TPM2 UEFI SSDT SSDT HPET APIC MCFG ECDT SSDT SSDT BOOT BATB SLIC SSDT SSDT SSDT WSMT SSDT SSDT DBGP DBG2 MSDM DMAR ASF! FPDT UEFI
acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) RP02(S4) RP04(S4) RP05(S4) RP06(S4) RP07(S4) RP08(S4) RP09(S4) RP10(S4) RP11(S4) RP12(S4) RP13(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 23999999 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2585.78 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SENSOR,ARAT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.97 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SENSOR,ARAT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.97 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SENSOR,ARAT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.97 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SENSOR,ARAT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 120 pins
acpimcfg0 at acpi0 addr 0xf0000000, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (RP01)
acpiprt2 at acpi0: bus -1 (RP02)
acpiprt3 at acpi0: bus 3 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
acpiprt5 at acpi0: bus 4 (RP05)
acpiprt6 at acpi0: bus -1 (RP06)
acpiprt7 at acpi0: bus -1 (RP07)
acpiprt8 at acpi0: bus -1 (RP08)
acpiprt9 at acpi0: bus -1 (RP09)
acpiprt10 at acpi0: bus -1 (RP10)
acpiprt11 at acpi0: bus -1 (RP11)
acpiprt12 at acpi0: bus -1 (RP12)
acpiprt13 at acpi0: bus -1 (RP13)
acpiprt14 at acpi0: bus -1 (RP14)
acpiprt15 at acpi0: bus -1 (RP15)
acpiprt16 at acpi0: bus -1 (RP16)
acpiprt17 at acpi0: bus -1 (RP17)
acpiprt18 at acpi0: bus -1 (RP18)
acpiprt19 at acpi0: bus -1 (RP19)
acpiprt20 at acpi0: bus -1 (RP20)
acpiprt21 at acpi0: bus -1 (RP21)
acpiprt22 at acpi0: bus -1 (RP22)
acpiprt23 at acpi0: bus -1 (RP23)
acpiprt24 at acpi0: bus -1 (RP24)
acpicpu0 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHC_
acpipwrres1 at acpi0: WRST
acpipwrres2 at acpi0: WRST
acpitz0 at acpi0: critical temperature is 128 degC
acpithinkpad0 at acpi0
acpiac0 at acpi0: AC unit online
acpibat0 at acpi0: BAT0 model "45N1113" serial  8867 type LION oem "LGC"
acpibat1 at acpi0: BAT1 model "45N1738" serial  5539 type LION oem "LGC"
"INT3F0D" at acpi0 not configured
"LEN0071" at acpi0 not configured
"LEN2046" at acpi0 not configured
"INT3515" at acpi0 not configured
acpibtn0 at acpi0: SLPB
"PNP0C14" at acpi0 not configured
acpibtn1 at acpi0: LID_
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"MSFT0101" at acpi0 not configured
"INT3394" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout at acpivideo0 not configured
cpu0: Enhanced SpeedStep 2585 MHz: speeds: 2701, 2700, 2600, 2500, 2400, 2200, 2000, 1800, 1600, 1500, 1300, 1100, 800, 700, 600, 400 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 7G Host" rev 0x02
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 620" rev 0x02
drm0 at inteldrm0
inteldrm0: msi
error: [drm:pid0:i915_firmware_load_error_print] *ERROR* failed to load firmware i915/kbl_dmc_ver1.bin (-22)
inteldrm0: 1920x1080, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
pchtemp0 at pci0 dev 20 function 2 "Intel 100 Series Thermal" rev 0x21
dwiic0 at pci0 dev 21 function 0 "Intel 100 Series I2C" rev 0x21: apic 2 int 16
iic0 at dwiic0
dwiic1 at pci0 dev 21 function 1 "Intel 100 Series I2C" rev 0x21: apic 2 int 17
iic1 at dwiic1
"INT3515" at iic1 addr 0x38 not configured
"Intel 100 Series MEI" rev 0x21 at pci0 dev 22 function 0 not configured
ahci0 at pci0 dev 23 function 0 "Intel 100 Series AHCI" rev 0x21: msi, AHCI 1.3.1
ahci0: port 2: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 2 lun 0: <ATA, ZTC-SM201-256G, 1.00> SCSI3 0/direct fixed naa.0000000000000000
sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
ppb0 at pci0 dev 28 function 0 "Intel 100 Series PCIE" rev 0xf1: msi
pci1 at ppb0 bus 2
rtsx0 at pci1 dev 0 function 0 "Realtek RTS522A Card Reader" rev 0x01: msi
sdmmc0 at rtsx0: 4-bit, dma
ppb1 at pci0 dev 28 function 2 "Intel 100 Series PCIE" rev 0xf1: msi
pci2 at ppb1 bus 3
iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
ppb2 at pci0 dev 28 function 4 "Intel 100 Series PCIE" rev 0xf1: msi
pci3 at ppb2 bus 4
nvme0 at pci3 dev 0 function 0 vendor "Samsung", unknown product 0xa808 rev 0x00: msi, NVMe 1.2
nvme0: SAMSUNG MZVLB512HAJQ-000L7, firmware 3L2QEXA7, serial S3TNNE0JB80896
scsibus2 at nvme0: 1 targets
sd1 at scsibus2 targ 0 lun 0: <NVMe, SAMSUNG MZVLB512, 3L2Q> SCSI4 0/direct fixed
sd1: 488386MB, 512 bytes/sector, 1000215216 sectors
pcib0 at pci0 dev 31 function 0 "Intel 200 Series LPC" rev 0x21
"Intel 100 Series PMC" rev 0x21 at pci0 dev 31 function 2 not configured
azalia0 at pci0 dev 31 function 3 "Intel 200 Series HD Audio" rev 0x21: msi
azalia0: codecs: Realtek/0x0298, Intel/0x280b, using Realtek/0x0298
audio0 at azalia0
ichiic0 at pci0 dev 31 function 4 "Intel 100 Series SMBus" rev 0x21: apic 2 int 16
iic2 at ichiic0
em0 at pci0 dev 31 function 6 "Intel I219-V" rev 0x21: msi, address 54:e1:ad:db:2e:82
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
efifb at mainbus0 not configured
uvideo0 at uhub0 port 8 configuration 1 interface 0 "SunplusIT Inc Integrated Camera" rev 2.00/0.10 addr 2
video0 at uvideo0
scsibus3 at sdmmc0: 2 targets, initiator 0
sd2 at scsibus3 targ 1 lun 0: <SD/MMC, SP64G, 0080> SCSI2 0/direct removable
sd2: 60906MB, 512 bytes/sector, 124735488 sectors
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
sd3 at scsibus5 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
sd3: 240369MB, 512 bytes/sector, 492277232 sectors
root on sd3a (1435a22fc6a86fb5.a) swap on sd3b dump on sd3b
iwm0: hw rev 0x230, fw ver 22.361476.0, address e4:70:b8:0f:41:aa
iwm0: unhandled firmware response 0xff/0xb8000010 rx ring 0[35]
iwm0: unhandled firmware response 0xff/0xb8000010 rx ring 0[23]
iwm0: unhandled firmware response 0xff/0xb8000010 rx ring 7[212]

usbdevs:
Controller /dev/usb0:
addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 disabled
 port 2 disabled
 port 3 disabled
 port 4 disabled
 port 5 disabled
 port 6 disabled
 port 7 disabled
 port 8 addr 2: high speed, power 500 mA, config 1, Integrated Camera(0xb5ab), SunplusIT Inc(0x04f2), rev 0.10
 port 9 disabled
 port 10 disabled
 port 11 disabled
 port 12 disabled
 port 13 disabled
 port 14 disabled
 port 15 disabled
 port 16 disabled


Re: VMD consumes 100% cpu after unpausing guest

Pratik Vyas
* Dave Voutila <[hidden email]> [2018-02-22 23:40:21 -0500]:

>>Synopsis: VMD consumes 100% cpu after unpausing guest
>>Category: amd64
>>Environment:
> System      : OpenBSD 6.2
> Details     : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
> [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
>
>>Description:
>
>        Not sure if this is a known issue, but I couldn't find anything
>searching the lists.
>
>Using an Alpine Linux guest vm, I can successfully pause the guest using
>`vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
>
>Unpausing works as the guest comes back to life, I can SSH back in, and
>it's fine. However, on the host the vmd process representing that guest
>sits at 100% CPU utilization with 1 thread constantly queueing onto a
>cpu and running. The guest reports normal load so it must be one of the
>2 threads.
>
Thanks Dave for the report. I can reproduce this with `vmctl receive` as
well. Probably mc146818_start doesn't do the right thing. Will report
back when I find a solution.

--
Pratik


Re: VMD consumes 100% cpu after unpausing guest

Pratik Vyas

This should fix it.

Use rtc_reschedule_per in mc146818_start instead of re-arming the
periodic interrupt without checking whether it's enabled in REGB.

ok?

--
Pratik

Index: usr.sbin/vmd/mc146818.c
===================================================================
RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
retrieving revision 1.15
diff -u -p -a -u -r1.15 mc146818.c
--- usr.sbin/vmd/mc146818.c 9 Jul 2017 00:51:40 -0000 1.15
+++ usr.sbin/vmd/mc146818.c 27 Feb 2018 02:47:18 -0000
@@ -354,6 +354,6 @@ mc146818_stop()
 void
 mc146818_start()
 {
- evtimer_add(&rtc.per, &rtc.per_tv);
  evtimer_add(&rtc.sec, &rtc.sec_tv);
+ rtc_reschedule_per();
 }
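
For readers following along, the idea behind the change can be modelled
outside vmd. This is a minimal sketch and not the code in mc146818.c:
struct rtc_model, model_reschedule_per and model_start are hypothetical
names, and the libevent timer is reduced to a flag plus an interval. The
PIE bit (0x40 in register B) and the rate-select nibble in register A
follow the MC146818 datasheet, assuming the usual 32.768 kHz time base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register B bit 6 (PIE) enables the periodic interrupt on an MC146818. */
#define REGB_PIE	0x40
/* Register A bits 0-3 select the periodic interrupt rate. */
#define REGA_RATE_MASK	0x0f

/* Hypothetical stand-in for the emulated RTC state; not vmd's struct. */
struct rtc_model {
	uint8_t		rega;
	uint8_t		regb;
	bool		per_armed;	/* models "periodic timer is pending" */
	uint32_t	per_usec;	/* models the timer interval */
};

/* Arm the periodic timer only if the guest enabled it and chose a rate. */
static void
model_reschedule_per(struct rtc_model *r)
{
	uint8_t rs;
	uint32_t hz;

	assert(r != NULL);
	rs = r->rega & REGA_RATE_MASK;

	r->per_armed = false;
	if (!(r->regb & REGB_PIE) || rs == 0)
		return;		/* periodic interrupt off: leave it disarmed */

	/* Selectors 1 and 2 alias 256 Hz and 128 Hz; 3..15 halve from 8192 Hz. */
	hz = (rs <= 2) ? (256u >> (rs - 1)) : (32768u >> (rs - 1));
	r->per_usec = 1000000u / hz;
	r->per_armed = true;
}

/* Resume path: defer to the guarded helper instead of arming blindly. */
static void
model_start(struct rtc_model *r)
{
	model_reschedule_per(r);
}
```

With a zeroed model, model_start() leaves the timer disarmed. With
regb = 0x40 and rate selector 6 in rega, it arms the timer with a 976
microsecond interval (1024 Hz).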


Re: VMD consumes 100% cpu after unpausing guest

phessler
On 2018 Feb 26 (Mon) at 18:52:34 -0800 (-0800), Pratik Vyas wrote:
:This should fix it.
:
:Use rtc_reschedule_per in mc146818_start instead of re arming the
:periodic interrupt without checking if it's enabled in REGB.
:
:ok?
:
:--
:Pratik
:
:Index: usr.sbin/vmd/mc146818.c
:===================================================================
:RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
:retrieving revision 1.15
:diff -u -p -a -u -r1.15 mc146818.c
:--- usr.sbin/vmd/mc146818.c 9 Jul 2017 00:51:40 -0000 1.15
:+++ usr.sbin/vmd/mc146818.c 27 Feb 2018 02:47:18 -0000
:@@ -354,6 +354,6 @@ mc146818_stop()
:void
:mc146818_start()
:{
:- evtimer_add(&rtc.per, &rtc.per_tv);
: evtimer_add(&rtc.sec, &rtc.sec_tv);
:+ rtc_reschedule_per();
:}
:

This helps a lot with the CPU load on a vmd host.  Drops my single guest
from ~50% CPU to ~9% CPU on the host.

OK

(btw, should rtc_fireper() receive a similar change?)


--
The right half of the brain controls the left half of the body.  This
means that only left handed people are in their right mind.


Re: VMD consumes 100% cpu after unpausing guest

Dave Voutila-2
Peter Hessler <[hidden email]> writes:

> On 2018 Feb 26 (Mon) at 18:52:34 -0800 (-0800), Pratik Vyas wrote:
> :This should fix it.
> :
> :Use rtc_reschedule_per in mc146818_start instead of re arming the
> :periodic interrupt without checking if it's enabled in REGB.
> :
> :ok?
> :
> :--
> :Pratik
> :
> :Index: usr.sbin/vmd/mc146818.c
> :===================================================================
> :RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
> :retrieving revision 1.15
> :diff -u -p -a -u -r1.15 mc146818.c
> :--- usr.sbin/vmd/mc146818.c 9 Jul 2017 00:51:40 -0000 1.15
> :+++ usr.sbin/vmd/mc146818.c 27 Feb 2018 02:47:18 -0000
> :@@ -354,6 +354,6 @@ mc146818_stop()
> :void
> :mc146818_start()
> :{
> :- evtimer_add(&rtc.per, &rtc.per_tv);
> : evtimer_add(&rtc.sec, &rtc.sec_tv);
> :+ rtc_reschedule_per();
> :}
> :
>
> This helps a lot with the CPU load on a vmd host.  Drops my single guest
> from ~50% CPU to ~9% CPU on the host.

I can confirm this patch resolves the issue I reported. I _think_ I'm
seeing a similar CPU load drop as well, but definitely have
paused/unpaused the guest multiple times without issues.



Re: VMD consumes 100% cpu after unpausing guest

Pratik Vyas
* Dave Voutila <[hidden email]> [2018-02-27 21:29:25 -0500]:
>
>I can confirm this patch resolves the issue I reported. I _think_ I'm
>seeing a similar CPU load drop as well, but definitely have
>paused/unpaused the guest multiple times without issues.
>
>

Thanks Dave and Peter for testing. I will commit this.

I cannot explain a general decrease in CPU load because these lines are
in the code path only when you unpause or receive a vm.


* Peter Hessler <[hidden email]> [2018-02-27 11:16:52 +0100]:
>
>(btw, should rtc_fireper() receive a similar change?)
>
>

rtc_fireper is unrelated to the cause of this. rtc_reschedule_per will
do an event_add for rtc_fireper if required and rtc_fireper keeps on
doing an event_add for itself.
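
To illustrate why a self re-arming timer can pin a thread, here is a
small simulation. It is only a sketch under stated assumptions: it
assumes the interval handed to the timer was zero when the guest had not
enabled the periodic interrupt (the thread does not state the value),
and it charges a made-up 1 microsecond per pass through the event loop.
count_fires and LOOP_COST_USEC are hypothetical names; nothing here
calls libevent or vmd.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed cost of one pass through the event loop, in microseconds. */
#define LOOP_COST_USEC	1

/*
 * Count how often a self re-arming timer fires within window_usec of
 * simulated time. The loop waits until the deadline (as an event loop
 * would with a timeout), runs the callback, and the callback re-arms
 * the timer with the same interval.
 */
static uint64_t
count_fires(uint64_t interval_usec, uint64_t window_usec)
{
	uint64_t now = 0, deadline = interval_usec, fires = 0;

	while (now < window_usec) {
		if (deadline > now)
			now = deadline;		/* block until the timer is due */
		if (now >= window_usec)
			break;
		fires++;			/* callback runs ... */
		now += LOOP_COST_USEC;
		deadline = now + interval_usec;	/* ... and re-arms itself */
	}
	return fires;
}
```

In this model, count_fires(0, 1000000) never blocks and fires on every
pass, while count_fires(976, 1000000) spends nearly all of the simulated
second waiting. That matches the shape of the ktrace in the report,
where kevent keeps returning 0 immediately.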

--
Pratik