pvclock(4)

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

pvclock(4)

Reyk Floeter-2
Hi,

the attached diff is another attempt at implementing a pvclock(4)
guest driver.  This improves the clock on KVM and replaces the need
for using the VM-expensive acpihpet(4).

While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified
form), it currently only attaches under KVM:

pvbus0 at mainbus0: KVM
pvclock0 at pvbus0

A few notes:

- The "invtsc" workaround to get a real TSC is not always possible, as
it breaks migration and a few other things.  If you're running in a
cloud, it is very unlikely that you'll get it enabled.

- pvclock suffered from the same problem as early TSC implementations:
it wasn't synced across CPUs, so the OS had to do it manually.  In
OpenBSD, we never supported that and we decided to only support the
newer "invariant" TSCs that started to show up around SkyLake.  And I
think that's why the previous pvlock implementations for OpenBSD
didn't take off.

- So my diff does the same for pvclock what we did for TSC: only
support the "stable" pvclock that guarantees to provide a synchronized
clock by either doing the synchronization on the host side or by
providing it based on an invariant TSC on the host itself.  So this
diff might not work on older KVM versions or CPUs; I'd be happy to see
some test results.

- I did not copy+paste from the Linux and/or FreeBSD (they have
suprising similarities but different licenses), and I wasn't convinced
that it is worth copying the asm code that they use instead of the
following multiplication:
        ctr = ((delta * mul_frac) >> 32) + system_time;
Let me know if I'm wrong and if this code provides significant benefits:
https://github.com/freebsd/freebsd/blob/1d6e4247415d264485ee94b59fdbc12e0c566fd0/sys/x86/x86/pvclock.c#L90-L124

- This driver only implements the "system time" clock, and not the
static "wall clock".  I didn't see a point in supporting the wall
clock as its timestamp is only ever updated when you cold-start the
VM.  The wall clock struct is still defined in the driver for
reference but could probably be removed.

- I assigned a timecounter priority of 1500 to be in the middle
between acpihpet(1000) and tsc(2000).  If TSC is available, we should
probably use it instead; even if pvclock is theoretically better as it
is TSC-based and optimized for VMs.

- This is only a guest driver, not a driver for vmm(4)/vmd(8).  But it
could solve some issues if somebody would implement it for vmm(4)...

-  I'm running it on a busy build machine on Exoscale.ch for at least
a week now.  This machine runs stuff that utilizes all CPUs.  With
HPET, we've sometimes experienced livelocks but the pvclock(4) driver
made it much snappier.

$ time sleep 2
    0m02.00s real     0m00.00s user     0m00.01s system
$ cat /var/db/ntpd.drift                                                  
-4.679
$ sysctl kern.timecounter                                                
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=1
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
$ sysctl hw.model hw.ncpuonline hw.vendor hw.product hw.version
hw.model=Intel Xeon Processor (Skylake)
hw.ncpuonline=8
hw.vendor=Apache Software Foundation
hw.product=CloudStack KVM Hypervisor
hw.version=pc-i440fx-2.11

Feedback? Tests? OKs?

Reyk

Index: share/man/man4/pvclock.4
===================================================================
RCS file: share/man/man4/pvclock.4
diff -N share/man/man4/pvclock.4
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ share/man/man4/pvclock.4 19 Nov 2018 11:48:33 -0000
@@ -0,0 +1,45 @@
+.\" $OpenBSD$
+.\"
+.\" Copyright (c) 2018 Reyk Floeter <[hidden email]>
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate$
+.Dt PVCLOCK 4
+.Os
+.Sh NAME
+.Nm pvclock
+.Nd paravirtual clock driver
+.Sh SYNOPSIS
+.Cd "pvclock* at pvbus?
+.Sh DESCRIPTION
+The
+.Nm
+driver supports the paravirtual clock that is available in KVM and
+other hypervisors.
+.Nm
+uses a shared page between the host and the hypervisor to synchronize
+the TSC clock in an efficient way.
+.Sh SEE ALSO
+.Xr pvbus 4
+.Sh HISTORY
+The
+.Nm
+driver first appeared in
+.Ox 6.5 .
+.Sh AUTHORS
+.An -nosplit
+The
+.Nm
+driver was written by
+.An Reyk Floeter Aq Mt [hidden email] .
Index: sys/arch/amd64/conf/GENERIC
===================================================================
RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
retrieving revision 1.464
diff -u -p -u -p -r1.464 GENERIC
--- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 -0000 1.464
+++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 -0000
@@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI
 
 vmt0 at pvbus? # VMware Tools
 
+pvclock0 at pvbus? # KVM pvclock
+
 xen0 at pvbus? # Xen HVM domU
 xnf* at xen? # Xen Netfront
 xbf* at xen? # Xen Blkfront
Index: sys/dev/pv/files.pv
===================================================================
RCS file: /cvs/src/sys/dev/pv/files.pv,v
retrieving revision 1.14
diff -u -p -u -p -r1.14 files.pv
--- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 -0000 1.14
+++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 -0000
@@ -8,6 +8,11 @@ device pvbus
 attach pvbus at mainbus
 file dev/pv/pvbus.c pvbus needs-flag
 
+# KVM clock
+device pvclock
+attach pvclock at pvbus
+file dev/pv/pvclock.c pvclock needs-flag
+
 # VMware Tools
 device vmt
 attach vmt at pvbus
Index: sys/dev/pv/pvclock.c
===================================================================
RCS file: sys/dev/pv/pvclock.c
diff -N sys/dev/pv/pvclock.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ sys/dev/pv/pvclock.c 19 Nov 2018 11:48:33 -0000
@@ -0,0 +1,229 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2018 Reyk Floeter <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#if !defined(__i386__) && !defined(__amd64__)
+#error pvclock(4) is only supported on i386 and amd64
+#endif
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/timetc.h>
+#include <sys/timeout.h>
+#include <sys/malloc.h>
+#include <sys/atomic.h>
+
+#include <machine/cpu.h>
+#include <uvm/uvm_extern.h>
+
+#include <dev/pv/pvvar.h>
+#include <dev/pv/pvreg.h>
+
+struct pvclock_softc {
+ struct device sc_dev;
+ void *sc_time;
+ paddr_t sc_paddr;
+ struct timecounter *sc_tc;
+};
+
+struct pvclock_wall_clock {
+ uint32_t wc_version;
+ uint32_t wc_sec;
+ uint32_t wc_nsec;
+} __packed;
+
+struct pvclock_time_info {
+ uint32_t ti_version;
+ uint32_t ti_pad0;
+ uint64_t ti_tsc_timestamp;
+ uint64_t ti_system_time;
+ uint32_t ti_tsc_to_system_mul;
+ int8_t ti_tsc_shift;
+ uint8_t ti_flags;
+ uint8_t ti_pad[2];
+} __packed;
+
+#define PVCLOCK_FLAG_TSC_STABLE 0x01
+#define PVCLOCK_SYSTEM_TIME_ENABLE 0x01
+#define DEVNAME(_s) ((_s)->sc_dev.dv_xname)
+
+int pvclock_match(struct device *, void *, void *);
+void pvclock_attach(struct device *, struct device *, void *);
+int pvclock_activate(struct device *, int);
+
+uint pvclock_get_timecount(struct timecounter *);
+void pvclock_read_time_info(struct pvclock_softc *,
+    struct pvclock_time_info *);
+
+struct cfattach pvclock_ca = {
+ sizeof(struct pvclock_softc),
+ pvclock_match,
+ pvclock_attach,
+ NULL,
+ pvclock_activate
+};
+
+struct cfdriver pvclock_cd = {
+ NULL,
+ "pvclock",
+ DV_DULL
+};
+
+struct timecounter pvclock_timecounter = {
+ pvclock_get_timecount, NULL, ~0u, 0, NULL, -2000, NULL
+};
+
+int
+pvclock_match(struct device *parent, void *match, void *aux)
+{
+ struct pv_attach_args *pva = aux;
+ struct pvbus_hv *hv;
+
+ /*
+ * pvclock is provided by different hypervisors, we currently
+ * only support the "kvmclock".
+ */
+ hv = &pva->pva_hv[PVBUS_KVM];
+ if (hv->hv_base != 0) {
+ /*
+ * We only implement support for the 2nd version of pvclock.
+ * The first version is basically the same but with different
+ * non-standard MSRs and it is deprecated.
+ */
+ if ((hv->hv_features & (1 << KVM_FEATURE_CLOCKSOURCE2)) == 0)
+ return (0);
+
+ /*
+ * Only the "stable" clock with a sync'ed TSC is supported.
+ * In this case the host guarantees that the TSC is constant
+ * and invariant, either by the underlying TSC or by passing
+ * on a synchronized value.
+ */
+ if ((hv->hv_features &
+    (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) == 0)
+ return (0);
+ }
+
+ return (1);
+}
+
+void
+pvclock_attach(struct device *parent, struct device *self, void *aux)
+{
+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
+ paddr_t pa;
+
+ if ((sc->sc_time = km_alloc(PAGE_SIZE,
+    &kv_any, &kp_zero, &kd_nowait)) == NULL) {
+ printf(": time page allocation failed\n");
+ return;
+ }
+ if (!pmap_extract(pmap_kernel(), (vaddr_t)sc->sc_time, &pa)) {
+ printf(": time page PA extraction failed\n");
+ km_free(sc->sc_time, PAGE_SIZE, &kv_any, &kp_zero);
+ return;
+ }
+
+ wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
+ sc->sc_paddr = pa;
+
+ sc->sc_tc = &pvclock_timecounter;
+ sc->sc_tc->tc_name = DEVNAME(sc);
+ sc->sc_tc->tc_frequency = 1000000000ULL;
+ sc->sc_tc->tc_priv = sc;
+
+ /* Better than HPET but below TSC */
+ sc->sc_tc->tc_quality = 1500;
+
+ tc_init(sc->sc_tc);
+
+ printf("\n");
+}
+
+int
+pvclock_activate(struct device *self, int act)
+{
+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
+ int rv = 0;
+ paddr_t pa = sc->sc_paddr;
+
+ switch (act) {
+ case DVACT_POWERDOWN:
+ wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
+ break;
+ case DVACT_RESUME:
+ wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
+ break;
+ }
+
+ return (rv);
+}
+
+static inline uint32_t
+pvclock_read_begin(const struct pvclock_time_info *ti)
+{
+ uint32_t version = ti->ti_version & ~0x1;
+ virtio_membar_sync();
+ return (version);
+}
+
+static inline int
+pvclock_read_done(const struct pvclock_time_info *ti,
+    uint32_t version)
+{
+ virtio_membar_sync();
+ return (ti->ti_version == version);
+}
+
+uint
+pvclock_get_timecount(struct timecounter *tc)
+{
+ struct pvclock_softc *sc = tc->tc_priv;
+ struct pvclock_time_info *ti;
+ uint64_t tsc_timestamp, system_time, delta, ctr;
+ uint32_t version, mul_frac;
+ int8_t shift;
+ uint8_t flags;
+
+ ti = sc->sc_time;
+ do {
+ version = pvclock_read_begin(ti);
+ system_time = ti->ti_system_time;
+ tsc_timestamp = ti->ti_tsc_timestamp;
+ mul_frac = ti->ti_tsc_to_system_mul;
+ shift = ti->ti_tsc_shift;
+ flags = ti->ti_flags;
+ } while (!pvclock_read_done(ti, version));
+
+ /* This bit must be set as we attached based on the stable flag */
+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
+ panic("%s: unstable result on stable clock", DEVNAME(sc));
+
+ /*
+ * The algorithm is described in
+ * linux/Documentation/virtual/kvm/msr.txt
+ */
+ delta = rdtsc() - tsc_timestamp;
+ if (shift < 0)
+ delta >>= -shift;
+ else
+ delta <<= shift;
+ ctr = ((delta * mul_frac) >> 32) + system_time;
+
+ return (ctr);
+}
Index: sys/dev/pv/pvreg.h
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvreg.h,v
retrieving revision 1.4
diff -u -p -u -p -r1.4 pvreg.h
--- sys/dev/pv/pvreg.h 12 Dec 2015 12:33:49 -0000 1.4
+++ sys/dev/pv/pvreg.h 19 Nov 2018 11:48:33 -0000
@@ -43,6 +43,9 @@
 #define KVM_MSR_EOI_EN 0x4b564d04
 #define KVM_PV_EOI_BIT 0
 
+#define KVM_MSR_WALL_CLOCK 0x4b564d00
+#define KVM_MSR_SYSTEM_TIME 0x4b564d01
+
 /*
  * Hyper-V
  */

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Sebastian Benoit-3
Reyk Floeter([hidden email]) on 2018.11.19 13:12:46 +0100:
> Feedback? Tests? OKs?

test on my incarnation of kvm:

OpenBSD 6.4-current (GENERIC.MP) #0: Mon Nov 19 18:32:29 CET 2018
    [hidden email]:/sys/arch/amd64/compile/GENERIC.MP
real mem = 2097004544 (1999MB)
avail mem = 2024218624 (1930MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7dfffd30 (14 entries)
bios0: vendor SeaBIOS version "1.10.2-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.93 MHz, 06-2c-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: smt 0, core 0, package 2
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: smt 0, core 0, package 3
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.46 MHz, 06-2c-01
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu4: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4: smt 0, core 0, package 4
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu5: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5: smt 0, core 0, package 5
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpihpet0 at acpi0: 100000000 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpicpu4 at acpi0: C1(@1 halt!)
acpicpu5 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <QEMU, QEMU DVD-ROM, 2.5+> ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Memory" rev 0x00
viomb0 at virtio0
virtio0: apic 0 int 11
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 61440MB, 512 bytes/sector, 125829120 sectors
virtio1: msix shared
virtio2 at pci0 dev 5 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio2: address 52:54:00:00:03:92
virtio2: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
usb0 at uhci0: USB revision 1.0
uhub0 at usb0 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB Tablet" rev 2.00/0.00 addr 2
uhidev0: iclass 3/0
ums0 at uhidev0: 3 buttons, Z dir
wsmouse1 at ums0 mux 0
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (e51b442ddbf210ff.a) swap on sd0b dump on sd0b
fd0 at fdc0 drive 1: density unknown


kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)

$ time sleep 2
    0m02.00s real     0m00.00s user     0m00.00s system

I'll keep an eye on ntpd.

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Landry Breuil-5
In reply to this post by Reyk Floeter-2
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:

> Hi,
>
> the attached diff is another attempt at implementing a pvclock(4)
> guest driver.  This improves the clock on KVM and replaces the need
> for using the VM-expensive acpihpet(4).
>
> While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified
> form), it currently only attaches under KVM:
>
> pvbus0 at mainbus0: KVM
> pvclock0 at pvbus0
Works fine on my proxmox 5.2 / KVM 2.11 dpb build cluster emulating 6x
"QEMU Standard PC (Q35 + ICH9, 2009)". Before i had to restart ntpd
every 5mn on each hosts, time would drift like crazy under build load.
With pvclock it seems the slave clocks are stable and ntpd stays synced.

kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)

Landry

dmesg.boot (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2
In reply to this post by Reyk Floeter-2
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> the attached diff is another attempt at implementing a pvclock(4)
> guest driver.  This improves the clock on KVM and replaces the need
> for using the VM-expensive acpihpet(4).
>

So far I only got positive reports.  Where are the problems? ;)

Otherwise: OK?

Reyk

> Index: share/man/man4/pvclock.4
> ===================================================================
> RCS file: share/man/man4/pvclock.4
> diff -N share/man/man4/pvclock.4
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ share/man/man4/pvclock.4 19 Nov 2018 11:48:33 -0000
> @@ -0,0 +1,45 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2018 Reyk Floeter <[hidden email]>
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt PVCLOCK 4
> +.Os
> +.Sh NAME
> +.Nm pvclock
> +.Nd paravirtual clock driver
> +.Sh SYNOPSIS
> +.Cd "pvclock* at pvbus?
> +.Sh DESCRIPTION
> +The
> +.Nm
> +driver supports the paravirtual clock that is available in KVM and
> +other hypervisors.
> +.Nm
> +uses a shared page between the host and the hypervisor to synchronize
> +the TSC clock in an efficient way.
> +.Sh SEE ALSO
> +.Xr pvbus 4
> +.Sh HISTORY
> +The
> +.Nm
> +driver first appeared in
> +.Ox 6.5 .
> +.Sh AUTHORS
> +.An -nosplit
> +The
> +.Nm
> +driver was written by
> +.An Reyk Floeter Aq Mt [hidden email] .
> Index: sys/arch/amd64/conf/GENERIC
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
> retrieving revision 1.464
> diff -u -p -u -p -r1.464 GENERIC
> --- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 -0000 1.464
> +++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 -0000
> @@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI
>  
>  vmt0 at pvbus? # VMware Tools
>  
> +pvclock0 at pvbus? # KVM pvclock
> +
>  xen0 at pvbus? # Xen HVM domU
>  xnf* at xen? # Xen Netfront
>  xbf* at xen? # Xen Blkfront
> Index: sys/dev/pv/files.pv
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/files.pv,v
> retrieving revision 1.14
> diff -u -p -u -p -r1.14 files.pv
> --- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 -0000 1.14
> +++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 -0000
> @@ -8,6 +8,11 @@ device pvbus
>  attach pvbus at mainbus
>  file dev/pv/pvbus.c pvbus needs-flag
>  
> +# KVM clock
> +device pvclock
> +attach pvclock at pvbus
> +file dev/pv/pvclock.c pvclock needs-flag
> +
>  # VMware Tools
>  device vmt
>  attach vmt at pvbus
> Index: sys/dev/pv/pvclock.c
> ===================================================================
> RCS file: sys/dev/pv/pvclock.c
> diff -N sys/dev/pv/pvclock.c
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/dev/pv/pvclock.c 19 Nov 2018 11:48:33 -0000
> @@ -0,0 +1,229 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2018 Reyk Floeter <[hidden email]>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#if !defined(__i386__) && !defined(__amd64__)
> +#error pvclock(4) is only supported on i386 and amd64
> +#endif
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/kernel.h>
> +#include <sys/timetc.h>
> +#include <sys/timeout.h>
> +#include <sys/malloc.h>
> +#include <sys/atomic.h>
> +
> +#include <machine/cpu.h>
> +#include <uvm/uvm_extern.h>
> +
> +#include <dev/pv/pvvar.h>
> +#include <dev/pv/pvreg.h>
> +
> +struct pvclock_softc {
> + struct device sc_dev;
> + void *sc_time;
> + paddr_t sc_paddr;
> + struct timecounter *sc_tc;
> +};
> +
> +struct pvclock_wall_clock {
> + uint32_t wc_version;
> + uint32_t wc_sec;
> + uint32_t wc_nsec;
> +} __packed;
> +
> +struct pvclock_time_info {
> + uint32_t ti_version;
> + uint32_t ti_pad0;
> + uint64_t ti_tsc_timestamp;
> + uint64_t ti_system_time;
> + uint32_t ti_tsc_to_system_mul;
> + int8_t ti_tsc_shift;
> + uint8_t ti_flags;
> + uint8_t ti_pad[2];
> +} __packed;
> +
> +#define PVCLOCK_FLAG_TSC_STABLE 0x01
> +#define PVCLOCK_SYSTEM_TIME_ENABLE 0x01
> +#define DEVNAME(_s) ((_s)->sc_dev.dv_xname)
> +
> +int pvclock_match(struct device *, void *, void *);
> +void pvclock_attach(struct device *, struct device *, void *);
> +int pvclock_activate(struct device *, int);
> +
> +uint pvclock_get_timecount(struct timecounter *);
> +void pvclock_read_time_info(struct pvclock_softc *,
> +    struct pvclock_time_info *);
> +
> +struct cfattach pvclock_ca = {
> + sizeof(struct pvclock_softc),
> + pvclock_match,
> + pvclock_attach,
> + NULL,
> + pvclock_activate
> +};
> +
> +struct cfdriver pvclock_cd = {
> + NULL,
> + "pvclock",
> + DV_DULL
> +};
> +
> +struct timecounter pvclock_timecounter = {
> + pvclock_get_timecount, NULL, ~0u, 0, NULL, -2000, NULL
> +};
> +
> +int
> +pvclock_match(struct device *parent, void *match, void *aux)
> +{
> + struct pv_attach_args *pva = aux;
> + struct pvbus_hv *hv;
> +
> + /*
> + * pvclock is provided by different hypervisors, we currently
> + * only support the "kvmclock".
> + */
> + hv = &pva->pva_hv[PVBUS_KVM];
> + if (hv->hv_base != 0) {
> + /*
> + * We only implement support for the 2nd version of pvclock.
> + * The first version is basically the same but with different
> + * non-standard MSRs and it is deprecated.
> + */
> + if ((hv->hv_features & (1 << KVM_FEATURE_CLOCKSOURCE2)) == 0)
> + return (0);
> +
> + /*
> + * Only the "stable" clock with a sync'ed TSC is supported.
> + * In this case the host guarantees that the TSC is constant
> + * and invariant, either by the underlying TSC or by passing
> + * on a synchronized value.
> + */
> + if ((hv->hv_features &
> +    (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) == 0)
> + return (0);
> + }
> +
> + return (1);
> +}
> +
> +void
> +pvclock_attach(struct device *parent, struct device *self, void *aux)
> +{
> + struct pvclock_softc *sc = (struct pvclock_softc *)self;
> + paddr_t pa;
> +
> + if ((sc->sc_time = km_alloc(PAGE_SIZE,
> +    &kv_any, &kp_zero, &kd_nowait)) == NULL) {
> + printf(": time page allocation failed\n");
> + return;
> + }
> + if (!pmap_extract(pmap_kernel(), (vaddr_t)sc->sc_time, &pa)) {
> + printf(": time page PA extraction failed\n");
> + km_free(sc->sc_time, PAGE_SIZE, &kv_any, &kp_zero);
> + return;
> + }
> +
> + wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
> + sc->sc_paddr = pa;
> +
> + sc->sc_tc = &pvclock_timecounter;
> + sc->sc_tc->tc_name = DEVNAME(sc);
> + sc->sc_tc->tc_frequency = 1000000000ULL;
> + sc->sc_tc->tc_priv = sc;
> +
> + /* Better than HPET but below TSC */
> + sc->sc_tc->tc_quality = 1500;
> +
> + tc_init(sc->sc_tc);
> +
> + printf("\n");
> +}
> +
> +int
> +pvclock_activate(struct device *self, int act)
> +{
> + struct pvclock_softc *sc = (struct pvclock_softc *)self;
> + int rv = 0;
> + paddr_t pa = sc->sc_paddr;
> +
> + switch (act) {
> + case DVACT_POWERDOWN:
> + wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
> + break;
> + case DVACT_RESUME:
> + wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
> + break;
> + }
> +
> + return (rv);
> +}
> +
> +static inline uint32_t
> +pvclock_read_begin(const struct pvclock_time_info *ti)
> +{
> + uint32_t version = ti->ti_version & ~0x1;
> + virtio_membar_sync();
> + return (version);
> +}
> +
> +static inline int
> +pvclock_read_done(const struct pvclock_time_info *ti,
> +    uint32_t version)
> +{
> + virtio_membar_sync();
> + return (ti->ti_version == version);
> +}
> +
> +uint
> +pvclock_get_timecount(struct timecounter *tc)
> +{
> + struct pvclock_softc *sc = tc->tc_priv;
> + struct pvclock_time_info *ti;
> + uint64_t tsc_timestamp, system_time, delta, ctr;
> + uint32_t version, mul_frac;
> + int8_t shift;
> + uint8_t flags;
> +
> + ti = sc->sc_time;
> + do {
> + version = pvclock_read_begin(ti);
> + system_time = ti->ti_system_time;
> + tsc_timestamp = ti->ti_tsc_timestamp;
> + mul_frac = ti->ti_tsc_to_system_mul;
> + shift = ti->ti_tsc_shift;
> + flags = ti->ti_flags;
> + } while (!pvclock_read_done(ti, version));
> +
> + /* This bit must be set as we attached based on the stable flag */
> + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
> + panic("%s: unstable result on stable clock", DEVNAME(sc));
> +
> + /*
> + * The algorithm is described in
> + * linux/Documentation/virtual/kvm/msr.txt
> + */
> + delta = rdtsc() - tsc_timestamp;
> + if (shift < 0)
> + delta >>= -shift;
> + else
> + delta <<= shift;
> + ctr = ((delta * mul_frac) >> 32) + system_time;
> +
> + return (ctr);
> +}
> Index: sys/dev/pv/pvreg.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/pvreg.h,v
> retrieving revision 1.4
> diff -u -p -u -p -r1.4 pvreg.h
> --- sys/dev/pv/pvreg.h 12 Dec 2015 12:33:49 -0000 1.4
> +++ sys/dev/pv/pvreg.h 19 Nov 2018 11:48:33 -0000
> @@ -43,6 +43,9 @@
>  #define KVM_MSR_EOI_EN 0x4b564d04
>  #define KVM_PV_EOI_BIT 0
>  
> +#define KVM_MSR_WALL_CLOCK 0x4b564d00
> +#define KVM_MSR_SYSTEM_TIME 0x4b564d01
> +
>  /*
>   * Hyper-V
>   */

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Mike Larkin
On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:

> On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > the attached diff is another attempt at implementing a pvclock(4)
> > guest driver.  This improves the clock on KVM and replaces the need
> > for using the VM-expensive acpihpet(4).
> >
>
> So far I only got positive reports.  Where are the problems? ;)
>
> Otherwise: OK?
>
> Reyk
>

Reads ok. One question - you mention in pvclock.c that this is supported
on i386 and amd64 but I only see GENERIC changes for amd64?

ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
make this for sure amd64 only.

-ml

> > Index: share/man/man4/pvclock.4
> > ===================================================================
> > RCS file: share/man/man4/pvclock.4
> > diff -N share/man/man4/pvclock.4
> > --- /dev/null 1 Jan 1970 00:00:00 -0000
> > +++ share/man/man4/pvclock.4 19 Nov 2018 11:48:33 -0000
> > @@ -0,0 +1,45 @@
> > +.\" $OpenBSD$
> > +.\"
> > +.\" Copyright (c) 2018 Reyk Floeter <[hidden email]>
> > +.\"
> > +.\" Permission to use, copy, modify, and distribute this software for any
> > +.\" purpose with or without fee is hereby granted, provided that the above
> > +.\" copyright notice and this permission notice appear in all copies.
> > +.\"
> > +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> > +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> > +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> > +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> > +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> > +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > +.\"
> > +.Dd $Mdocdate$
> > +.Dt PVCLOCK 4
> > +.Os
> > +.Sh NAME
> > +.Nm pvclock
> > +.Nd paravirtual clock driver
> > +.Sh SYNOPSIS
> > +.Cd "pvclock* at pvbus?
> > +.Sh DESCRIPTION
> > +The
> > +.Nm
> > +driver supports the paravirtual clock that is available in KVM and
> > +other hypervisors.
> > +.Nm
> > +uses a shared page between the host and the hypervisor to synchronize
> > +the TSC clock in an efficient way.
> > +.Sh SEE ALSO
> > +.Xr pvbus 4
> > +.Sh HISTORY
> > +The
> > +.Nm
> > +driver first appeared in
> > +.Ox 6.5 .
> > +.Sh AUTHORS
> > +.An -nosplit
> > +The
> > +.Nm
> > +driver was written by
> > +.An Reyk Floeter Aq Mt [hidden email] .
> > Index: sys/arch/amd64/conf/GENERIC
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
> > retrieving revision 1.464
> > diff -u -p -u -p -r1.464 GENERIC
> > --- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 -0000 1.464
> > +++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 -0000
> > @@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI
> >  
> >  vmt0 at pvbus? # VMware Tools
> >  
> > +pvclock0 at pvbus? # KVM pvclock
> > +
> >  xen0 at pvbus? # Xen HVM domU
> >  xnf* at xen? # Xen Netfront
> >  xbf* at xen? # Xen Blkfront
> > Index: sys/dev/pv/files.pv
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pv/files.pv,v
> > retrieving revision 1.14
> > diff -u -p -u -p -r1.14 files.pv
> > --- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 -0000 1.14
> > +++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 -0000
> > @@ -8,6 +8,11 @@ device pvbus
> >  attach pvbus at mainbus
> >  file dev/pv/pvbus.c pvbus needs-flag
> >  
> > +# KVM clock
> > +device pvclock
> > +attach pvclock at pvbus
> > +file dev/pv/pvclock.c pvclock needs-flag
> > +
> >  # VMware Tools
> >  device vmt
> >  attach vmt at pvbus
> > Index: sys/dev/pv/pvclock.c
> > ===================================================================
> > RCS file: sys/dev/pv/pvclock.c
> > diff -N sys/dev/pv/pvclock.c
> > --- /dev/null 1 Jan 1970 00:00:00 -0000
> > +++ sys/dev/pv/pvclock.c 19 Nov 2018 11:48:33 -0000
> > @@ -0,0 +1,229 @@
> > +/* $OpenBSD$ */
> > +
> > +/*
> > + * Copyright (c) 2018 Reyk Floeter <[hidden email]>
> > + *
> > + * Permission to use, copy, modify, and distribute this software for any
> > + * purpose with or without fee is hereby granted, provided that the above
> > + * copyright notice and this permission notice appear in all copies.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > + */
> > +
> > +#if !defined(__i386__) && !defined(__amd64__)
> > +#error pvclock(4) is only supported on i386 and amd64
> > +#endif
> > +
> > +#include <sys/param.h>
> > +#include <sys/systm.h>
> > +#include <sys/kernel.h>
> > +#include <sys/timetc.h>
> > +#include <sys/timeout.h>
> > +#include <sys/malloc.h>
> > +#include <sys/atomic.h>
> > +
> > +#include <machine/cpu.h>
> > +#include <uvm/uvm_extern.h>
> > +
> > +#include <dev/pv/pvvar.h>
> > +#include <dev/pv/pvreg.h>
> > +
> > +struct pvclock_softc {
> > + struct device sc_dev;
> > + void *sc_time;
> > + paddr_t sc_paddr;
> > + struct timecounter *sc_tc;
> > +};
> > +
> > +struct pvclock_wall_clock {
> > + uint32_t wc_version;
> > + uint32_t wc_sec;
> > + uint32_t wc_nsec;
> > +} __packed;
> > +
> > +struct pvclock_time_info {
> > + uint32_t ti_version;
> > + uint32_t ti_pad0;
> > + uint64_t ti_tsc_timestamp;
> > + uint64_t ti_system_time;
> > + uint32_t ti_tsc_to_system_mul;
> > + int8_t ti_tsc_shift;
> > + uint8_t ti_flags;
> > + uint8_t ti_pad[2];
> > +} __packed;
> > +
> > +#define PVCLOCK_FLAG_TSC_STABLE 0x01
> > +#define PVCLOCK_SYSTEM_TIME_ENABLE 0x01
> > +#define DEVNAME(_s) ((_s)->sc_dev.dv_xname)
> > +
> > +int pvclock_match(struct device *, void *, void *);
> > +void pvclock_attach(struct device *, struct device *, void *);
> > +int pvclock_activate(struct device *, int);
> > +
> > +uint pvclock_get_timecount(struct timecounter *);
> > +void pvclock_read_time_info(struct pvclock_softc *,
> > +    struct pvclock_time_info *);
> > +
> > +struct cfattach pvclock_ca = {
> > + sizeof(struct pvclock_softc),
> > + pvclock_match,
> > + pvclock_attach,
> > + NULL,
> > + pvclock_activate
> > +};
> > +
> > +struct cfdriver pvclock_cd = {
> > + NULL,
> > + "pvclock",
> > + DV_DULL
> > +};
> > +
> > +struct timecounter pvclock_timecounter = {
> > + pvclock_get_timecount, NULL, ~0u, 0, NULL, -2000, NULL
> > +};
> > +
> > +int
> > +pvclock_match(struct device *parent, void *match, void *aux)
> > +{
> > + struct pv_attach_args *pva = aux;
> > + struct pvbus_hv *hv;
> > +
> > + /*
> > + * pvclock is provided by different hypervisors, we currently
> > + * only support the "kvmclock".
> > + */
> > + hv = &pva->pva_hv[PVBUS_KVM];
> > + if (hv->hv_base != 0) {
> > + /*
> > + * We only implement support for the 2nd version of pvclock.
> > + * The first version is basically the same but with different
> > + * non-standard MSRs and it is deprecated.
> > + */
> > + if ((hv->hv_features & (1 << KVM_FEATURE_CLOCKSOURCE2)) == 0)
> > + return (0);
> > +
> > + /*
> > + * Only the "stable" clock with a sync'ed TSC is supported.
> > + * In this case the host guarantees that the TSC is constant
> > + * and invariant, either by the underlying TSC or by passing
> > + * on a synchronized value.
> > + */
> > + if ((hv->hv_features &
> > +    (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) == 0)
> > + return (0);
> > + }
> > +
> > + return (1);
> > +}
> > +
> > +void
> > +pvclock_attach(struct device *parent, struct device *self, void *aux)
> > +{
> > + struct pvclock_softc *sc = (struct pvclock_softc *)self;
> > + paddr_t pa;
> > +
> > + if ((sc->sc_time = km_alloc(PAGE_SIZE,
> > +    &kv_any, &kp_zero, &kd_nowait)) == NULL) {
> > + printf(": time page allocation failed\n");
> > + return;
> > + }
> > + if (!pmap_extract(pmap_kernel(), (vaddr_t)sc->sc_time, &pa)) {
> > + printf(": time page PA extraction failed\n");
> > + km_free(sc->sc_time, PAGE_SIZE, &kv_any, &kp_zero);
> > + return;
> > + }
> > +
> > + wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
> > + sc->sc_paddr = pa;
> > +
> > + sc->sc_tc = &pvclock_timecounter;
> > + sc->sc_tc->tc_name = DEVNAME(sc);
> > + sc->sc_tc->tc_frequency = 1000000000ULL;
> > + sc->sc_tc->tc_priv = sc;
> > +
> > + /* Better than HPET but below TSC */
> > + sc->sc_tc->tc_quality = 1500;
> > +
> > + tc_init(sc->sc_tc);
> > +
> > + printf("\n");
> > +}
> > +
> > +int
> > +pvclock_activate(struct device *self, int act)
> > +{
> > + struct pvclock_softc *sc = (struct pvclock_softc *)self;
> > + int rv = 0;
> > + paddr_t pa = sc->sc_paddr;
> > +
> > + switch (act) {
> > + case DVACT_POWERDOWN:
> > + wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
> > + break;
> > + case DVACT_RESUME:
> > + wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
> > + break;
> > + }
> > +
> > + return (rv);
> > +}
> > +
> > +static inline uint32_t
> > +pvclock_read_begin(const struct pvclock_time_info *ti)
> > +{
> > + uint32_t version = ti->ti_version & ~0x1;
> > + virtio_membar_sync();
> > + return (version);
> > +}
> > +
> > +static inline int
> > +pvclock_read_done(const struct pvclock_time_info *ti,
> > +    uint32_t version)
> > +{
> > + virtio_membar_sync();
> > + return (ti->ti_version == version);
> > +}
> > +
> > +uint
> > +pvclock_get_timecount(struct timecounter *tc)
> > +{
> > + struct pvclock_softc *sc = tc->tc_priv;
> > + struct pvclock_time_info *ti;
> > + uint64_t tsc_timestamp, system_time, delta, ctr;
> > + uint32_t version, mul_frac;
> > + int8_t shift;
> > + uint8_t flags;
> > +
> > + ti = sc->sc_time;
> > + do {
> > + version = pvclock_read_begin(ti);
> > + system_time = ti->ti_system_time;
> > + tsc_timestamp = ti->ti_tsc_timestamp;
> > + mul_frac = ti->ti_tsc_to_system_mul;
> > + shift = ti->ti_tsc_shift;
> > + flags = ti->ti_flags;
> > + } while (!pvclock_read_done(ti, version));
> > +
> > + /* This bit must be set as we attached based on the stable flag */
> > + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
> > + panic("%s: unstable result on stable clock", DEVNAME(sc));
> > +
> > + /*
> > + * The algorithm is described in
> > + * linux/Documentation/virtual/kvm/msr.txt
> > + */
> > + delta = rdtsc() - tsc_timestamp;
> > + if (shift < 0)
> > + delta >>= -shift;
> > + else
> > + delta <<= shift;
> > + ctr = ((delta * mul_frac) >> 32) + system_time;
> > +
> > + return (ctr);
> > +}
> > Index: sys/dev/pv/pvreg.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pv/pvreg.h,v
> > retrieving revision 1.4
> > diff -u -p -u -p -r1.4 pvreg.h
> > --- sys/dev/pv/pvreg.h 12 Dec 2015 12:33:49 -0000 1.4
> > +++ sys/dev/pv/pvreg.h 19 Nov 2018 11:48:33 -0000
> > @@ -43,6 +43,9 @@
> >  #define KVM_MSR_EOI_EN 0x4b564d04
> >  #define KVM_PV_EOI_BIT 0
> >  
> > +#define KVM_MSR_WALL_CLOCK 0x4b564d00
> > +#define KVM_MSR_SYSTEM_TIME 0x4b564d01
> > +
> >  /*
> >   * Hyper-V
> >   */
>

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Landry Breuil-5
On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote:

> On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:
> > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > > the attached diff is another attempt at implementing a pvclock(4)
> > > guest driver.  This improves the clock on KVM and replaces the need
> > > for using the VM-expensive acpihpet(4).
> > >
> >
> > So far I only got positive reports.  Where are the problems? ;)
> >
> > Otherwise: OK?
> >
> > Reyk
> >
>
> Reads ok. One question - you mention in pvclock.c that this is supported
> on i386 and amd64 but I only see GENERIC changes for amd64?
>
> ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
> make this for sure amd64 only.

I'm giving it a shot on my i386 vm.

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Landry Breuil-5
On Thu, Nov 22, 2018 at 05:24:10PM +0100, Landry Breuil wrote:

> On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote:
> > On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:
> > > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > > > the attached diff is another attempt at implementing a pvclock(4)
> > > > guest driver.  This improves the clock on KVM and replaces the need
> > > > for using the VM-expensive acpihpet(4).
> > > >
> > >
> > > So far I only got positive reports.  Where are the problems? ;)
> > >
> > > Otherwise: OK?
> > >
> > > Reyk
> > >
> >
> > Reads ok. One question - you mention in pvclock.c that this is supported
> > on i386 and amd64 but I only see GENERIC changes for amd64?
> >
> > ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
> > make this for sure amd64 only.
>
> I'm giving it a shot on my i386 vm.
Also seems to work fine there, i'm stressing the vm (well,
unpacking tarball/building firefox) and ntpd still manages to reduce the drift:


[17:49] c32:~/ $while sleep 10 ; do ntpctl -sa|grep clock ; date ; done                                                                  
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 404.382ms                                                        
Thu Nov 22 17:49:49 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 354.382ms                                                        
Thu Nov 22 17:49:59 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 304.382ms                                                        
Thu Nov 22 17:50:09 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 254.382ms                                                        
Thu Nov 22 17:50:19 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 204.382ms                                                        
Thu Nov 22 17:50:29 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 154.382ms                                                        
Thu Nov 22 17:50:39 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 104.382ms                                                        
Thu Nov 22 17:50:49 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 54.382ms                                                          
Thu Nov 22 17:50:59 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced
Thu Nov 22 17:51:09 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced
Thu Nov 22 17:51:19 CET 2018
4/4 peers valid, constraint offset -1s, clock synced, stratum 3
Thu Nov 22 17:51:29 CET 2018

dunno how valid a testcase this is

dmesg below, this time emulating a QEMU Standard PC (i440FX + PIIX, 1996)

Landry

dmesg.boot (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Greg Steuck
In reply to this post by Reyk Floeter-2
I realize this report is practically useless, but better out than in
(according to Shrek).
I found this in the logs of my GCE VM running syzkaller bot. No further
details were preserved...

2018/11/24 09:53:48 ci-openbsd-main: poll:
94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f
2018/11/24 09:53:48 ci-openbsd-main: building kernel...
2018/11/24 09:54:03 ci-openbsd-main: testing image...
2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0:
unstable result on stable clock

The host is running
OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

The VM kernel at the time was built at "zap 10 tab leading whitespace
before 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside
a KVM guest" would've fixed it?

Thanks
Greg
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2

> Am 25.11.2018 um 05:02 schrieb Greg Steuck <[hidden email]>:
>
> I realize this report is practically useless, but better out than in (according to Shrek).
> I found this in the logs of my GCE VM running syzkaller bot. No further details were preserved...
>
> 2018/11/24 09:53:48 ci-openbsd-main: poll: 94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f
> 2018/11/24 09:53:48 ci-openbsd-main: building kernel...
> 2018/11/24 09:54:03 ci-openbsd-main: testing image...
> 2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0: unstable result on stable clock
>
> The host is running
> OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018
>     [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> The VM kernel at the time was built at "zap 10 tab leading whitespace before 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside a KVM guest" would've fixed it?

Yes, correct. Sorry for that glitch.

Reyk

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

johnw-3
In reply to this post by Reyk Floeter-2

> So far I only got positive reports.  Where are the problems? ;)

> Otherwise: OK?

> Reyk

Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today
current (28-nov-2018).

It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov
18 17:25:58 MST 2018)

Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64
GNU/Linux
qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1)
panic photo: https://ibb.co/gjWBhL4
panic photo: https://ibb.co/HDrJ7Q0
panic photo: https://ibb.co/g96J5s7

Thanks.

--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2
Hi,

> Am 29.11.2018 um 05:27 schrieb johnw <[hidden email]>:
>
>
>> So far I only got positive reports.  Where are the problems? ;)
>
>> Otherwise: OK?
>
>> Reyk
>
> Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today current (28-nov-2018).
>

thanks for reporting.

Do you have a full dmesg for me?

Reyk

> It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018)
>
> Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux
> qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1)
> panic photo: https://ibb.co/gjWBhL4
> panic photo: https://ibb.co/HDrJ7Q0
> panic photo: https://ibb.co/g96J5s7
>
> Thanks.
>
> --
> Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

johnw-3
Reyk Floeter 於 2018-11-29 14:09 寫到:

> Do you have a full dmesg for me?

Yes, attached dmesg.log, thanks.


--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC

dmesg.log (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

johnw-3
johnw 於 2018-11-29 17:38 寫到:
> Reyk Floeter 於 2018-11-29 14:09 寫到:
>
>> Do you have a full dmesg for me?
>
> Yes, attached dmesg.log, thanks.

Hi, after disable pvclock, it can boot with new kernel again, thanks.



--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC

disabled-pvclock.dmesg.txt (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Chris Cappuccio
johnw [[hidden email]] wrote:
>
> Hi, after disable pvclock, it can boot with new kernel again, thanks.
...
> cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
...

This CPU clearly doesn't have the invariant TSC (it is required according
to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle
this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2
and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough.

Chris

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2

> Am 04.12.2018 um 00:52 schrieb Chris Cappuccio <[hidden email]>:
>
> johnw [[hidden email]] wrote:
>>
>> Hi, after disable pvclock, it can boot with new kernel again, thanks.
> ...
>> cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
>> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
> ...
>
> This CPU clearly doesn't have the invariant TSC (it is required according
> to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle
> this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2
> and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough.

Yes, KVM’s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying.

As mentioned before: I‘d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it.

I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts.

Reyk

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Chris Cappuccio
Reyk Floeter [[hidden email]] wrote:
>
> Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying.
>
> As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it.
>
> I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts.
>

Perhaps the solution is as "simple" as checking the status of the bit
after the presence of the bit is established ?

Index: pvclock.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -r1.2 pvclock.c
--- pvclock.c 24 Nov 2018 13:12:29 -0000 1.2
+++ pvclock.c 4 Dec 2018 00:53:56 -0000
@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
- struct pvclock_softc *sc = (struct pvclock_softc *)self;
- paddr_t pa;
+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
+ struct pvclock_time_info *ti;
+ paddr_t pa;
+ uint8_t flags;
 
  if ((sc->sc_time = km_alloc(PAGE_SIZE,
     &kv_any, &kp_zero, &kd_nowait)) == NULL) {
@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
 
  /* Better than HPET but below TSC */
  sc->sc_tc->tc_quality = 1500;
+
+ ti = sc->sc_time;
+ flags = ti->ti_flags;
+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+ printf(": unstable timestamp counter\n");
+ return;
+ }
 
  tc_init(sc->sc_tc);
 

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

johnw-3
Chris Cappuccio 於 2018-12-04 08:56 寫到:

> Reyk Floeter [[hidden email]] wrote:
>>
>> Yes, KVM???s stable bit is not a reliable indication as it is seems to
>> depend on the capabilities of the KVM version and not the actual
>> availability of the feature on the particular hardware. How annoying.
>>
>> As mentioned before: I???d like to disable pvclock for now and I can
>> do that in the morning CET if nobody beats me to it.
>>
>> I have an idea how to deal with old platforms afterwards but this
>> needs some more tests and thoughts.
>>
>
> Perhaps the solution is as "simple" as checking the status of the bit
> after the presence of the bit is established ?
>
> Index: pvclock.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
> retrieving revision 1.2
> diff -u -p -u -r1.2 pvclock.c
> --- pvclock.c 24 Nov 2018 13:12:29 -0000 1.2
> +++ pvclock.c 4 Dec 2018 00:53:56 -0000
> @@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
>  void
>  pvclock_attach(struct device *parent, struct device *self, void *aux)
>  {
> - struct pvclock_softc *sc = (struct pvclock_softc *)self;
> - paddr_t pa;
> + struct pvclock_softc *sc = (struct pvclock_softc *)self;
> + struct pvclock_time_info *ti;
> + paddr_t pa;
> + uint8_t flags;
>
>   if ((sc->sc_time = km_alloc(PAGE_SIZE,
>      &kv_any, &kp_zero, &kd_nowait)) == NULL) {
> @@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
>
>   /* Better than HPET but below TSC */
>   sc->sc_tc->tc_quality = 1500;
> +
> + ti = sc->sc_time;
> + flags = ti->ti_flags;
> + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
> + printf(": unstable timestamp counter\n");
> + return;
> + }
>
>   tc_init(sc->sc_tc);
Hi, your patch work for me, the system boot without manually disable
pvclock, thanks.

Attached dmesg.

--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC

dmesg (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Peter Hessler
In reply to this post by Chris Cappuccio
On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote:
:Reyk Floeter [[hidden email]] wrote:
:>
:> Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying.
:>
:> As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it.
:>
:> I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts.
:>
:
:Perhaps the solution is as "simple" as checking the status of the bit
:after the presence of the bit is established ?
:

This makes sense, OK




:Index: pvclock.c
:===================================================================
:RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
:retrieving revision 1.2
:diff -u -p -u -r1.2 pvclock.c
:--- pvclock.c 24 Nov 2018 13:12:29 -0000 1.2
:+++ pvclock.c 4 Dec 2018 00:53:56 -0000
:@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
: void
: pvclock_attach(struct device *parent, struct device *self, void *aux)
: {
:- struct pvclock_softc *sc = (struct pvclock_softc *)self;
:- paddr_t pa;
:+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
:+ struct pvclock_time_info *ti;
:+ paddr_t pa;
:+ uint8_t flags;
:
: if ((sc->sc_time = km_alloc(PAGE_SIZE,
:    &kv_any, &kp_zero, &kd_nowait)) == NULL) {
:@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
:
: /* Better than HPET but below TSC */
: sc->sc_tc->tc_quality = 1500;
:+
:+ ti = sc->sc_time;
:+ flags = ti->ti_flags;
:+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
:+ printf(": unstable timestamp counter\n");
:+ return;
:+ }
:
: tc_init(sc->sc_tc);
:
:

--
fortune -as

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2
In reply to this post by Chris Cappuccio
On Mon, Dec 03, 2018 at 04:56:10PM -0800, Chris Cappuccio wrote:

> Reyk Floeter [[hidden email]] wrote:
> >
> > Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying.
> >
> > As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it.
> >
> > I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts.
> >
>
> Perhaps the solution is as "simple" as checking the status of the bit
> after the presence of the bit is established ?
>

The approach makes sense but it is not the right way to read the
value.  It only works if you don't hit a refresh cycle of the host.
Better diff below.

But I'm not sure if this is enough and if the "stable" flag mit appear
occasionally.  So I'd like to disable pvclock for now until we did
some more testing and tried different options.

Reyk

Index: sys/dev/pv/pvclock.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -p -r1.2 pvclock.c
--- sys/dev/pv/pvclock.c 24 Nov 2018 13:12:29 -0000 1.2
+++ sys/dev/pv/pvclock.c 4 Dec 2018 12:09:06 -0000
@@ -70,6 +70,11 @@ uint pvclock_get_timecount(struct timec
 void pvclock_read_time_info(struct pvclock_softc *,
     struct pvclock_time_info *);
 
+static inline uint32_t
+ pvclock_read_begin(const struct pvclock_time_info *);
+static inline int
+ pvclock_read_done(const struct pvclock_time_info *, uint32_t);
+
 struct cfattach pvclock_ca = {
  sizeof(struct pvclock_softc),
  pvclock_match,
@@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
- struct pvclock_softc *sc = (struct pvclock_softc *)self;
- paddr_t pa;
+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
+ struct pvclock_time_info *ti;
+ paddr_t pa;
+ uint32_t version;
+ uint8_t flags;
 
  if ((sc->sc_time = km_alloc(PAGE_SIZE,
     &kv_any, &kp_zero, &kd_nowait)) == NULL) {
@@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st
 
  wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
  sc->sc_paddr = pa;
+
+ ti = sc->sc_time;
+ do {
+ version = pvclock_read_begin(ti);
+ flags = ti->ti_flags;
+ } while (!pvclock_read_done(ti, version));
+
+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+ wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
+ km_free(sc->sc_time, PAGE_SIZE, &kv_any, &kp_zero);
+ panic(": unstable clock\n");
+ return;
+ }
 
  sc->sc_tc = &pvclock_timecounter;
  sc->sc_tc->tc_name = DEVNAME(sc);

Reply | Threaded
Open this post in threaded view
|

Re: pvclock(4)

Reyk Floeter-2
In reply to this post by Peter Hessler
On Tue, Dec 04, 2018 at 12:46:06PM +0100, Peter Hessler wrote:

> On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote:
> :Reyk Floeter [[hidden email]] wrote:
> :>
> :> Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying.
> :>
> :> As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it.
> :>
> :> I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts.
> :>
> :
> :Perhaps the solution is as "simple" as checking the status of the bit
> :after the presence of the bit is established ?
> :
>
> This makes sense, OK
>

See my other mail, this diff is not OK.

Reyk

>
>
>
> :Index: pvclock.c
> :===================================================================
> :RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
> :retrieving revision 1.2
> :diff -u -p -u -r1.2 pvclock.c
> :--- pvclock.c 24 Nov 2018 13:12:29 -0000 1.2
> :+++ pvclock.c 4 Dec 2018 00:53:56 -0000
> :@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
> : void
> : pvclock_attach(struct device *parent, struct device *self, void *aux)
> : {
> :- struct pvclock_softc *sc = (struct pvclock_softc *)self;
> :- paddr_t pa;
> :+ struct pvclock_softc *sc = (struct pvclock_softc *)self;
> :+ struct pvclock_time_info *ti;
> :+ paddr_t pa;
> :+ uint8_t flags;
> :
> : if ((sc->sc_time = km_alloc(PAGE_SIZE,
> :    &kv_any, &kp_zero, &kd_nowait)) == NULL) {
> :@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
> :
> : /* Better than HPET but below TSC */
> : sc->sc_tc->tc_quality = 1500;
> :+
> :+ ti = sc->sc_time;
> :+ flags = ti->ti_flags;
> :+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
> :+ printf(": unstable timestamp counter\n");
> :+ return;
> :+ }
> :
> : tc_init(sc->sc_tc);
> :
> :
>
> --
> fortune -as

12