Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Andrew Davis
Hello,

I'm experiencing some odd timing issues on OpenBSD 6.2 (and 6.1) on the
system listed below. This is preventing me from running OpenBSD on my
servers. Can you determine if this is a bug in the OpenBSD operating
system? I can provide more information if needed.

Virtualized environment.

Host CPU: 2 x Intel E5-2630 v3 2.4 GHz
Host OS: Fedora 27
Virtualization software: QEMU + KVM (2.10.0-1.fc27)
Guest Machine: default (pc-i440fx-2.10)
Guest OS: OpenBSD 6.2 (and 6.1).

Basically, OpenBSD processes degrade over time to the point where
they're completely unresponsive. This simple date printout script is a
good example. It should print out the date once per second, but after
roughly 20 minutes on this hardware configuration, it takes 2 seconds to
print each line, then 4 seconds to print each line, and so on. After
running for about 24 hours, the delay is about 1 minute between line
printouts.

     while sleep 1; do date; done
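
A slightly more verbose variant of the loop above can make the slowdown easier to spot in a long log. This is only a sketch: the function names `check_gap` and `monitor` and the 2-second threshold are illustrative choices, not part of the original report. The timestamps come from the guest's own clock, so they may understate the real delay if that clock is drifting too.

```shell
# Report the gap between two epoch timestamps and flag unusually long ones.
# $1 = previous epoch seconds, $2 = current epoch seconds,
# $3 = largest gap (in seconds) still treated as normal
check_gap() {
    gap=$(( $2 - $1 ))
    if [ "$gap" -gt "$3" ]; then
        echo "SLOW gap=${gap}s"
    else
        echo "ok gap=${gap}s"
    fi
}

# Same cadence as the one-liner, with the measured gap appended to each line.
# A threshold of 2 tolerates the occasional second-boundary rounding.
monitor() {
    prev=$(date +%s)
    while sleep 1; do
        now=$(date +%s)
        echo "$(date) $(check_gap "$prev" "$now" 2)"
        prev=$now
    done
}
```

Running `monitor | tee date.log` and searching the log for `SLOW` shows when the degradation started.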

I've tried tweaking some different settings on the guest and host, such
as disabling the HPET timer and x2apic, neither of which has proven
effective.

I saw mention of adding "kvm-intel.preemption_timer=0" in another recent
thread. This seems to resolve the slowdown issue.
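
For reference, the quoted setting is the kernel command-line spelling of a kvm_intel module parameter on the Linux host. A minimal sketch of the equivalent modprobe configuration follows; the file name is an arbitrary choice, and the module has to be reloaded (or the host rebooted) with no guests running before the option takes effect.

```
# /etc/modprobe.d/kvm-intel.conf (hypothetical file name)
options kvm_intel preemption_timer=0
```

The active value can be read back from /sys/module/kvm_intel/parameters/preemption_timer on the host.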

However, I have run other guest operating systems on this hardware
configuration (CentOS, Ubuntu, FreeBSD) - none of which required any
of these tweaks, or experienced timing issues. This leads me to believe
that it could be related to a bug in OpenBSD.

I have access to several machines with this hardware configuration and
tested on multiple machines, to rule out a possible one-off hardware
issue. Each host displayed the same behavior.

Best regards,
Andrew

Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Mike Larkin
On Tue, Dec 26, 2017 at 12:27:31PM -0500, Andrew Davis wrote:

> Hello,
>
> I'm experiencing some odd timing issues on OpenBSD 6.2 (and 6.1) on the
> system listed below. This is preventing me from running OpenBSD on my
> servers. Can you determine if this is a bug in the OpenBSD operating system?
> I can provide more information if needed.
>
> Virtualized environment.
>
> Host CPU: 2 x Intel E5-2630 v3 2.4 GHz
> Host OS: Fedora 27
> Virtualization software: QEMU + KVM (2.10.0-1.fc27)
> Guest Machine: default (pc-i440fx-2.10)
> Guest OS: OpenBSD 6.2 (and 6.1).
>
> Basically, OpenBSD processes degrade over time to the point where they're
> completely unresponsive. This simple date printout script is a good example.
> It should print out the date once per second, but after roughly 20 minutes on
> this hardware configuration, it takes 2 seconds to print each line, then 4
> seconds to print each line, and so on. After running for about 24 hours, the
> delay is about 1 minute between line printouts.
>
>     while sleep 1; do date; done
>
> I've tried tweaking some different settings on the guest and host, such as
> disabling the HPET timer and x2apic, neither of which has proven effective.
>
> I saw mention of adding "kvm-intel.preemption_timer=0" in another recent
> thread. This seems to resolve the slowdown issue.
>
> However, I have run other guest operating systems on this hardware
> configuration (CentOS, Ubuntu, FreeBSD) - none of which required any of
> these tweaks, or experienced timing issues. This leads me to believe that it
> could be related to a bug in OpenBSD.
>
> I have access to several machines with this hardware configuration and
> tested on multiple machines, to rule out a possible one-off hardware issue.
> Each host displayed the same behavior.
>
> Best regards,
> Andrew
>

What timecounter source did the OpenBSD guests pick? Did you try selecting
one of the other choices to see if this helps?

sysctl kern.timecounter    if you're not sure what I'm talking about.

-ml

Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Andrew Davis
Hello,

No, I didn't change the kern.timecounter selection directly. I had
tried disabling the HPET on qemu/kvm (which may have affected this
selection?).

Two of my boxes, both running OpenBSD 6.1, report this:

# sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=acpihpet0
kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)

Best,
Andrew

On 12/26/2017 2:36 PM, Mike Larkin wrote:

> What timecounter source did the OpenBSD guests pick? Did you try selecting
> one of the other choices to see if this helps?
>
> sysctl kern.timecounter    if you're not sure what I'm talking about.
>
> -ml

Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Mike Larkin
On Tue, Dec 26, 2017 at 03:24:03PM -0500, Andrew Davis wrote:

> Hello,
>
> No, I didn't change the kern.timecounter selection directly. I had tried
> disabling the HPET on qemu/kvm (which may have affected this selection?).
>
> Two of my boxes, both OpenBSD 6.1 report this:
>
> # sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
>
> Best,
> Andrew
>

Could you try one of the others and let us know if it helps, please?

-ml


Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Andrew Davis
Hello again,

I tested with each of the "acpihpet0", "acpitimer0", and "i8254" timers.
The timing problem manifested when using all 3 timers. I ran the date
loop with "acpihpet0" and "acpitimer0" until the issue manifested, and
let "i8254" run overnight.
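
For anyone repeating this test, one way to walk through the available timecounters is sketched below. The helper names `list_timecounters` and `switch_timecounter` are made up for illustration; the only system interface assumed is the `kern.timecounter` sysctl tree shown earlier in the thread, and setting it requires root on the OpenBSD guest. Negative-quality entries such as `dummy` are skipped on the assumption that they are not meant for normal use.

```shell
# Turn a kern.timecounter.choice string such as
# "i8254(0) acpihpet0(1000) dummy(-1000000)" into one name per line,
# leaving out entries whose quality is negative.
list_timecounters() {
    for entry in $1; do
        name=${entry%%\(*}        # text before the first "("
        quality=${entry#*\(}      # text after the first "("
        quality=${quality%\)}     # drop the trailing ")"
        if [ "$quality" -ge 0 ]; then
            echo "$name"
        fi
    done
}

# Select a timecounter by name (run as root on the OpenBSD guest).
switch_timecounter() {
    sysctl kern.timecounter.hardware="$1"
}
```

A typical use would be `list_timecounters "$(sysctl -n kern.timecounter.choice)"` to get the candidate names, followed by `switch_timecounter acpitimer0` before starting the date loop.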

Here are some snippets from the date logs from where I started logging
the date loop, and where the timing issue became present.

acpitimer0:

     Tue Dec 26 23:57:57 UTC 2017
     Tue Dec 26 23:57:58 UTC 2017
     ...
     Wed Dec 27 00:10:10 UTC 2017
     Wed Dec 27 00:10:12 UTC 2017
     Wed Dec 27 00:10:14 UTC 2017

i8254:

     Wed Dec 27 00:14:23 UTC 2017
     Wed Dec 27 00:14:24 UTC 2017
     ...
     Wed Dec 27 00:59:30 UTC 2017
     Wed Dec 27 00:59:31 UTC 2017
     Wed Dec 27 00:59:33 UTC 2017

acpihpet0:

     Wed Dec 27 16:20:54 UTC 2017
     Wed Dec 27 16:20:55 UTC 2017
     ...
     Wed Dec 27 16:32:44 UTC 2017
     Wed Dec 27 16:32:45 UTC 2017
     Wed Dec 27 16:32:47 UTC 2017
     Wed Dec 27 16:32:49 UTC 2017

The i8254 timer hit a point where the system stopped reporting the
proper time altogether. I ran these commands this morning after my
OpenBSD VM ran with i8254 overnight, and this is what the "date" command
displayed. The proper time is shown below.

     # sysctl | grep -i timecounter
     kern.timecounter.tick=1
     kern.timecounter.timestepwarnings=0
     kern.timecounter.hardware=i8254
     kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)

     # date
     Wed Dec 27 01:35:51 UTC 2017

     [root@local-linux ~]# date
     Wed Dec 27 16:11:05 UTC 2017

Best regards,
Andrew


On 12/26/2017 5:44 PM, Mike Larkin wrote:

> Could you try one of the others and let us know if it helps, please?
>
> -ml

Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Mark Kettenis
> From: Andrew Davis <[hidden email]>
> Date: Wed, 27 Dec 2017 11:39:54 -0500
>
> Hello again,
>
> I tested with each of the "acpihpet0", "acpitimer0", and "i8254" timers.
> The timing problem manifested when using all 3 timers. I ran the date
> loop with "acpihpet0" and "acpitimer0" until the issue manifested, and
> let "i8254" run overnight.
>
> Here are some snippets from the date logs from where I started logging
> the date loop, and where the timing issue became present.
>
> acpitimer0:
>
>     Tue Dec 26 23:57:57 UTC 2017
>     Tue Dec 26 23:57:58 UTC 2017
>     ...
>     Wed Dec 27 00:10:10 UTC 2017
>     Wed Dec 27 00:10:12 UTC 2017
>     Wed Dec 27 00:10:14 UTC 2017
>
> i8254:
>
>     Wed Dec 27 00:14:23 UTC 2017
>     Wed Dec 27 00:14:24 UTC 2017
>     ...
>     Wed Dec 27 00:59:30 UTC 2017
>     Wed Dec 27 00:59:31 UTC 2017
>     Wed Dec 27 00:59:33 UTC 2017
>
> acpihpet0:
>
>     Wed Dec 27 16:20:54 UTC 2017
>     Wed Dec 27 16:20:55 UTC 2017
>     ...
>     Wed Dec 27 16:32:44 UTC 2017
>     Wed Dec 27 16:32:45 UTC 2017
>     Wed Dec 27 16:32:47 UTC 2017
>     Wed Dec 27 16:32:49 UTC 2017
>
> The i8254 timer hit a point where the system stopped reporting the
> proper time altogether. I ran these commands this morning after my
> OpenBSD VM ran with i8254 overnight, and this is what the "date" command
> displayed. The proper time is shown below.
>
>     # sysctl | grep -i timecounter
>     kern.timecounter.tick=1
>     kern.timecounter.timestepwarnings=0
>     kern.timecounter.hardware=i8254
>     kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
>
>     # date
>     Wed Dec 27 01:35:51 UTC 2017
>
>     [root@local-linux ~]# date
>     Wed Dec 27 16:11:05 UTC 2017

Your test results are consistent with the local APIC emulation being
broken in Linux/KVM.  Regardless of what hardware is used for the
timecounter, the clock interrupts use the local APIC timer in OpenBSD.

OpenBSD programs the local APIC to interrupt every 10ms in so-called
repeated mode.  The clock interrupt is then responsible for reading
the timecounter to update the current wall clock time and for running
things like timeouts that wake up tasks that are sleeping.  If we get
no clock interrupts those wakeups don't happen, and your sleeps take
longer than what you intended.  But as long as the timecounter doesn't
wrap, the wall clock time will be correctly updated once another clock
interrupt comes in.  And wrapping is what happens with the i8254
timecounter.  It wraps fairly quickly, so if the clock interrupts
don't come in for a while, OpenBSD's idea of wall clock time starts to
get out of sync with reality.
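
Both effects can be illustrated with a little arithmetic. The sketch below is a toy model, not OpenBSD code: the function names are invented, the sleep is assumed to be counted purely in clock ticks, and the 16-bit counter mask in the usage note is an illustrative width rather than the real width of any OpenBSD timecounter.

```shell
# Wall-clock milliseconds a sleep takes when only some ticks are delivered.
# $1 = requested sleep in ms, $2 = tick period in ms,
# $3 = percentage of clock interrupts that actually arrive (1-100)
stretched_sleep_ms() {
    ticks_needed=$(( $1 / $2 ))
    echo $(( ticks_needed * $2 * 100 / $3 ))
}

# Elapsed counts as seen through a wrapping counter.
# $1 = previous raw read, $2 = current raw read, $3 = counter mask
counter_delta() {
    echo $(( ($2 - $1) & $3 ))
}
```

With a 10ms tick, `stretched_sleep_ms 1000 10 50` gives 2000 and `stretched_sleep_ms 1000 10 25` gives 4000, which is the shape of the 2-second and 4-second gaps in the logs. For the wrap case, a counter with mask 65535 that really advanced 70000 counts between two reads ends up at 4464 when read from 0, and `counter_delta 0 4464 65535` reports only 4464, so a full wrap of elapsed time is silently lost.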

So why do other systems not suffer from this problem?  I'm fairly
certain they also use the local APIC for clock interrupts.  But the
systems you tested (Linux, FreeBSD) probably don't run it in repeated
mode.  Some people consider running the local APIC in repeated mode a
bad idea.  And they might even be right.  Waking a system up at
regular intervals even if there is no real work to do is a bit silly
and wastes power.  Although one could argue that 10ms between wakeups
is long enough for this not to matter much on modern systems.

Maybe we'll change the way we do clock interrupts at some point in the
future.  It would probably help vmm(4).  But this is not a trivial
task and won't happen overnight.  Working around bugs in someone
else's software certainly isn't enough motivation for me to implement
it.  

Cheers,

Mark



Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

srutherford
Would this be consistent with the PIT taking longer to respond? The mode of
KVM used here (mentioned on the KVM list) moves the PIT to userspace and
would make it less accurate. If I'm reading OpenBSD's LAPIC calibration code
right, this might be the case. I believe Linux uses either the PM Timer or
the TSC to do the calibration.

(The obvious solution here is to just disable that mode if you are using
OpenBSD, which apparently works.)




Re: Degraded timing performance - QEMU, KVM - OpenBSD 6.2

Edd Barrett-3
In reply to this post by Andrew Davis
Hi,

I'm experiencing this issue too.

On Tue, Dec 26, 2017 at 12:27:31PM -0500, Andrew Davis wrote:
> Virtualization software: QEMU + KVM (2.10.0-1.fc27)

FWIW, there are reports that this bug is absent from qemu-2.11.0.

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk