timekeep: fixing large skews on amd64 with RDTSCP

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

timekeep: fixing large skews on amd64 with RDTSCP

Paul Irofti-4
Hi,

Getting lots of messages about people loving the new timekeep
functionality, which I am very happy about, but also some that have the
skew too large for it to be enabled.

I plan on sending a diff next week to improve the situation via RDTSCP
on the machines that have it. Which is basically all modern machines.

The plan is to have an auxiliary value returned by RDTSCP which
identifies the CPU we got the info from so that we can look-up its
associated skew in a table saved at init inside the timekeep structure:

static inline u_int
rdtscp(void)
{
   uint32_t hi, lo, aux;
   asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c" (aux) : : );
   skew = get_cpu_skew(aux);
   return ((uint64_t)lo)|(((uint64_t)hi)<<32) + skew;
}

Have a nice weekend,
Paul

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Mark Kettenis
> From: Paul Irofti <[hidden email]>
> Date: Sat, 11 Jul 2020 13:32:22 +0300
>
> Hi,
>
> Getting lots of messages about people loving the new timekeep
> functionality, which I am very happy about, but also some that have the
> skew too large for it to be enabled.
>
> I plan on sending a diff next week to improve the situation via RDTSCP
> on the machines that have it. Which is basically all modern machines.
>
> The plan is to have an auxiliary value returned by RDTSCP which
> identifies the CPU we got the info from so that we can look-up its
> associated skew in a table saved at init inside the timekeep structure:

I think that is the wrong approach.  Instead we should synchronize the
TSC counters themselves.  There are special MSRs you can write the
offset into IIRC.  That seems to be what FreeBSD does.

> static inline u_int
> rdtscp(void)
> {
>    uint32_t hi, lo, aux;
>    asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c" (aux) : : );
>    skew = get_cpu_skew(aux);
>    return ((uint64_t)lo)|(((uint64_t)hi)<<32) + skew;
> }
>
> Have a nice weekend,
> Paul
>
>

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Paul Irofti-4
On 2020-07-11 13:46, Mark Kettenis wrote:

>> From: Paul Irofti <[hidden email]>
>> Date: Sat, 11 Jul 2020 13:32:22 +0300
>>
>> Hi,
>>
>> Getting lots of messages about people loving the new timekeep
>> functionality, which I am very happy about, but also some that have the
>> skew too large for it to be enabled.
>>
>> I plan on sending a diff next week to improve the situation via RDTSCP
>> on the machines that have it. Which is basically all modern machines.
>>
>> The plan is to have an auxiliary value returned by RDTSCP which
>> identifies the CPU we got the info from so that we can look-up its
>> associated skew in a table saved at init inside the timekeep structure:
>
> I think that is the wrong approach.  Instead we should synchronize the
> TSC counters themselves.  There are special MSRs you can write the
> offset into IIRC.  That seems to be what FreeBSD does.

Yes, that is another option. I have not looked to see which are more
popular in terms of hardware. Did the MSRs come with RDTSCP? Before? Or
after? We should choose the most inclusive solution I guess. Or we could
have both...

>
>> static inline u_int
>> rdtscp(void)
>> {
>>     uint32_t hi, lo, aux;
>>     asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c" (aux) : : );
>>     skew = get_cpu_skew(aux);
>>     return ((uint64_t)lo)|(((uint64_t)hi)<<32) + skew;
>> }
>>
>> Have a nice weekend,
>> Paul
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Mark Kettenis
> From: Paul Irofti <[hidden email]>
> Date: Sat, 11 Jul 2020 13:50:37 +0300
>
> On 2020-07-11 13:46, Mark Kettenis wrote:
> >> From: Paul Irofti <[hidden email]>
> >> Date: Sat, 11 Jul 2020 13:32:22 +0300
> >>
> >> Hi,
> >>
> >> Getting lots of messages about people loving the new timekeep
> >> functionality, which I am very happy about, but also some that have the
> >> skew too large for it to be enabled.
> >>
> >> I plan on sending a diff next week to improve the situation via RDTSCP
> >> on the machines that have it. Which is basically all modern machines.
> >>
> >> The plan is to have an auxiliary value returned by RDTSCP which
> >> identifies the CPU we got the info from so that we can look-up its
> >> associated skew in a table saved at init inside the timekeep structure:
> >
> > I think that is the wrong approach.  Instead we should synchronize the
> > TSC counters themselves.  There are special MSRs you can write the
> > offset into IIRC.  That seems to be what FreeBSD does.
>
> Yes, that is another option. I have not looked to see which are more
> popular in terms of hardware. Did the MSRs come with RDTSCP? Before? Or
> after? We should choose the most inclusive solution I guess. Or we could
> have both...

One thing to keep in mind is that we only use the TSC as a timecounter
on CPUs that have the ITSC feature.

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Scott Cheloha
On Sat, Jul 11, 2020 at 01:16:43PM +0200, Mark Kettenis wrote:

> > From: Paul Irofti <[hidden email]>
> > Date: Sat, 11 Jul 2020 13:50:37 +0300
> >
> > On 2020-07-11 13:46, Mark Kettenis wrote:
> > >> From: Paul Irofti <[hidden email]>
> > >> Date: Sat, 11 Jul 2020 13:32:22 +0300
> > >>
> > >> Hi,
> > >>
> > >> Getting lots of messages about people loving the new timekeep
> > >> functionality, which I am very happy about, but also some that have the
> > >> skew too large for it to be enabled.
> > >>
> > >> I plan on sending a diff next week to improve the situation via RDTSCP
> > >> on the machines that have it. Which is basically all modern machines.
> > >>
> > >> The plan is to have an auxiliary value returned by RDTSCP which
> > >> identifies the CPU we got the info from so that we can look-up its
> > >> associated skew in a table saved at init inside the timekeep structure:
> > >
> > > I think that is the wrong approach.  Instead we should synchronize the
> > > TSC counters themselves.  There are special MSRs you can write the
> > > offset into IIRC.  That seems to be what FreeBSD does.
> >
> > Yes, that is another option. I have not looked to see which are more
> > popular in terms of hardware. Did the MSRs come with RDTSCP? Before? Or
> > after? We should choose the most inclusive solution I guess. Or we could
> > have both...
>
> One thing to keep in mind is that we only use the TSC as a timecounter
> on CPUs that have the ITSC feature.

In the meantime...

Can we add the missing LFENCE instructions to userspace and the
kernel?  And can we excise the upper 32 bits?  IIRC there was talk of
doing these things anyway, regardless of how the skew problem is
addressed.

My understanding is that you need an "LFENCE sandwich" to prevent the
RDTSC from being reordered in the pipeline.

Index: lib/libc/arch/amd64/gen/usertc.c
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
retrieving revision 1.2
diff -u -p -r1.2 usertc.c
--- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
+++ lib/libc/arch/amd64/gen/usertc.c 16 Jul 2020 01:31:31 -0000
@@ -21,9 +21,13 @@
 static inline u_int
 rdtsc(void)
 {
- uint32_t hi, lo;
- asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
- return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo));
+ asm volatile("lfence");
+
+ return lo;
 }
 
 static int
Index: sys/arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.19
diff -u -p -r1.19 tsc.c
--- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
+++ sys/arch/amd64/amd64/tsc.c 16 Jul 2020 01:31:31 -0000
@@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
- return rdtsc() + curcpu()->ci_tsc_skew;
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a" (lo));
+ asm volatile("lfence");
+
+ return lo + curcpu()->ci_tsc_skew;
 }
 
 void

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Christian Weisgerber
Scott Cheloha:

> Can we add the missing LFENCE instructions to userspace and the
> kernel?  And can we excise the upper 32 bits?

> + uint32_t lo;
> +
> + asm volatile("lfence");
> + asm volatile("rdtsc" : "=a"(lo));

That's wrong.  rtdsc will clobber %rdx, whether you use that value
or not.  You need a corresponding constraint:

  asm volatile("rdtsc" : "=a"(lo) : : "rdx");

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Mark Kettenis
In reply to this post by Scott Cheloha
> Date: Wed, 15 Jul 2020 20:36:04 -0500
> From: Scott Cheloha <[hidden email]>
>
> On Sat, Jul 11, 2020 at 01:16:43PM +0200, Mark Kettenis wrote:
> > > From: Paul Irofti <[hidden email]>
> > > Date: Sat, 11 Jul 2020 13:50:37 +0300
> > >
> > > On 2020-07-11 13:46, Mark Kettenis wrote:
> > > >> From: Paul Irofti <[hidden email]>
> > > >> Date: Sat, 11 Jul 2020 13:32:22 +0300
> > > >>
> > > >> Hi,
> > > >>
> > > >> Getting lots of messages about people loving the new timekeep
> > > >> functionality, which I am very happy about, but also some that have the
> > > >> skew too large for it to be enabled.
> > > >>
> > > >> I plan on sending a diff next week to improve the situation via RDTSCP
> > > >> on the machines that have it. Which is basically all modern machines.
> > > >>
> > > >> The plan is to have an auxiliary value returned by RDTSCP which
> > > >> identifies the CPU we got the info from so that we can look-up its
> > > >> associated skew in a table saved at init inside the timekeep structure:
> > > >
> > > > I think that is the wrong approach.  Instead we should synchronize the
> > > > TSC counters themselves.  There are special MSRs you can write the
> > > > offset into IIRC.  That seems to be what FreeBSD does.
> > >
> > > Yes, that is another option. I have not looked to see which are more
> > > popular in terms of hardware. Did the MSRs come with RDTSCP? Before? Or
> > > after? We should choose the most inclusive solution I guess. Or we could
> > > have both...
> >
> > One thing to keep in mind is that we only use the TSC as a timecounter
> > on CPUs that have the ITSC feature.
>
> In the meantime...
>
> Can we add the missing LFENCE instructions to userspace and the
> kernel?  And can we excise the upper 32 bits?  IIRC there was talk of
> doing these things anyway, regardless of how the skew problem is
> addressed.
>
> My understanding is that you need an "LFENCE sandwich" to prevent the
> RDTSC from being reordered in the pipeline.

Hmm, it is my understanding that only the lfence before is needed,
that is why rdtsc_lfence() looks the way it looks.  If that isn't the
case, that function should probably be adapted.

> Index: lib/libc/arch/amd64/gen/usertc.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 usertc.c
> --- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
> +++ lib/libc/arch/amd64/gen/usertc.c 16 Jul 2020 01:31:31 -0000
> @@ -21,9 +21,13 @@
>  static inline u_int
>  rdtsc(void)
>  {
> - uint32_t hi, lo;
> - asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> - return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> + uint32_t lo;
> +
> + asm volatile("lfence");
> + asm volatile("rdtsc" : "=a"(lo));
> + asm volatile("lfence");
> +
> + return lo;
>  }
>  
>  static int
> Index: sys/arch/amd64/amd64/tsc.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 tsc.c
> --- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
> +++ sys/arch/amd64/amd64/tsc.c 16 Jul 2020 01:31:31 -0000
> @@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter *
>  u_int
>  tsc_get_timecount(struct timecounter *tc)
>  {
> - return rdtsc() + curcpu()->ci_tsc_skew;
> + uint32_t lo;
> +
> + asm volatile("lfence");
> + asm volatile("rdtsc" : "=a" (lo));
> + asm volatile("lfence");
> +
> + return lo + curcpu()->ci_tsc_skew;
>  }
>  
>  void
>
>

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Scott Cheloha
On Thu, Jul 16, 2020 at 02:14:45PM +0200, Christian Weisgerber wrote:

> Scott Cheloha:
>
> > Can we add the missing LFENCE instructions to userspace and the
> > kernel?  And can we excise the upper 32 bits?
>
> > + uint32_t lo;
> > +
> > + asm volatile("lfence");
> > + asm volatile("rdtsc" : "=a"(lo));
>
> That's wrong.  rtdsc will clobber %rdx, whether you use that value
> or not.  You need a corresponding constraint:
>
>   asm volatile("rdtsc" : "=a"(lo) : : "rdx");

Whoops.  Thanks!  Updated below.

On Thu, Jul 16, 2020 at 02:26:56PM +0200, Mark Kettenis wrote:

> > Date: Wed, 15 Jul 2020 20:36:04 -0500
> > From: Scott Cheloha <[hidden email]>
> >
> > [...]
> >
> > My understanding is that you need an "LFENCE sandwich" to prevent the
> > RDTSC from being reordered in the pipeline.
>
> Hmm, it is my understanding that only the lfence before is needed,
> that is why rdtsc_lfence() looks the way it looks.  If that isn't the
> case, that function should probably be adapted.

My reasoning is based on the ISA reference:

https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.html

In particular (p. 1197; Vol. 2B 4-545):

> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
>
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
>
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.

Note the third sentence.

Given that, I reason that a serializing instruction before *and* after
the RDTSC should freeze it in place.

Whether or not there are diminishing returns on the 2nd LFENCE is
beyond what I can comment on.  However, if you want a guarantee that
it won't move around at all I think you need both LFENCEs.

Updated patch adds the second LFENCE to rdtsc_lfence() in cpufunc.h.

Index: lib/libc/arch/amd64/gen/usertc.c
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
retrieving revision 1.2
diff -u -p -r1.2 usertc.c
--- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
+++ lib/libc/arch/amd64/gen/usertc.c 16 Jul 2020 17:21:28 -0000
@@ -21,9 +21,13 @@
 static inline u_int
 rdtsc(void)
 {
- uint32_t hi, lo;
- asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
- return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+ asm volatile("lfence");
+
+ return lo;
 }
 
 static int
Index: sys/arch/amd64/include/cpufunc.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
retrieving revision 1.35
diff -u -p -r1.35 cpufunc.h
--- sys/arch/amd64/include/cpufunc.h 3 Jul 2020 17:54:27 -0000 1.35
+++ sys/arch/amd64/include/cpufunc.h 16 Jul 2020 17:21:28 -0000
@@ -296,7 +296,9 @@ rdtsc_lfence(void)
 {
  uint32_t hi, lo;
 
- __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
+ __asm volatile("lfence");
+ __asm volatile("rdtsc" : "=d" (hi), "=a" (lo));
+ __asm volatile("lfence");
  return (((uint64_t)hi << 32) | (uint64_t) lo);
 }
 
Index: sys/arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.19
diff -u -p -r1.19 tsc.c
--- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
+++ sys/arch/amd64/amd64/tsc.c 16 Jul 2020 17:21:28 -0000
@@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
- return rdtsc() + curcpu()->ci_tsc_skew;
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+ asm volatile("lfence");
+
+ return lo + curcpu()->ci_tsc_skew;
 }
 
 void

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Theo de Raadt-2
> Note the third sentence.
>
> Given that, I reason that a serializing instruction before *and* after
> the RDTSC should freeze it in place.

I haven't seen anyone read it that way.

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Scott Cheloha
> On Jul 16, 2020, at 19:36, Theo de Raadt <[hidden email]> wrote:
>
>> Note the third sentence.
>>
>> Given that, I reason that a serializing instruction before *and* after
>> the RDTSC should freeze it in place.
>
> I haven't seen anyone read it that way.

They say that instructions after RDTSC can run before it because
it isn't a serializing instruction.

Do we want that?

And then, consider this bit of programming advice.  Also from the
ISA reference (Vol. 2B 4-547):

> If software requires RDTSC to be executed prior to execution of any
> subsequent instruction (including any memory accesses), it can
> execute the sequence LFENCE immediately after RDTSC.

What other reading is possible given this documentation?

What is your interpretation?  Am I missing something?

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Philip Guenther-2
On Thu, Jul 16, 2020 at 4:55 PM Scott Cheloha <[hidden email]>
wrote:

> > On Jul 16, 2020, at 19:36, Theo de Raadt <[hidden email]> wrote:
> >
> >> Note the third sentence.
> >>
> >> Given that, I reason that a serializing instruction before *and* after
> >> the RDTSC should freeze it in place.
> >
> > I haven't seen anyone read it that way.
>
> They say that instructions after RDTSC can run before it because
> it isn't a serializing instruction.
>
> Do we want that?
>
> And then, consider this bit of programming advice.  Also from the
> ISA reference (Vol. 2B 4-547):
>
> > If software requires RDTSC to be executed prior to execution of any
> > subsequent instruction (including any memory accesses), it can
> > execute the sequence LFENCE immediately after RDTSC.
>
> What other reading is possible given this documentation?
>
> What is your interpretation?  Am I missing something?
>

If you're trying to time a sequence of instructions via rdtsc, then you
have to put an lfence after the first rdtsc (so that the sequence being
timed doesn't get lifted to before it completes) and an lfence before the
second rdtsc (so that the sequence is actually complete before the second
one).  In this case, is it a problem if instructions after the rdtsc that
are not dependent on its result may be started before it's actually
complete?  If not, then there's no reason to bracket that side.


Philip Guenther
Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Scott Cheloha
On Fri, Jul 17, 2020 at 10:47:50AM -0900, Philip Guenther wrote:

> On Thu, Jul 16, 2020 at 4:55 PM Scott Cheloha <[hidden email]>
> wrote:
>
> > > On Jul 16, 2020, at 19:36, Theo de Raadt <[hidden email]> wrote:
> > >
> > >> Note the third sentence.
> > >>
> > >> Given that, I reason that a serializing instruction before *and* after
> > >> the RDTSC should freeze it in place.
> > >
> > > I haven't seen anyone read it that way.
> >
> > They say that instructions after RDTSC can run before it because
> > it isn't a serializing instruction.
> >
> > Do we want that?
> >
> > And then, consider this bit of programming advice.  Also from the
> > ISA reference (Vol. 2B 4-547):
> >
> > > If software requires RDTSC to be executed prior to execution of any
> > > subsequent instruction (including any memory accesses), it can
> > > execute the sequence LFENCE immediately after RDTSC.
> >
> > What other reading is possible given this documentation?
> >
> > What is your interpretation?  Am I missing something?
>
> If you're trying to time a sequence of instructions via rdtsc, then you
> have to put an lfence after the first rdtsc (so that the sequence being
> timed doesn't get lifted to before it completes) and an lfence before the
> second rdtsc (so that the sequence is actually complete before the second
> one).  In this case, is it a problem if instructions after the rdtsc that
> are not dependent on its result may be started before it's actually
> complete?  If not, then there's no reason to bracket that side.

Oh, okay.

So, just a single preceding LFENCE for the timecounter will suffice.
There's no reason to wait for RDTSC to complete before doing the other
stuff in the loop.

As for rdtsc_lfence(), it sounds like it's entirely context-dependent
whether you want to plant a second LFENCE or not.  I guess I'll just
leave it alone.

Updated patch below.

Anyone ok?

Index: lib/libc/arch/amd64/gen/usertc.c
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
retrieving revision 1.2
diff -u -p -r1.2 usertc.c
--- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
+++ lib/libc/arch/amd64/gen/usertc.c 25 Jul 2020 17:50:38 -0000
@@ -21,9 +21,12 @@
 static inline u_int
 rdtsc(void)
 {
- uint32_t hi, lo;
- asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
- return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+
+ return lo;
 }
 
 static int
Index: sys/arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.19
diff -u -p -r1.19 tsc.c
--- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
+++ sys/arch/amd64/amd64/tsc.c 25 Jul 2020 17:50:38 -0000
@@ -211,7 +211,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
- return rdtsc() + curcpu()->ci_tsc_skew;
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+
+ return lo + curcpu()->ci_tsc_skew;
 }
 
 void

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Christian Weisgerber
Scott Cheloha:

> --- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
> +++ lib/libc/arch/amd64/gen/usertc.c 25 Jul 2020 17:50:38 -0000
> @@ -21,9 +21,12 @@
>  static inline u_int
>  rdtsc(void)
>  {
> - uint32_t hi, lo;
> - asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> - return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> + uint32_t lo;
> +
> + asm volatile("lfence");
> + asm volatile("rdtsc" : "=a"(lo) : : "rdx");

Is there a guarantee that two separate asm()s will not be reordered?

> +
> + return lo;
>  }
>  
>  static int
> --- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
> +++ sys/arch/amd64/amd64/tsc.c 25 Jul 2020 17:50:38 -0000
> @@ -211,7 +211,12 @@ cpu_recalibrate_tsc(struct timecounter *
>  u_int
>  tsc_get_timecount(struct timecounter *tc)
>  {
> - return rdtsc() + curcpu()->ci_tsc_skew;
> + uint32_t lo;
> +
> + asm volatile("lfence");
> + asm volatile("rdtsc" : "=a"(lo) : : "rdx");
> +
> + return lo + curcpu()->ci_tsc_skew;
>  }
>  
>  void
>

I'd just do s/rdtsc/rdtsc_lfence/, which would agree well with the
rest of the file.

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Mark Kettenis
> Date: Mon, 27 Jul 2020 17:14:21 +0200
> From: Christian Weisgerber <[hidden email]>
>
> Scott Cheloha:
>
> > --- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
> > +++ lib/libc/arch/amd64/gen/usertc.c 25 Jul 2020 17:50:38 -0000
> > @@ -21,9 +21,12 @@
> >  static inline u_int
> >  rdtsc(void)
> >  {
> > - uint32_t hi, lo;
> > - asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
> > - return ((uint64_t)lo)|(((uint64_t)hi)<<32);
> > + uint32_t lo;
> > +
> > + asm volatile("lfence");
> > + asm volatile("rdtsc" : "=a"(lo) : : "rdx");
>
> Is there a guarantee that two separate asm()s will not be reordered?

I believe that is true for "volatile" asm statements.  But this is all
not very well documented and I believe that the compiler may hoist
bits of C code in between, which is probably not what you want.

Note that since "asm" is non-standard C, we favour spelling it as
"__asm" since that makes the compiler shut up about it even if you
request stricter C standard compliance.

And given the kernel bit nelow...

> > +
> > + return lo;
> >  }
> >  
> >  static int
> > --- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
> > +++ sys/arch/amd64/amd64/tsc.c 25 Jul 2020 17:50:38 -0000
> > @@ -211,7 +211,12 @@ cpu_recalibrate_tsc(struct timecounter *
> >  u_int
> >  tsc_get_timecount(struct timecounter *tc)
> >  {
> > - return rdtsc() + curcpu()->ci_tsc_skew;
> > + uint32_t lo;
> > +
> > + asm volatile("lfence");
> > + asm volatile("rdtsc" : "=a"(lo) : : "rdx");
> > +
> > + return lo + curcpu()->ci_tsc_skew;
> >  }
> >  
> >  void
> >
>
> I'd just do s/rdtsc/rdtsc_lfence/, which would agree well with the
> rest of the file.

Agreed.  And I would really prefer that the libc code stays as close
to the kernel code as possible.

Reply | Threaded
Open this post in threaded view
|

Re: timekeep: fixing large skews on amd64 with RDTSCP

Paul Irofti-4
On 2020-07-27 18:24, Mark Kettenis wrote:

>> Date: Mon, 27 Jul 2020 17:14:21 +0200
>> From: Christian Weisgerber <[hidden email]>
>>
>> Scott Cheloha:
>>
>>> --- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
>>> +++ lib/libc/arch/amd64/gen/usertc.c 25 Jul 2020 17:50:38 -0000
>>> @@ -21,9 +21,12 @@
>>>   static inline u_int
>>>   rdtsc(void)
>>>   {
>>> - uint32_t hi, lo;
>>> - asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
>>> - return ((uint64_t)lo)|(((uint64_t)hi)<<32);
>>> + uint32_t lo;
>>> +
>>> + asm volatile("lfence");
>>> + asm volatile("rdtsc" : "=a"(lo) : : "rdx");
>>
>> Is there a guarantee that two separate asm()s will not be reordered?
>
> I believe that is true for "volatile" asm statements.  But this is all
> not very well documented and I believe that the compiler may hoist
> bits of C code in between, which is probably not what you want.
>
> Note that since "asm" is non-standard C, we favour spelling it as
> "__asm" since that makes the compiler shut up about it even if you
> request stricter C standard compliance.
>
> And given the kernel bit nelow...
>
>>> +
>>> + return lo;
>>>   }
>>>  
>>>   static int
>>> --- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
>>> +++ sys/arch/amd64/amd64/tsc.c 25 Jul 2020 17:50:38 -0000
>>> @@ -211,7 +211,12 @@ cpu_recalibrate_tsc(struct timecounter *
>>>   u_int
>>>   tsc_get_timecount(struct timecounter *tc)
>>>   {
>>> - return rdtsc() + curcpu()->ci_tsc_skew;
>>> + uint32_t lo;
>>> +
>>> + asm volatile("lfence");
>>> + asm volatile("rdtsc" : "=a"(lo) : : "rdx");
>>> +
>>> + return lo + curcpu()->ci_tsc_skew;
>>>   }
>>>  
>>>   void
>>>
>>
>> I'd just do s/rdtsc/rdtsc_lfence/, which would agree well with the
>> rest of the file.
>
> Agreed.  And I would really prefer that the libc code stays as close
> to the kernel code as possible.

Is the issue with LFENCE slowing down the network stack settled? That
was the argument against it last time.