Clarify drand48() return values

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Clarify drand48() return values

j-62

Clarify that drand48 returns values not including 1.0.

Index: src/lib/libc/stdlib/rand48.3
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/rand48.3,v
retrieving revision 1.20
diff -u -r1.20 rand48.3
--- src/lib/libc/stdlib/rand48.3 10 Nov 2015 23:48:18 -0000 1.20
+++ src/lib/libc/stdlib/rand48.3 20 Dec 2019 00:08:24 -0000
@@ -101,7 +101,8 @@
 return values of type double.
 The full 48 bits of r(n+1) are
 loaded into the mantissa of the returned value, with the exponent set
-such that the values produced lie in the interval [0.0, 1.0].
+such that the values produced lie in the interval (0.0, 1.0] (which
+excludes 1.0 and includes 0.0).
 .Pp
 .Fn lrand48
 and

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Joerg Sonnenberger-2
On Thu, Dec 19, 2019 at 04:19:36PM -0800, [hidden email] wrote:

>
> Clarify that drand48 returns values not including 1.0.
>
> Index: src/lib/libc/stdlib/rand48.3
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdlib/rand48.3,v
> retrieving revision 1.20
> diff -u -r1.20 rand48.3
> --- src/lib/libc/stdlib/rand48.3 10 Nov 2015 23:48:18 -0000 1.20
> +++ src/lib/libc/stdlib/rand48.3 20 Dec 2019 00:08:24 -0000
> @@ -101,7 +101,8 @@
>  return values of type double.
>  The full 48 bits of r(n+1) are
>  loaded into the mantissa of the returned value, with the exponent set
> -such that the values produced lie in the interval [0.0, 1.0].
> +such that the values produced lie in the interval (0.0, 1.0] (which
> +excludes 1.0 and includes 0.0).
>  .Pp
>  .Fn lrand48
>  and

...except that your patch says the reverse.

Joerg

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Theo de Raadt-2
In reply to this post by j-62
[hidden email] wrote:

> Clarify that drand48 returns values not including 1.0.
>
> Index: src/lib/libc/stdlib/rand48.3
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdlib/rand48.3,v
> retrieving revision 1.20
> diff -u -r1.20 rand48.3
> --- src/lib/libc/stdlib/rand48.3 10 Nov 2015 23:48:18 -0000 1.20
> +++ src/lib/libc/stdlib/rand48.3 20 Dec 2019 00:08:24 -0000
> @@ -101,7 +101,8 @@
>  return values of type double.
>  The full 48 bits of r(n+1) are
>  loaded into the mantissa of the returned value, with the exponent set
> -such that the values produced lie in the interval [0.0, 1.0].
> +such that the values produced lie in the interval (0.0, 1.0] (which
> +excludes 1.0 and includes 0.0).
>  .Pp
>  .Fn lrand48
>  and

It's a mathematical notation that anyone using this page should
understand because it comes with the territory.  Once you are touching
doubles, soon you'll be looking at functions in libm proper, where this
type of notation is even more common.  Will you then propose to change
all those as well?

I think understanding the landscape's notation is a requirement, and we
don't need to say things a 2nd time in baby talk.

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Andras Farkas
On Thu, Dec 19, 2019 at 10:05 PM Theo de Raadt <[hidden email]> wrote:
> It's a mathematical notation that anyone using this page should
> understand because it comes with the territory.
> [snip]
>
> I think understanding the landscape's notation is a requirement, and we
> don't need to say things a 2nd time in baby talk.
I agree it doesn't need to be repeated, but I think there's value in
explicitly showing whether an interval is open or closed.
Though, in this case, the interval would be correctly expressed as
[0.0, 1.0)
or
[0.0, 1.0[
rather than how j's diff does it.

I attached a diff which I feel concisely does this.  I elected to not
change the latter two of the three intervals in the man page, since
they already included -1 in their upper bound.  But I also have that
as an option, via largediff.txt

rand48dot3diff.txt (788 bytes) Download Attachment
largediff.txt (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Andras Farkas
Not to appeal to majority, but to compare and contrast...
FreeBSD, NetBSD, POSIX, and Solaris all use the correct (or the more
explicit) interval notation for [0.0, 1.0) in drand48.3
https://www.freebsd.org/cgi/man.cgi?query=drand48&apropos=0&sektion=3&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html
https://netbsd.gw.com/cgi-bin/man-cgi?drand48+3
https://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html
https://docs.oracle.com/cd/E88353_01/html/E37843/drand48-3c.html
https://docs.oracle.com/cd/E26505_01/html/816-5168/drand48-3c.html
(though, Solaris's latter two of three intervals are wrong, unless
they really do mean their upper bound is inclusive)

On Thu, Dec 19, 2019 at 10:48 PM Andras Farkas
<[hidden email]> wrote:

>
> On Thu, Dec 19, 2019 at 10:05 PM Theo de Raadt <[hidden email]> wrote:
> > It's a mathematical notation that anyone using this page should
> > understand because it comes with the territory.
> > [snip]
> >
> > I think understanding the landscape's notation is a requirement, and we
> > don't need to say things a 2nd time in baby talk.
> I agree it doesn't need to be repeated, but I think there's value in
> explicitly showing whether an interval is open or closed.
> Though, in this case, the interval would be correctly expressed as
> [0.0, 1.0)
> or
> [0.0, 1.0[
> rather than how j's diff does it.
>
> I attached a diff which I feel concisely does this.  I elected to not
> change the latter two of the three intervals in the man page, since
> they already included -1 in their upper bound.  But I also have that
> as an option, via largediff.txt

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Theo de Raadt-2
Yes, it should be fixed.  I'll let the math nerds crawl through the
tree looking for additional errors.

My objection stands, that we should not dumb this down.

Andras Farkas <[hidden email]> wrote:

> Not to appeal to majority, but to compare and contrast...
> FreeBSD, NetBSD, POSIX, and Solaris all use the correct (or the more
> explicit) interval notation for [0.0, 1.0) in drand48.3
> https://www.freebsd.org/cgi/man.cgi?query=drand48&apropos=0&sektion=3&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html
> https://netbsd.gw.com/cgi-bin/man-cgi?drand48+3
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html
> https://docs.oracle.com/cd/E88353_01/html/E37843/drand48-3c.html
> https://docs.oracle.com/cd/E26505_01/html/816-5168/drand48-3c.html
> (though, Solaris's latter two of three intervals are wrong, unless
> they really do mean their upper bound is inclusive)
>
> On Thu, Dec 19, 2019 at 10:48 PM Andras Farkas
> <[hidden email]> wrote:
> >
> > On Thu, Dec 19, 2019 at 10:05 PM Theo de Raadt <[hidden email]> wrote:
> > > It's a mathematical notation that anyone using this page should
> > > understand because it comes with the territory.
> > > [snip]
> > >
> > > I think understanding the landscape's notation is a requirement, and we
> > > don't need to say things a 2nd time in baby talk.
> > I agree it doesn't need to be repeated, but I think there's value in
> > explicitly showing whether an interval is open or closed.
> > Though, in this case, the interval would be correctly expressed as
> > [0.0, 1.0)
> > or
> > [0.0, 1.0[
> > rather than how j's diff does it.
> >
> > I attached a diff which I feel concisely does this.  I elected to not
> > change the latter two of the three intervals in the man page, since
> > they already included -1 in their upper bound.  But I also have that
> > as an option, via largediff.txt

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

j-62
OK ok ok.  I admit and agree my original patch was flawed.  Dumbing down
need
not be done just for my benefit.

--J

On 2019-12-19 21:34, Theo de Raadt wrote:

> Yes, it should be fixed.  I'll let the math nerds crawl through the
> tree looking for additional errors.
>
> My objection stands, that we should not dumb this down.
>
> Andras Farkas <[hidden email]> wrote:
>
>> Not to appeal to majority, but to compare and contrast...
>> FreeBSD, NetBSD, POSIX, and Solaris all use the correct (or the more
>> explicit) interval notation for [0.0, 1.0) in drand48.3
>> https://www.freebsd.org/cgi/man.cgi?query=drand48&apropos=0&sektion=3&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html
>> https://netbsd.gw.com/cgi-bin/man-cgi?drand48+3
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html
>> https://docs.oracle.com/cd/E88353_01/html/E37843/drand48-3c.html
>> https://docs.oracle.com/cd/E26505_01/html/816-5168/drand48-3c.html
>> (though, Solaris's latter two of three intervals are wrong, unless
>> they really do mean their upper bound is inclusive)
>>
>> On Thu, Dec 19, 2019 at 10:48 PM Andras Farkas
>> <[hidden email]> wrote:
>> >
>> > On Thu, Dec 19, 2019 at 10:05 PM Theo de Raadt <[hidden email]> wrote:
>> > > It's a mathematical notation that anyone using this page should
>> > > understand because it comes with the territory.
>> > > [snip]
>> > >
>> > > I think understanding the landscape's notation is a requirement, and we
>> > > don't need to say things a 2nd time in baby talk.
>> > I agree it doesn't need to be repeated, but I think there's value in
>> > explicitly showing whether an interval is open or closed.
>> > Though, in this case, the interval would be correctly expressed as
>> > [0.0, 1.0)
>> > or
>> > [0.0, 1.0[
>> > rather than how j's diff does it.
>> >
>> > I attached a diff which I feel concisely does this.  I elected to not
>> > change the latter two of the three intervals in the man page, since
>> > they already included -1 in their upper bound.  But I also have that
>> > as an option, via largediff.txt

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

j-62
In reply to this post by j-62
For completeness:


Index: src/lib/libc/stdlib/rand48.3
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/rand48.3,v
retrieving revision 1.20
diff -u -r1.20 rand48.3
--- src/lib/libc/stdlib/rand48.3 10 Nov 2015 23:48:18 -0000 1.20
+++ src/lib/libc/stdlib/rand48.3 20 Dec 2019 18:07:57 -0000
@@ -101,7 +101,7 @@
 return values of type double.
 The full 48 bits of r(n+1) are
 loaded into the mantissa of the returned value, with the exponent set
-such that the values produced lie in the interval [0.0, 1.0].
+such that the values produced lie in the interval [0.0, 1.0).
 .Pp
 .Fn lrand48
 and

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Theo Buehler-3
On Fri, Dec 20, 2019 at 10:09:46AM -0800, [hidden email] wrote:
> For completeness:

thanks, committed

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Alexander Nasonov
In reply to this post by j-62
[hidden email] wrote:
>
> Clarify that drand48 returns values not including 1.0.

It's not clear from the documentation whether drand48 can generate
a denormal number. If it can't, you can exclude 0.0 because it's
a denormal ;-)

--
Alex

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Ingo Schwarze
Hi,

Alexander Nasonov wrote on Fri, Dec 20, 2019 at 08:33:40PM +0000:
> [hidden email] wrote:

>> Clarify that drand48 returns values not including 1.0.

> It's not clear from the documentation whether drand48 can generate
> a denormal number. If it can't, you can exclude 0.0 because it's
> a denormal ;-)

Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
drand48(3) does return 0.0 with a probability of 2^-48.

More generally, the function returns a uniform distribution of
numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.

Talking about loading bits into the mantissa and adjusting the
exponent feels mildly confusing, given that the distribution is
simply uniform over a fixed finite set.  I'm not sending a patch
because i never looked at how floating point representation works
internally, so i would likely only make matters worse.

Yours,
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Alexander Nasonov
Ingo Schwarze wrote:
> Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
> drand48(3) does return 0.0 with a probability of 2^-48.

I looked at the code too and I have some comments.

> More generally, the function returns a uniform distribution of
> numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.

You don't need three ldexp calls to compose 2^-48 * n:

        uint64_t n = (uint64_t)xseed[2] << 32 | xseed[1] << 16 | xseed[0];
        return ldexp((double)n, -48);

> Talking about loading bits into the mantissa and adjusting the
> exponent feels mildly confusing, given that the distribution is
> simply uniform over a fixed finite set.  I'm not sending a patch
> because i never looked at how floating point representation works
> internally, so i would likely only make matters worse.

Not sure if you wanted to patch the man page or the code.

If the latter, take a binary representation of 1.0 and replace 52
least significant bits with (pseudo)random bits then subtract 1.0.

        double d = 1.0;
        uint64_t u = 0;
        memcpy(&u, &d, sizeof(d));
        u |= n << 4; /* scale 48 bits to 52 bits, n is from the previous snippet */
        memcpy(&d, &u, sizeof(u));
        return d - 1.0;

It will only work with IEEE 754 compliant doubles, though.

--
Alex

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Theo de Raadt-2
In reply to this post by Ingo Schwarze
Ingo Schwarze <[hidden email]> wrote:

> Alexander Nasonov wrote on Fri, Dec 20, 2019 at 08:33:40PM +0000:
> > [hidden email] wrote:
>
> >> Clarify that drand48 returns values not including 1.0.
>
> > It's not clear from the documentation whether drand48 can generate
> > a denormal number. If it can't, you can exclude 0.0 because it's
> > a denormal ;-)
>
> Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
> drand48(3) does return 0.0 with a probability of 2^-48.
>
> More generally, the function returns a uniform distribution of
> numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.
>
> Talking about loading bits into the mantissa and adjusting the
> exponent feels mildly confusing, given that the distribution is
> simply uniform over a fixed finite set.  I'm not sending a patch
> because i never looked at how floating point representation works
> internally, so i would likely only make matters worse.

In which mode, the deterministic mode or the actual random mode? :)

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Sat, Dec 21, 2019 at 04:29:24PM -0700:
> Ingo Schwarze <[hidden email]> wrote:
>> Alexander Nasonov wrote on Fri, Dec 20, 2019 at 08:33:40PM +0000:
>>> [hidden email] wrote:

>>>> Clarify that drand48 returns values not including 1.0.

>>> It's not clear from the documentation whether drand48 can generate
>>> a denormal number. If it can't, you can exclude 0.0 because it's
>>> a denormal ;-)

>> Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
>> drand48(3) does return 0.0 with a probability of 2^-48.
>>
>> More generally, the function returns a uniform distribution of
>> numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.
>>
>> Talking about loading bits into the mantissa and adjusting the
>> exponent feels mildly confusing, given that the distribution is
>> simply uniform over a fixed finite set.  I'm not sending a patch
>> because i never looked at how floating point representation works
>> internally, so i would likely only make matters worse.

> In which mode, the deterministic mode or the actual random mode? :)

You are right, it's thrilling to talk about probability distributions in
deterministic mode!  Here we go, in deterministic mode, the distribution
is as follows:

 * p=1 for the deterministic value requested by the user
 * p=0 for any other value

The integral of this probability measure over the whole sample space,
as expected, is 1.

Yours,
  ;-)


P.S.
Please, everybody refrain from sending a patch adding the above
information to the manual page.  Describing the distribution in
actual random mode seems sufficient to me.

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Theo de Raadt-2
In reply to this post by Alexander Nasonov
Alexander Nasonov <[hidden email]> wrote:

> Ingo Schwarze wrote:
> > Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
> > drand48(3) does return 0.0 with a probability of 2^-48.
>
> I looked at the code too and I have some comments.
>
> > More generally, the function returns a uniform distribution of
> > numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.
>
> You don't need three ldexp calls to compose 2^-48 * n:
>
> uint64_t n = (uint64_t)xseed[2] << 32 | xseed[1] << 16 | xseed[0];
> return ldexp((double)n, -48);

xseed?

Yikes.

That version of the code is not reached in OpenBSD unless
__rand48_deterministic is set, which means srand48_deterministic or
seed48_deterministic or lcong48_deterministic were called...

We've seen noone going out of their way, so why touch that code...


Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Alexander Nasonov
Theo de Raadt wrote:

> Alexander Nasonov <[hidden email]> wrote:
>
> > Ingo Schwarze wrote:
> > > Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
> > > drand48(3) does return 0.0 with a probability of 2^-48.
> >
> > I looked at the code too and I have some comments.
> >
> > > More generally, the function returns a uniform distribution of
> > > numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.
> >
> > You don't need three ldexp calls to compose 2^-48 * n:
> >
> > uint64_t n = (uint64_t)xseed[2] << 32 | xseed[1] << 16 | xseed[0];
> > return ldexp((double)n, -48);
>
> xseed?

s/xseed/rseed/g

--
Alex

Reply | Threaded
Open this post in threaded view
|

Re: Clarify drand48() return values

Raimo Niskanen-7
In reply to this post by Alexander Nasonov
On Sat, Dec 21, 2019 at 11:29:20PM +0000, Alexander Nasonov wrote:

> Ingo Schwarze wrote:
> > Looking at our code in lib/libc/stdlib/drand48.c, i conclude that
> > drand48(3) does return 0.0 with a probability of 2^-48.
>
> I looked at the code too and I have some comments.
>
> > More generally, the function returns a uniform distribution of
> > numbers from the set {2^-48 * n | n integer and 0 <= n < 2^48}.
>
> You don't need three ldexp calls to compose 2^-48 * n:
>
> uint64_t n = (uint64_t)xseed[2] << 32 | xseed[1] << 16 | xseed[0];
> return ldexp((double)n, -48);
>
> > Talking about loading bits into the mantissa and adjusting the
> > exponent feels mildly confusing, given that the distribution is
> > simply uniform over a fixed finite set.  I'm not sending a patch
> > because i never looked at how floating point representation works
> > internally, so i would likely only make matters worse.
>
> Not sure if you wanted to patch the man page or the code.
>
> If the latter, take a binary representation of 1.0 and replace 52
> least significant bits with (pseudo)random bits then subtract 1.0.
>
> double d = 1.0;
> uint64_t u = 0;
> memcpy(&u, &d, sizeof(d));
> u |= n << 4; /* scale 48 bits to 52 bits, n is from the previous snippet */
> memcpy(&d, &u, sizeof(u));
> return d - 1.0;
>
> It will only work with IEEE 754 compliant doubles, though.

A very clear way to generate an equidistributed floating point number in
the range [0.0, 1.0) given a number n in the range [0, 2^48-1] is simply:

    return n * 0x1.0p-48

Requiring only C99 and <stdint.h>.

>
> --
> Alex

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB