fread optimization

classic Classic list List threaded Threaded
9 messages Options
enh
Reply | Threaded
Open this post in threaded view
|

fread optimization

enh
below (with all the tabs stripped) is a patch that brings openbsd's
stdio fread a lot closer to the glibc implementation. it's a more
ambitious version of last year's patch to avoid the pathologically bad
behavior. now we satisfy what we can from already-buffered data, and
then decide whether to refill the FILE buffer (like the old code
always did) or switch to reading directly into the caller's buffer.

i'm not really sure _bf._size is the best value to compare against.
for one thing, it requires us to initialize _bf, but more than that it
means that setting a small buffer with setvbuf means you're also
limiting the largest I/O we'll do.

also, is it ever possible for _r to be < 0? __srefill fixes up the
value to be 0 if the read fails, which would be the obvious source of
negative values.

--- /ssd/openbsd/src/lib/libc/stdio/fread.c 2014-12-10 21:22:41.042082212 -0800
+++ ./libc/stdio/fread.c 2015-01-16 17:18:47.618476974 -0800
@@ -1,4 +1,4 @@
-/* $OpenBSD: fread.c,v 1.13 2014/12/08 20:40:53 tedu Exp $ */
+/* $OpenBSD: fread.c,v 1.12 2014/05/01 16:40:36 deraadt Exp $ */
 /*-
  * Copyright (c) 1990, 1993
  * The Regents of the University of California.  All rights reserved.
@@ -35,20 +35,82 @@
 #include <string.h>
 #include <stdint.h>
 #include <errno.h>
+#include <sys/param.h>
 #include "local.h"

 #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

+/*
+
+//glibc
+                                     iterations      ns/op
+    BM_stdio_fopen_fgets_fclose          200000       8656
+    BM_stdio_fread/1                  100000000         13    71.76 MiB/s
+    BM_stdio_fread/2                  100000000         14   138.31 MiB/s
+    BM_stdio_fread/3                  100000000         14   210.64 MiB/s
+    BM_stdio_fread/4                  100000000         13   292.16 MiB/s
+    BM_stdio_fread/8                  100000000         13   576.40 MiB/s
+    BM_stdio_fread/16                 100000000         14  1069.99 MiB/s
+    BM_stdio_fread/32                 100000000         15  2005.90 MiB/s
+    BM_stdio_fread/64                 100000000         19  3341.63 MiB/s
+    BM_stdio_fread/512                 50000000         70  7286.94 MiB/s
+    BM_stdio_fread/1K                  20000000        129  7922.67 MiB/s
+    BM_stdio_fread/4K                   5000000        323 12651.76 MiB/s
+    BM_stdio_fread/8K                   5000000        484 16914.71 MiB/s
+    BM_stdio_fread/16K                  2000000        831 19710.93 MiB/s
+    BM_stdio_fread/64K                  1000000       2953 22189.63 MiB/s
+    BM_stdio_fread_unbuffered/1        10000000        161     6.20 MiB/s
+    BM_stdio_fread_unbuffered/2        10000000        162    12.28 MiB/s
+    BM_stdio_fread_unbuffered/3        10000000        159    18.76 MiB/s
+    BM_stdio_fread_unbuffered/4        10000000        158    25.16 MiB/s
+    BM_stdio_fread_unbuffered/8        10000000        157    50.76 MiB/s
+    BM_stdio_fread_unbuffered/16       10000000        158   101.11 MiB/s
+    BM_stdio_fread_unbuffered/32       10000000        159   200.72 MiB/s
+    BM_stdio_fread_unbuffered/64       10000000        161   396.99 MiB/s
+    BM_stdio_fread_unbuffered/512      10000000        189  2707.13 MiB/s
+    BM_stdio_fread_unbuffered/1K       10000000        208  4921.01 MiB/s
+    BM_stdio_fread_unbuffered/4K        5000000        316 12929.90 MiB/s
+    BM_stdio_fread_unbuffered/8K        5000000        481 16999.33 MiB/s
+    BM_stdio_fread_unbuffered/16K       2000000        821 19950.40 MiB/s
+    BM_stdio_fread_unbuffered/64K       1000000       3013 21748.94 MiB/s
+
+// bionic with refill <= BUFSIZ, else direct read. (plus hacked
flockfile/funlockfile.)
+BM_stdio_fread/1                  100000000         19    51.49 MiB/s
+BM_stdio_fread/2                  100000000         20    99.84 MiB/s
+BM_stdio_fread/3                  100000000         19   152.84 MiB/s
+BM_stdio_fread/4                  100000000         20   199.33 MiB/s
+BM_stdio_fread/8                  100000000         18   427.50 MiB/s
+BM_stdio_fread/16                 100000000         21   740.93 MiB/s
+BM_stdio_fread/32                 100000000         20  1597.92 MiB/s
+BM_stdio_fread/64                 100000000         23  2753.94 MiB/s
+BM_stdio_fread/512                 50000000         66  7696.49 MiB/s
+BM_stdio_fread/1K                  20000000        114  8980.89 MiB/s
+BM_stdio_fread/4K                   5000000        407 10049.42 MiB/s
+BM_stdio_fread/8K                   5000000        474 17250.40 MiB/s
+BM_stdio_fread/16K                  2000000        803 20387.39 MiB/s
+BM_stdio_fread/64K                   500000       2905 22552.43 MiB/s
+BM_stdio_fread_unbuffered/1        10000000        206     4.83 MiB/s
+BM_stdio_fread_unbuffered/2        10000000        165    12.10 MiB/s
+BM_stdio_fread_unbuffered/3        10000000        165    18.12 MiB/s
+BM_stdio_fread_unbuffered/4        10000000        166    24.00 MiB/s
+BM_stdio_fread_unbuffered/8        10000000        166    48.18 MiB/s
+BM_stdio_fread_unbuffered/16       10000000        166    95.99 MiB/s
+BM_stdio_fread_unbuffered/32       10000000        168   190.18 MiB/s
+BM_stdio_fread_unbuffered/64       10000000        167   381.05 MiB/s
+BM_stdio_fread_unbuffered/512      10000000        196  2606.78 MiB/s
+BM_stdio_fread_unbuffered/1K       10000000        213  4797.48 MiB/s
+BM_stdio_fread_unbuffered/4K        5000000        318 12862.09 MiB/s
+BM_stdio_fread_unbuffered/8K        5000000        498 16425.53 MiB/s
+BM_stdio_fread_unbuffered/16K       1000000       1023 16011.40 MiB/s
+BM_stdio_fread_unbuffered/64K        500000       3367 19460.65 MiB/s
+
+*/
+
 size_t
 fread(void *buf, size_t size, size_t count, FILE *fp)
 {
- size_t resid;
- char *p;
- int r;
- size_t total;
-
  /*
- * Extension:  Catch integer overflow
+ * Extension:  Catch integer overflow.
  */
  if ((size >= MUL_NO_OVERFLOW || count >= MUL_NO_OVERFLOW) &&
     size > 0 && SIZE_MAX / size < count) {
@@ -57,47 +119,80 @@ fread(void *buf, size_t size, size_t cou
  return (0);
  }

+ const size_t desired_total = count * size;
+ size_t total = desired_total;
+
  /*
  * ANSI and SUSv2 require a return value of 0 if size or count are 0.
  */
- if ((resid = count * size) == 0)
+ if (total == 0) {
  return (0);
+ }
+
  FLOCKFILE(fp);
  _SET_ORIENTATION(fp, -1);
+
+ // TODO: how can this ever happen?!
  if (fp->_r < 0)
  fp->_r = 0;
- total = resid;
- p = buf;

- if ((fp->_flags & __SNBF) != 0) {
+ /*
+ * Ensure _bf._size is valid.
+ */
+ if (fp->_bf._base == NULL) {
+ __smakebuf(fp);
+ }
+
+ char* dst = buf;
+
+ while (total > 0) {
  /*
- * We know if we're unbuffered that our buffer is empty, so
- * we can just read directly. This is much faster than the
- * loop below which will perform a series of one byte reads.
+ * Copy data out of the buffer.
  */
- while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
- p += r;
- resid -= r;
+ size_t buffered_bytes = MIN(fp->_r, total);
+ memcpy(dst, fp->_p, buffered_bytes);
+ fp->_p += buffered_bytes;
+ fp->_r -= buffered_bytes;
+ dst += buffered_bytes;
+ total -= buffered_bytes;
+
+ /*
+ * Are we done?
+ */
+ if (total == 0) {
+ goto out;
  }
- FUNLOCKFILE(fp);
- return ((total - resid) / size);
- }

- while (resid > (r = fp->_r)) {
- (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
- fp->_p += r;
- /* fp->_r = 0 ... done in __srefill */
- p += r;
- resid -= r;
+ /*
+ * Do we have so much more to read that we should
+ * avoid copying it through the buffer?
+ */
+ if (total > (size_t) fp->_bf._size) {
+ break;
+ }
+
+ /*
+ * Less than a buffer to go, so refill the buffer and
+ * go around the loop again.
+ */
  if (__srefill(fp)) {
- /* no more input: return partial result */
- FUNLOCKFILE(fp);
- return ((total - resid) / size);
+ goto out;
  }
  }
- (void)memcpy((void *)p, (void *)fp->_p, resid);
- fp->_r -= resid;
- fp->_p += resid;
+
+ /*
+ * Read directly into the caller's buffer.
+ */
+ while (total > 0) {
+ ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
+ if (bytes_read <= 0) {
+ break;
+ }
+ dst += bytes_read;
+ total -= bytes_read;
+ }
+
+out:
  FUNLOCKFILE(fp);
- return (count);
+ return ((desired_total - total) / size);
 }

enh
Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

enh
that patch wasn't setting the _flags right on error or eof.

/*
* Read directly into the caller's buffer.
*/
while (total > 0) {
ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
if (bytes_read <= 0) {
fp->_flags = (fp->_r == 0) ? __SEOF : __SERR;
break;
}
dst += bytes_read;
total -= bytes_read;
}

https://android-review.googlesource.com/#/c/124014/ submitted to Android.

On Fri, Jan 16, 2015 at 5:33 PM, enh <[hidden email]> wrote:

> below (with all the tabs stripped) is a patch that brings openbsd's
> stdio fread a lot closer to the glibc implementation. it's a more
> ambitious version of last year's patch to avoid the pathologically bad
> behavior. now we satisfy what we can from already-buffered data, and
> then decide whether to refill the FILE buffer (like the old code
> always did) or switch to reading directly into the caller's buffer.
>
> i'm not really sure _bf._size is the best value to compare against.
> for one thing, it requires us to initialize _bf, but more than that it
> means that setting a small buffer with setvbuf means you're also
> limiting the largest I/O we'll do.
>
> also, is it ever possible for _r to be < 0? __srefill fixes up the
> value to be 0 if the read fails, which would be the obvious source of
> negative values.
>
> --- /ssd/openbsd/src/lib/libc/stdio/fread.c 2014-12-10 21:22:41.042082212 -0800
> +++ ./libc/stdio/fread.c 2015-01-16 17:18:47.618476974 -0800
> @@ -1,4 +1,4 @@
> -/* $OpenBSD: fread.c,v 1.13 2014/12/08 20:40:53 tedu Exp $ */
> +/* $OpenBSD: fread.c,v 1.12 2014/05/01 16:40:36 deraadt Exp $ */
>  /*-
>   * Copyright (c) 1990, 1993
>   * The Regents of the University of California.  All rights reserved.
> @@ -35,20 +35,82 @@
>  #include <string.h>
>  #include <stdint.h>
>  #include <errno.h>
> +#include <sys/param.h>
>  #include "local.h"
>
>  #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))
>
> +/*
> +
> +//glibc
> +                                     iterations      ns/op
> +    BM_stdio_fopen_fgets_fclose          200000       8656
> +    BM_stdio_fread/1                  100000000         13    71.76 MiB/s
> +    BM_stdio_fread/2                  100000000         14   138.31 MiB/s
> +    BM_stdio_fread/3                  100000000         14   210.64 MiB/s
> +    BM_stdio_fread/4                  100000000         13   292.16 MiB/s
> +    BM_stdio_fread/8                  100000000         13   576.40 MiB/s
> +    BM_stdio_fread/16                 100000000         14  1069.99 MiB/s
> +    BM_stdio_fread/32                 100000000         15  2005.90 MiB/s
> +    BM_stdio_fread/64                 100000000         19  3341.63 MiB/s
> +    BM_stdio_fread/512                 50000000         70  7286.94 MiB/s
> +    BM_stdio_fread/1K                  20000000        129  7922.67 MiB/s
> +    BM_stdio_fread/4K                   5000000        323 12651.76 MiB/s
> +    BM_stdio_fread/8K                   5000000        484 16914.71 MiB/s
> +    BM_stdio_fread/16K                  2000000        831 19710.93 MiB/s
> +    BM_stdio_fread/64K                  1000000       2953 22189.63 MiB/s
> +    BM_stdio_fread_unbuffered/1        10000000        161     6.20 MiB/s
> +    BM_stdio_fread_unbuffered/2        10000000        162    12.28 MiB/s
> +    BM_stdio_fread_unbuffered/3        10000000        159    18.76 MiB/s
> +    BM_stdio_fread_unbuffered/4        10000000        158    25.16 MiB/s
> +    BM_stdio_fread_unbuffered/8        10000000        157    50.76 MiB/s
> +    BM_stdio_fread_unbuffered/16       10000000        158   101.11 MiB/s
> +    BM_stdio_fread_unbuffered/32       10000000        159   200.72 MiB/s
> +    BM_stdio_fread_unbuffered/64       10000000        161   396.99 MiB/s
> +    BM_stdio_fread_unbuffered/512      10000000        189  2707.13 MiB/s
> +    BM_stdio_fread_unbuffered/1K       10000000        208  4921.01 MiB/s
> +    BM_stdio_fread_unbuffered/4K        5000000        316 12929.90 MiB/s
> +    BM_stdio_fread_unbuffered/8K        5000000        481 16999.33 MiB/s
> +    BM_stdio_fread_unbuffered/16K       2000000        821 19950.40 MiB/s
> +    BM_stdio_fread_unbuffered/64K       1000000       3013 21748.94 MiB/s
> +
> +// bionic with refill <= BUFSIZ, else direct read. (plus hacked
> flockfile/funlockfile.)
> +BM_stdio_fread/1                  100000000         19    51.49 MiB/s
> +BM_stdio_fread/2                  100000000         20    99.84 MiB/s
> +BM_stdio_fread/3                  100000000         19   152.84 MiB/s
> +BM_stdio_fread/4                  100000000         20   199.33 MiB/s
> +BM_stdio_fread/8                  100000000         18   427.50 MiB/s
> +BM_stdio_fread/16                 100000000         21   740.93 MiB/s
> +BM_stdio_fread/32                 100000000         20  1597.92 MiB/s
> +BM_stdio_fread/64                 100000000         23  2753.94 MiB/s
> +BM_stdio_fread/512                 50000000         66  7696.49 MiB/s
> +BM_stdio_fread/1K                  20000000        114  8980.89 MiB/s
> +BM_stdio_fread/4K                   5000000        407 10049.42 MiB/s
> +BM_stdio_fread/8K                   5000000        474 17250.40 MiB/s
> +BM_stdio_fread/16K                  2000000        803 20387.39 MiB/s
> +BM_stdio_fread/64K                   500000       2905 22552.43 MiB/s
> +BM_stdio_fread_unbuffered/1        10000000        206     4.83 MiB/s
> +BM_stdio_fread_unbuffered/2        10000000        165    12.10 MiB/s
> +BM_stdio_fread_unbuffered/3        10000000        165    18.12 MiB/s
> +BM_stdio_fread_unbuffered/4        10000000        166    24.00 MiB/s
> +BM_stdio_fread_unbuffered/8        10000000        166    48.18 MiB/s
> +BM_stdio_fread_unbuffered/16       10000000        166    95.99 MiB/s
> +BM_stdio_fread_unbuffered/32       10000000        168   190.18 MiB/s
> +BM_stdio_fread_unbuffered/64       10000000        167   381.05 MiB/s
> +BM_stdio_fread_unbuffered/512      10000000        196  2606.78 MiB/s
> +BM_stdio_fread_unbuffered/1K       10000000        213  4797.48 MiB/s
> +BM_stdio_fread_unbuffered/4K        5000000        318 12862.09 MiB/s
> +BM_stdio_fread_unbuffered/8K        5000000        498 16425.53 MiB/s
> +BM_stdio_fread_unbuffered/16K       1000000       1023 16011.40 MiB/s
> +BM_stdio_fread_unbuffered/64K        500000       3367 19460.65 MiB/s
> +
> +*/
> +
>  size_t
>  fread(void *buf, size_t size, size_t count, FILE *fp)
>  {
> - size_t resid;
> - char *p;
> - int r;
> - size_t total;
> -
>   /*
> - * Extension:  Catch integer overflow
> + * Extension:  Catch integer overflow.
>   */
>   if ((size >= MUL_NO_OVERFLOW || count >= MUL_NO_OVERFLOW) &&
>      size > 0 && SIZE_MAX / size < count) {
> @@ -57,47 +119,80 @@ fread(void *buf, size_t size, size_t cou
>   return (0);
>   }
>
> + const size_t desired_total = count * size;
> + size_t total = desired_total;
> +
>   /*
>   * ANSI and SUSv2 require a return value of 0 if size or count are 0.
>   */
> - if ((resid = count * size) == 0)
> + if (total == 0) {
>   return (0);
> + }
> +
>   FLOCKFILE(fp);
>   _SET_ORIENTATION(fp, -1);
> +
> + // TODO: how can this ever happen?!
>   if (fp->_r < 0)
>   fp->_r = 0;
> - total = resid;
> - p = buf;
>
> - if ((fp->_flags & __SNBF) != 0) {
> + /*
> + * Ensure _bf._size is valid.
> + */
> + if (fp->_bf._base == NULL) {
> + __smakebuf(fp);
> + }
> +
> + char* dst = buf;
> +
> + while (total > 0) {
>   /*
> - * We know if we're unbuffered that our buffer is empty, so
> - * we can just read directly. This is much faster than the
> - * loop below which will perform a series of one byte reads.
> + * Copy data out of the buffer.
>   */
> - while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
> - p += r;
> - resid -= r;
> + size_t buffered_bytes = MIN(fp->_r, total);
> + memcpy(dst, fp->_p, buffered_bytes);
> + fp->_p += buffered_bytes;
> + fp->_r -= buffered_bytes;
> + dst += buffered_bytes;
> + total -= buffered_bytes;
> +
> + /*
> + * Are we done?
> + */
> + if (total == 0) {
> + goto out;
>   }
> - FUNLOCKFILE(fp);
> - return ((total - resid) / size);
> - }
>
> - while (resid > (r = fp->_r)) {
> - (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
> - fp->_p += r;
> - /* fp->_r = 0 ... done in __srefill */
> - p += r;
> - resid -= r;
> + /*
> + * Do we have so much more to read that we should
> + * avoid copying it through the buffer?
> + */
> + if (total > (size_t) fp->_bf._size) {
> + break;
> + }
> +
> + /*
> + * Less than a buffer to go, so refill the buffer and
> + * go around the loop again.
> + */
>   if (__srefill(fp)) {
> - /* no more input: return partial result */
> - FUNLOCKFILE(fp);
> - return ((total - resid) / size);
> + goto out;
>   }
>   }
> - (void)memcpy((void *)p, (void *)fp->_p, resid);
> - fp->_r -= resid;
> - fp->_p += resid;
> +
> + /*
> + * Read directly into the caller's buffer.
> + */
> + while (total > 0) {
> + ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
> + if (bytes_read <= 0) {
> + break;
> + }
> + dst += bytes_read;
> + total -= bytes_read;
> + }
> +
> +out:
>   FUNLOCKFILE(fp);
> - return (count);
> + return ((desired_total - total) / size);
>  }



--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Java i18n/JNI/NIO, or bionic questions? Mail me/drop by/add me as a reviewer.

Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

Martin Pieuchot-2
Hello Elliott,

On 20/01/15(Tue) 16:15, enh wrote:
> that patch wasn't setting the _flags right on error or eof.

Thanks!  Below is a version of your diff with some tweaks to match the
recent changes done in -current.

In your original post you've shown some test numbers.  Does it mean that
you have a test program for fread(3)?  If so do you think it would fit as
test in OpenBSD's regress framework?

I'd suggest you to check the setup of your mail client, apparently it
eats all the tabs and makes impossible to apply your diffs correctly :)

Index: fread.c
===================================================================
RCS file: /home/ncvs/src/lib/libc/stdio/fread.c,v
retrieving revision 1.13
diff -u -p -r1.13 fread.c
--- fread.c 8 Dec 2014 20:40:53 -0000 1.13
+++ fread.c 21 Jan 2015 10:54:56 -0000
@@ -37,18 +37,19 @@
 #include <errno.h>
 #include "local.h"
 
+#define MINIMUM(a, b) (((a) < (b)) ? (a) : (b))
+
 #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))
 
 size_t
 fread(void *buf, size_t size, size_t count, FILE *fp)
 {
- size_t resid;
- char *p;
- int r;
+ size_t desired_total;
  size_t total;
+ char *dst;
 
  /*
- * Extension:  Catch integer overflow
+ * Extension:  Catch integer overflow.
  */
  if ((size >= MUL_NO_OVERFLOW || count >= MUL_NO_OVERFLOW) &&
     size > 0 && SIZE_MAX / size < count) {
@@ -57,47 +58,79 @@ fread(void *buf, size_t size, size_t cou
  return (0);
  }
 
+ desired_total = count * size;
+ total = desired_total;
+
  /*
  * ANSI and SUSv2 require a return value of 0 if size or count are 0.
  */
- if ((resid = count * size) == 0)
+ if (total == 0)
  return (0);
+
  FLOCKFILE(fp);
  _SET_ORIENTATION(fp, -1);
+
+ /* XXX: how can this ever happen?! */
  if (fp->_r < 0)
  fp->_r = 0;
- total = resid;
- p = buf;
 
- if ((fp->_flags & __SNBF) != 0) {
+
+ /*
+ * Ensure _bf._size is valid.
+ */
+ if (fp->_bf._base == NULL)
+ __smakebuf(fp);
+
+ dst = buf;
+
+ while (total > 0) {
  /*
- * We know if we're unbuffered that our buffer is empty, so
- * we can just read directly. This is much faster than the
- * loop below which will perform a series of one byte reads.
+ * Copy data out of the buffer.
  */
- while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
- p += r;
- resid -= r;
- }
- FUNLOCKFILE(fp);
- return ((total - resid) / size);
+ size_t buffered_bytes = MINIMUM(fp->_r, total);
+
+ memcpy(dst, fp->_p, buffered_bytes);
+ fp->_p += buffered_bytes;
+ fp->_r -= buffered_bytes;
+ dst += buffered_bytes;
+ total -= buffered_bytes;
+
+ /*
+ * Are we done?
+ */
+ if (total == 0)
+ goto out;
+
+ /*
+ * Do we have so much more to read that we should
+ * avoid copying it through the buffer?
+ */
+ if (total > (size_t) fp->_bf._size)
+ break;
+
+ /*
+ * Less than a buffer to go, so refill the buffer and
+ * go around the loop again.
+ */
+ if (__srefill(fp))
+ goto out;
  }
 
- while (resid > (r = fp->_r)) {
- (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
- fp->_p += r;
- /* fp->_r = 0 ... done in __srefill */
- p += r;
- resid -= r;
- if (__srefill(fp)) {
- /* no more input: return partial result */
- FUNLOCKFILE(fp);
- return ((total - resid) / size);
+ /*
+ * Read directly into the caller's buffer.
+ */
+ while (total > 0) {
+ ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
+
+ if (bytes_read <= 0) {
+ fp->_flags = (fp->_r == 0) ? __SEOF : __SERR;
+ break;
  }
+ dst += bytes_read;
+ total -= bytes_read;
  }
- (void)memcpy((void *)p, (void *)fp->_p, resid);
- fp->_r -= resid;
- fp->_p += resid;
+
+out:
  FUNLOCKFILE(fp);
- return (count);
+ return ((desired_total - total) / size);
 }

Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

Ted Unangst-6
On Wed, Jan 21, 2015 at 12:04, Martin Pieuchot wrote:
> Hello Elliott,
>
> On 20/01/15(Tue) 16:15, enh wrote:
>> that patch wasn't setting the _flags right on error or eof.
>
> Thanks!  Below is a version of your diff with some tweaks to match the
> recent changes done in -current.

fwiw, I think this is a good diff, but maybe for 5.8 and not 5.7. It
will take a while for me to be certain this is functionally equivalent.

enh
Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

enh
In reply to this post by Martin Pieuchot-2
On Wed, Jan 21, 2015 at 3:04 AM, Martin Pieuchot <[hidden email]> wrote:

> Hello Elliott,
>
> On 20/01/15(Tue) 16:15, enh wrote:
>> that patch wasn't setting the _flags right on error or eof.
>
> Thanks!  Below is a version of your diff with some tweaks to match the
> recent changes done in -current.
>
> In your original post you've shown some test numbers.  Does it mean that
> you have a test program for fread(3)?  If so do you think it would fit as
> test in OpenBSD's regress framework?

we have over a thousand C library tests and a small number of
benchmarks in bionic/tests and bionic/benchmarks in the Android tree,
all BSD-licensed. (though this only amounts to 42% coverage.)

> I'd suggest you to check the setup of your mail client, apparently it
> eats all the tabs and makes impossible to apply your diffs correctly :)

afaik, there's nothing to can do to stop gmail eating tabs.

> Index: fread.c
> ===================================================================
> RCS file: /home/ncvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 fread.c
> --- fread.c     8 Dec 2014 20:40:53 -0000       1.13
> +++ fread.c     21 Jan 2015 10:54:56 -0000
> @@ -37,18 +37,19 @@
>  #include <errno.h>
>  #include "local.h"
>
> +#define MINIMUM(a, b)  (((a) < (b)) ? (a) : (b))

you don't want to use the MIN from <sys/param.h>? many files in libc
already do. (though admittedly fvwrite.c defines its own, but that
seems like a bug.)

>  #define MUL_NO_OVERFLOW        (1UL << (sizeof(size_t) * 4))
>
>  size_t
>  fread(void *buf, size_t size, size_t count, FILE *fp)
>  {
> -       size_t resid;
> -       char *p;
> -       int r;
> +       size_t desired_total;
>         size_t total;
> +       char *dst;
>
>         /*
> -        * Extension:  Catch integer overflow
> +        * Extension:  Catch integer overflow.
>          */
>         if ((size >= MUL_NO_OVERFLOW || count >= MUL_NO_OVERFLOW) &&
>             size > 0 && SIZE_MAX / size < count) {
> @@ -57,47 +58,79 @@ fread(void *buf, size_t size, size_t cou
>                 return (0);
>         }
>
> +       desired_total = count * size;
> +       total = desired_total;
> +
>         /*
>          * ANSI and SUSv2 require a return value of 0 if size or count are 0.
>          */
> -       if ((resid = count * size) == 0)
> +       if (total == 0)
>                 return (0);
> +
>         FLOCKFILE(fp);
>         _SET_ORIENTATION(fp, -1);
> +
> +       /* XXX: how can this ever happen?! */
>         if (fp->_r < 0)
>                 fp->_r = 0;
> -       total = resid;
> -       p = buf;
>
> -       if ((fp->_flags & __SNBF) != 0) {
> +
> +       /*
> +        * Ensure _bf._size is valid.
> +        */
> +       if (fp->_bf._base == NULL)
> +               __smakebuf(fp);
> +
> +       dst = buf;
> +
> +       while (total > 0) {
>                 /*
> -                * We know if we're unbuffered that our buffer is empty, so
> -                * we can just read directly. This is much faster than the
> -                * loop below which will perform a series of one byte reads.
> +                * Copy data out of the buffer.
>                  */
> -               while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
> -                       p += r;
> -                       resid -= r;
> -               }
> -               FUNLOCKFILE(fp);
> -               return ((total - resid) / size);
> +               size_t buffered_bytes = MINIMUM(fp->_r, total);
> +
> +               memcpy(dst, fp->_p, buffered_bytes);
> +               fp->_p += buffered_bytes;
> +               fp->_r -= buffered_bytes;
> +               dst += buffered_bytes;
> +               total -= buffered_bytes;
> +
> +               /*
> +                * Are we done?
> +                */
> +               if (total == 0)
> +                       goto out;
> +
> +               /*
> +                * Do we have so much more to read that we should
> +                * avoid copying it through the buffer?
> +                */
> +               if (total > (size_t) fp->_bf._size)
> +                       break;
> +
> +               /*
> +                * Less than a buffer to go, so refill the buffer and
> +                * go around the loop again.
> +                */
> +               if (__srefill(fp))
> +                       goto out;
>         }
>
> -       while (resid > (r = fp->_r)) {
> -               (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
> -               fp->_p += r;
> -               /* fp->_r = 0 ... done in __srefill */
> -               p += r;
> -               resid -= r;
> -               if (__srefill(fp)) {
> -                       /* no more input: return partial result */
> -                       FUNLOCKFILE(fp);
> -                       return ((total - resid) / size);
> +       /*
> +        * Read directly into the caller's buffer.
> +        */
> +       while (total > 0) {
> +               ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
> +
> +               if (bytes_read <= 0) {
> +                       fp->_flags = (fp->_r == 0) ? __SEOF : __SERR;
> +                       break;
>                 }
> +               dst += bytes_read;
> +               total -= bytes_read;
>         }
> -       (void)memcpy((void *)p, (void *)fp->_p, resid);
> -       fp->_r -= resid;
> -       fp->_p += resid;
> +
> +out:
>         FUNLOCKFILE(fp);
> -       return (count);
> +       return ((desired_total - total) / size);
>  }

Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

Loganaden Velvindron-2
On Wed, Jan 21, 2015 at 5:42 PM, enh <[hidden email]> wrote:

> On Wed, Jan 21, 2015 at 3:04 AM, Martin Pieuchot <[hidden email]> wrote:
>> Hello Elliott,
>>
>> On 20/01/15(Tue) 16:15, enh wrote:
>>> that patch wasn't setting the _flags right on error or eof.
>>
>> Thanks!  Below is a version of your diff with some tweaks to match the
>> recent changes done in -current.
>>
>> In your original post you've shown some test numbers.  Does it mean that
>> you have a test program for fread(3)?  If so do you think it would fit as
>> test in OpenBSD's regress framework?
>
> we have over a thousand C library tests and a small number of
> benchmarks in bionic/tests and bionic/benchmarks in the Android tree,
> all BSD-licensed. (though this only amounts to 42% coverage.)
>
>> I'd suggest you to check the setup of your mail client, apparently it
>> eats all the tabs and makes impossible to apply your diffs correctly :)
>
> afaik, there's nothing to can do to stop gmail eating tabs.

Check out devio.us if you need a shell account, to send diffs to OpenBSD.


>
>> Index: fread.c
>> ===================================================================
>> RCS file: /home/ncvs/src/lib/libc/stdio/fread.c,v
>> retrieving revision 1.13
>> diff -u -p -r1.13 fread.c
>> --- fread.c     8 Dec 2014 20:40:53 -0000       1.13
>> +++ fread.c     21 Jan 2015 10:54:56 -0000
>> @@ -37,18 +37,19 @@
>>  #include <errno.h>
>>  #include "local.h"
>>
>> +#define MINIMUM(a, b)  (((a) < (b)) ? (a) : (b))
>
> you don't want to use the MIN from <sys/param.h>? many files in libc
> already do. (though admittedly fvwrite.c defines its own, but that
> seems like a bug.)
>
>>  #define MUL_NO_OVERFLOW        (1UL << (sizeof(size_t) * 4))
>>
>>  size_t
>>  fread(void *buf, size_t size, size_t count, FILE *fp)
>>  {
>> -       size_t resid;
>> -       char *p;
>> -       int r;
>> +       size_t desired_total;
>>         size_t total;
>> +       char *dst;
>>
>>         /*
>> -        * Extension:  Catch integer overflow
>> +        * Extension:  Catch integer overflow.
>>          */
>>         if ((size >= MUL_NO_OVERFLOW || count >= MUL_NO_OVERFLOW) &&
>>             size > 0 && SIZE_MAX / size < count) {
>> @@ -57,47 +58,79 @@ fread(void *buf, size_t size, size_t cou
>>                 return (0);
>>         }
>>
>> +       desired_total = count * size;
>> +       total = desired_total;
>> +
>>         /*
>>          * ANSI and SUSv2 require a return value of 0 if size or count are 0.
>>          */
>> -       if ((resid = count * size) == 0)
>> +       if (total == 0)
>>                 return (0);
>> +
>>         FLOCKFILE(fp);
>>         _SET_ORIENTATION(fp, -1);
>> +
>> +       /* XXX: how can this ever happen?! */
>>         if (fp->_r < 0)
>>                 fp->_r = 0;
>> -       total = resid;
>> -       p = buf;
>>
>> -       if ((fp->_flags & __SNBF) != 0) {
>> +
>> +       /*
>> +        * Ensure _bf._size is valid.
>> +        */
>> +       if (fp->_bf._base == NULL)
>> +               __smakebuf(fp);
>> +
>> +       dst = buf;
>> +
>> +       while (total > 0) {
>>                 /*
>> -                * We know if we're unbuffered that our buffer is empty, so
>> -                * we can just read directly. This is much faster than the
>> -                * loop below which will perform a series of one byte reads.
>> +                * Copy data out of the buffer.
>>                  */
>> -               while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
>> -                       p += r;
>> -                       resid -= r;
>> -               }
>> -               FUNLOCKFILE(fp);
>> -               return ((total - resid) / size);
>> +               size_t buffered_bytes = MINIMUM(fp->_r, total);
>> +
>> +               memcpy(dst, fp->_p, buffered_bytes);
>> +               fp->_p += buffered_bytes;
>> +               fp->_r -= buffered_bytes;
>> +               dst += buffered_bytes;
>> +               total -= buffered_bytes;
>> +
>> +               /*
>> +                * Are we done?
>> +                */
>> +               if (total == 0)
>> +                       goto out;
>> +
>> +               /*
>> +                * Do we have so much more to read that we should
>> +                * avoid copying it through the buffer?
>> +                */
>> +               if (total > (size_t) fp->_bf._size)
>> +                       break;
>> +
>> +               /*
>> +                * Less than a buffer to go, so refill the buffer and
>> +                * go around the loop again.
>> +                */
>> +               if (__srefill(fp))
>> +                       goto out;
>>         }
>>
>> -       while (resid > (r = fp->_r)) {
>> -               (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
>> -               fp->_p += r;
>> -               /* fp->_r = 0 ... done in __srefill */
>> -               p += r;
>> -               resid -= r;
>> -               if (__srefill(fp)) {
>> -                       /* no more input: return partial result */
>> -                       FUNLOCKFILE(fp);
>> -                       return ((total - resid) / size);
>> +       /*
>> +        * Read directly into the caller's buffer.
>> +        */
>> +       while (total > 0) {
>> +               ssize_t bytes_read = (*fp->_read)(fp->_cookie, dst, total);
>> +
>> +               if (bytes_read <= 0) {
>> +                       fp->_flags = (fp->_r == 0) ? __SEOF : __SERR;
>> +                       break;
>>                 }
>> +               dst += bytes_read;
>> +               total -= bytes_read;
>>         }
>> -       (void)memcpy((void *)p, (void *)fp->_p, resid);
>> -       fp->_r -= resid;
>> -       fp->_p += resid;
>> +
>> +out:
>>         FUNLOCKFILE(fp);
>> -       return (count);
>> +       return ((desired_total - total) / size);
>>  }
>



--
This message is strictly personal and the opinions expressed do not
represent those of my employers, either past or present.

Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

Theo de Raadt
In reply to this post by enh
> > +#define MINIMUM(a, b)  (((a) < (b)) ? (a) : (b))
>
> you don't want to use the MIN from <sys/param.h>? many files in libc
> already do. (though admittedly fvwrite.c defines its own, but that
> seems like a bug.)

I guess you haven't observed the commits of the last week...

I am certain you know <sys/param.h> creates much namespace pollution
on all systems.  This is largely unavoidable since this is a poorly
specified legacy header.  Recently I went through efforts to limit the
pollution, and to limit pulling in the pollution.

The base system now uses POSIX <limits.h> rather than <sys/param.h>,
wherever possible.  Simultaneously, <sys/param.h> has been wittled
down to not expose many symbols which are CSRGisms or only used by
_KERNEL code.

<sys/param.h> is still fully supported and usable, but the base is
completely refactored to use the better practice (for cleanliness,
but also as a demonstration of good practice moving forward).

So in the base, only 185 includes of <sys/param.h> exist.  This is to
support uses of the following by programs in the base:
           1 ALIGNBYTES
           1 ALIGNED_POINTER
           1 MID_MACHINE (our own infection...)
           1 NOFILE_MAX
           2 BSD
           2 PAGE_SHIFT
           2 PZERO
           3 MACHINE_ARCH (our own infection...)
           3 isclr
           4 MACHINE (our own infection...)
           4 clrbit
           4 powerof2
           5 dbtob
           7 btodb
           8 NODEV
          11 ALIGN
          11 setbit
          12 MAX
          12 isset
          13 nitems (our own infection...)
          14 MIN
          22 MAXCOMLEN
          23 roundup
          28 MAXBSIZE
          38 DEV_BSIZE

The inclusion of <sys/param.h> via <netdb.h> and <arpa/nameser.h> has
been terminated, and the ports tree was repaired (simpler than originally
thought.  10 programs?)  The most scary impact is that "BSD" will no
longer be defined in some enviroments which picked it up via <netdb.h>
or <arpa/nameser.h>, so we'll have to keep an eye out for such failures.

Finally, this brings us back to MIN() and MAX(), in your original
question.  There were quite a few uses of this.  To avoid the
namespace poisoning I replaced all uses with local #define's of
MINIMUM() and MAXIMUM(), and will revisit looking for other solutions
later...

enh
Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

enh
In reply to this post by enh
On Wed, Jan 21, 2015 at 10:13 AM, Theo de Raadt <[hidden email]> wrote:
>> > +#define MINIMUM(a, b)  (((a) < (b)) ? (a) : (b))
>>
>> you don't want to use the MIN from <sys/param.h>? many files in libc
>> already do. (though admittedly fvwrite.c defines its own, but that
>> seems like a bug.)
>
> I guess you haven't observed the commits of the last week...

no, i tend to come along and sync with your HEAD every few months.

> I am certain you know <sys/param.h> creates much namespace pollution
> on all systems.  This is largely unavoidable since this is a poorly
> specified legacy header.  Recently I went through efforts to limit the
> pollution, and to limit pulling in the pollution.
>
> The base system now uses POSIX <limits.h> rather than <sys/param.h>,
> wherever possible.  Simultaneously, <sys/param.h> has been wittled
> down to not expose many symbols which are CSRGisms or only used by
> _KERNEL code.
>
> <sys/param.h> is still fully supported and usable, but the base is
> completely refactored to use the better practice (for cleanliness,
> but also as a demonstration of good practice moving forward).
>
> So in the base, only 185 includes of <sys/param.h> exist.  This is to
> support uses of the following by programs in the base:
>            1 ALIGNBYTES
>            1 ALIGNED_POINTER
>            1 MID_MACHINE        (our own infection...)
>            1 NOFILE_MAX
>            2 BSD
>            2 PAGE_SHIFT
>            2 PZERO
>            3 MACHINE_ARCH       (our own infection...)
>            3 isclr
>            4 MACHINE            (our own infection...)
>            4 clrbit
>            4 powerof2
>            5 dbtob
>            7 btodb
>            8 NODEV
>           11 ALIGN
>           11 setbit
>           12 MAX
>           12 isset
>           13 nitems             (our own infection...)
>           14 MIN
>           22 MAXCOMLEN
>           23 roundup
>           28 MAXBSIZE
>           38 DEV_BSIZE
>
> The inclusion of <sys/param.h> via <netdb.h> and <arpa/nameser.h> has
> been terminated, and the ports tree was repaired (simpler than originally
> thought.  10 programs?)  The most scary impact is that "BSD" will no
> longer be defined in some enviroments which picked it up via <netdb.h>
> or <arpa/nameser.h>, so we'll have to keep an eye out for such failures.
>
> Finally, this brings us back to MIN() and MAX(), in your original
> question.  There were quite a few uses of this.  To avoid the
> namespace poisoning I replaced all uses with local #define's of
> MINIMUM() and MAXIMUM(), and will revisit looking for other solutions
> later...

ah, okay. our <sys/param.h> isn't much of a polluter:

#include <limits.h>
#include <linux/param.h>

#define MAXPATHLEN  PATH_MAX
#define MAXSYMLINKS 8

/* Macros for counting and rounding. */
#ifndef howmany
#define howmany(x, y)   (((x)+((y)-1))/(y))
#endif
#define roundup(x, y)   ((((x)+((y)-1))/(y))*(y))
#define powerof2(x)     ((((x)-1)&(x))==0)

/* Macros for min/max. */
#define MIN(a,b) (((a)<(b))?(a):(b))
#define MAX(a,b) (((a)>(b))?(a):(b))

(although looking at that, i'm not quite sure why we #include
<linux/param.h>. i'll make sure someone looks into that...)

 --elliott

enh
Reply | Threaded
Open this post in threaded view
|

Re: fread optimization

enh
In reply to this post by Ted Unangst-6
this breaks gcov. haven't worked out why yet.


On Wed, Jan 21, 2015 at 9:29 AM, Ted Unangst <[hidden email]> wrote:

> On Wed, Jan 21, 2015 at 12:04, Martin Pieuchot wrote:
>> Hello Elliott,
>>
>> On 20/01/15(Tue) 16:15, enh wrote:
>>> that patch wasn't setting the _flags right on error or eof.
>>
>> Thanks!  Below is a version of your diff with some tweaks to match the
>> recent changes done in -current.
>
> fwiw, I think this is a good diff, but maybe for 5.8 and not 5.7. It
> will take a while for me to be certain this is functionally equivalent.



--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Java i18n/JNI/NIO, or bionic questions? Mail me/drop by/add me as a reviewer.