fread() on unbuffered FILE doesn't set feof flag

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

fread() on unbuffered FILE doesn't set feof flag

Sebastien Marie-3
Hi,

I am currently investigating a problem related to mercurial:
https://bz.mercurial-scm.org/show_bug.cgi?id=6035

While running an update operation, it could ends unexpectly.

For its operation, it starts workers and communicate with them via
socket file waiting for EOF. It uses cPickle python extension for
serialize/deserialize messages.

It seems that when a worker exits after sending its last message,
because it has terminate the work, EOF detection doesn't work as
expected. The main process interpretes it as failure and abort. the
errno value is 35: EAGAIN.

The cPickle code seems correct:

pobj/Python-2.7.15/Python-2.7.15/Modules/cPickle.c

from read_file():
551     PyFile_IncUseCount((PyFileObject *)self->file);
552     Py_BEGIN_ALLOW_THREADS
553     nbytesread = fread(self->buf, sizeof(char), n, self->fp);
554     Py_END_ALLOW_THREADS
555     PyFile_DecUseCount((PyFileObject *)self->file);
556     if (nbytesread != (size_t)n) {
557         asm("int $3");
558         if (feof(self->fp)) {
559             PyErr_SetNone(PyExc_EOFError);
560             return -1;
561         }
562
563         PyErr_SetFromErrno(PyExc_IOError);
564         return -1;
565     }
566

(the asm() is mine).

it uses fread() and read 1 byte a time.

after the worker writes its last message, it is exiting. while looking
via ktrace, the exit(2) call is before main process finish to read the
buffer. it doesn't seems to be a problem.

the last call of fread(3) returns 0. as it asked for 1, the condition at
line 556 is valid and gdb catch the breakpoint.

(gdb) print *self->fp
$1 = {
  _p = 0x1582c6a5250 <usual+272> "",
  _r = 0,
  _w = 0,
  _flags = 6,
  _file = 5,
  _bf = {
    _base = 0x1582c6a524f <usual+271> "\n",
    _size = 1
  },
  _lbfsize = 0,
  _cookie = 0x1582c6a51d8 <usual+152>,
  _close = 0x1582c5f3a00 <__sclose>,
  _read = 0x1582c5f38f0 <__sread>,
  _seek = 0x1582c5f39a0 <__sseek>,
  _write = 0x1582c5f3940 <__swrite>,
  _ext = {
    _base = 0x1582c6a3eb8 <usualext+296> "",
    _size = 0
  },
  _up = 0x0,
  _ur = 0,
  _ubuf = "\000\000",
  _nbuf = "\n",
  _lb = {
    _base = 0x0,
    _size = 0
  },
  _blksize = 16384,
  _offset = 1833023
}

The interesting part is _flags. It is 6.

According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".

the feof() call returns false, the python code interpretes it as an error.

When looking at fread(3) code in libc, I found that we doesn't set
__SEOF when the FILE is unbuffered.

src/lib/libc/stdio/fread.c
    72          if ((fp->_flags & __SNBF) != 0) {
    73                  /*
    74                   * We know if we're unbuffered that our buffer is empty, so
    75                   * we can just read directly. This is much faster than the
    76                   * loop below which will perform a series of one byte reads.
    77                   */
    78                  while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
    79                          p += r;
    80                          resid -= r;
    81                  }
    82                  FUNLOCKFILE(fp);
    83                  return ((total - resid) / size);
    84          }


I am able to reproduce it in plain C:

$ cat test.c

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
        FILE * fp = stdin;
        char buf[1024];
        size_t nread;
        size_t n = 1;

        if (setvbuf(fp, NULL, _IONBF, 0) != 0)
                err(EXIT_FAILURE, "setvbuf");

        for (;;) {
                nread = fread(buf, sizeof(char), n, fp);
                if (nread != n) {
                        if (feof(fp)) {
                                printf("EOF\n");
                                break;
                        }
                       
                        if (ferror(fp))
                                err(EXIT_FAILURE, "ferror\n");

                        errx(EXIT_FAILURE, "something else\n");
                }
               
        }

        return EXIT_SUCCESS;
}

$ cc -Wall test.c && ./a.out
^D
a.out: something else

Is it a bug to not set the __SEOF flag or it is expected for unbuffered
FILE ?

Thanks.
--
Sebastien Marie

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Ted Unangst-6
Sebastien Marie wrote:

> When looking at fread(3) code in libc, I found that we doesn't set
> __SEOF when the FILE is unbuffered.
>
> src/lib/libc/stdio/fread.c
>     72          if ((fp->_flags & __SNBF) != 0) {
>     73                  /*
>     74                   * We know if we're unbuffered that our buffer is empty, so
>     75                   * we can just read directly. This is much faster than the
>     76                   * loop below which will perform a series of one byte reads.
>     77                   */
>     78                  while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
>     79                          p += r;
>     80                          resid -= r;
>     81                  }
>     82                  FUNLOCKFILE(fp);
>     83                  return ((total - resid) / size);
>     84          }


> Is it a bug to not set the __SEOF flag or it is expected for unbuffered
> FILE ?

seems like a bug. the code above was added more recently, and must have missed
this case.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
In reply to this post by Sebastien Marie-3
On Tue, 11 Dec 2018 20:35:05 +0100, Sebastien Marie wrote:

> According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".
>
> the feof() call returns false, the python code interpretes it as an error.
>
> When looking at fread(3) code in libc, I found that we doesn't set
> __SEOF when the FILE is unbuffered.

Yes, that is a bug.  The code should probably be calling __srefill(),
though we'd need to set _bf._base and _bf._size appropriately so
it doesn't use _nbuf.  Some care is required...

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Sebastien Marie-3
On Tue, Dec 11, 2018 at 03:08:38PM -0700, Todd C. Miller wrote:

> On Tue, 11 Dec 2018 20:35:05 +0100, Sebastien Marie wrote:
>
> > According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".
> >
> > the feof() call returns false, the python code interpretes it as an error.
> >
> > When looking at fread(3) code in libc, I found that we doesn't set
> > __SEOF when the FILE is unbuffered.
>
> Yes, that is a bug.  The code should probably be calling __srefill(),
> though we'd need to set _bf._base and _bf._size appropriately so
> it doesn't use _nbuf.  Some care is required...
>
>  - todd

It seems it is the rediscover of a previously signaled bug by zhuk@ .
see https://marc.info/?l=openbsd-tech&m=152554758706248&w=2

I found it after setting up a patch, and it is mostly the same than the
proposed one by zhuk@, so I take his diff (exactly, only the chunk that
correct the bug. his diff also corrects something else, but I prefer to
take care to only one thing a time).

The approch is to directly set __SEOF or __SERR. It is more simple than
trying to reuse __srefill().

Index: stdio/fread.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fread.c,v
retrieving revision 1.15
diff -u -p -r1.15 fread.c
--- stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
+++ stdio/fread.c 12 Dec 2018 07:52:02 -0000
@@ -79,6 +79,10 @@ fread(void *buf, size_t size, size_t cou
  p += r;
  resid -= r;
  }
+ if (r == 0)
+ fp->_flags |= __SEOF;
+ else if (r < 0)
+ fp->_flags |= __SERR;
  FUNLOCKFILE(fp);
  return ((total - resid) / size);
  }

I tested it with mercurial, and it corrects the problem.

Thanks.
--
Sebastien Marie

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
On Wed, 12 Dec 2018 08:59:25 +0100, Sebastien Marie wrote:

> The approch is to directly set __SEOF or __SERR. It is more simple than
> trying to reuse __srefill().

My concern is that __srefill() does more than just that.  In particular,
it has the following:

    /*
     * Before reading from a line buffered or unbuffered file,
     * flush all line buffered output files, per the ANSI C
     * standard.
     */

However, a quick scan of the ISO C99 spec didn't reveal any such
requirement.  Perhaps someone else better versed in standardese
knows what the comment is referring to.

I also worry about the lack of stdio initialization (__sinit()) and
logic for switching from reading to writing.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Theo de Raadt-2
Todd C. Miller <[hidden email]> wrote:

> On Wed, 12 Dec 2018 08:59:25 +0100, Sebastien Marie wrote:
>
> > The approch is to directly set __SEOF or __SERR. It is more simple than
> > trying to reuse __srefill().
>
> My concern is that __srefill() does more than just that.  In particular,
> it has the following:
>
>     /*
>      * Before reading from a line buffered or unbuffered file,
>      * flush all line buffered output files, per the ANSI C
>      * standard.
>      */
>
> However, a quick scan of the ISO C99 spec didn't reveal any such
> requirement.  Perhaps someone else better versed in standardese
> knows what the comment is referring to.
>
> I also worry about the lack of stdio initialization (__sinit()) and
> logic for switching from reading to writing.

Should the patch commited a while back be pulled, since it is wrong?

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
On Wed, 12 Dec 2018 09:42:18 -0700, "Theo de Raadt" wrote:

> Should the patch commited a while back be pulled, since it is wrong?

I think so.  We can add regress tests for this and add back the
optimization once it is correct.

 - todd

----------------------------
revision 1.13
date: 2014/12/08 20:40:53;  author: tedu;  state: Exp;  lines: +16 -1;  commitid: e3UB5L29yh1qDFSO;
don't do silly (and slow) one byte reads in unbuffered mode.
from enh at google

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Ted Unangst-6
In reply to this post by Todd C. Miller-2
Todd C. Miller wrote:

> On Wed, 12 Dec 2018 08:59:25 +0100, Sebastien Marie wrote:
>
> > The approch is to directly set __SEOF or __SERR. It is more simple than
> > trying to reuse __srefill().
>
> My concern is that __srefill() does more than just that.  In particular,
> it has the following:
>
>     /*
>      * Before reading from a line buffered or unbuffered file,
>      * flush all line buffered output files, per the ANSI C
>      * standard.
>      */
>
> However, a quick scan of the ISO C99 spec didn't reveal any such
> requirement.  Perhaps someone else better versed in standardese
> knows what the comment is referring to.
>
> I also worry about the lack of stdio initialization (__sinit()) and
> logic for switching from reading to writing.

isn't the point to avoid all this stuff because it's not necessary? srefill()
has to be one of the worst functions around. it does a dozen different things,
so whenever something is supposed to happen a call to srefill is inserted, but
nobody knows what's supposed to happen *right here right now*. like how do we
get here without previously calling sinit? srefill seems like a very strange
place to stash the init call.

i don't want to go too far down the legacy rabbit hole, just float nicely
above it.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
On Wed, 12 Dec 2018 13:08:07 -0500, "Ted Unangst" wrote:

> i don't want to go too far down the legacy rabbit hole, just float nicely
> above it.

Which is exactly how we got to this point.  What else is missing
and how many years will it take before someone notices?  Is it
really safe to completely ignore the stdio flags?

I wouldn't object to refactoring parts of __srefill() so we don't
need to call all of it in this case.  Perhaps something akin to
cantwrite (and __swsetup) as used by fvwrite.c.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Ted Unangst-6
Todd C. Miller wrote:
> On Wed, 12 Dec 2018 13:08:07 -0500, "Ted Unangst" wrote:
>
> > i don't want to go too far down the legacy rabbit hole, just float nicely
> > above it.
>
> Which is exactly how we got to this point.  What else is missing
> and how many years will it take before someone notices?  Is it
> really safe to completely ignore the stdio flags?

One missed feature every five years doesn't sound all that bad. :)

But sure, if we want to revert and regress before putting it back, ok.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Ted Unangst-6
Ted Unangst wrote:

> Todd C. Miller wrote:
> > On Wed, 12 Dec 2018 13:08:07 -0500, "Ted Unangst" wrote:
> >
> > > i don't want to go too far down the legacy rabbit hole, just float nicely
> > > above it.
> >
> > Which is exactly how we got to this point.  What else is missing
> > and how many years will it take before someone notices?  Is it
> > really safe to completely ignore the stdio flags?
>
> One missed feature every five years doesn't sound all that bad. :)
>
> But sure, if we want to revert and regress before putting it back, ok.

I went with a simpler #if 0 approach because I think it will make it easier to
track the fixes.

ok?

Index: fread.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fread.c,v
retrieving revision 1.15
diff -u -p -r1.15 fread.c
--- fread.c 21 Sep 2016 04:38:56 -0000 1.15
+++ fread.c 12 Dec 2018 18:50:03 -0000
@@ -69,6 +69,7 @@ fread(void *buf, size_t size, size_t cou
  total = resid;
  p = buf;
 
+#if 0
  if ((fp->_flags & __SNBF) != 0) {
  /*
  * We know if we're unbuffered that our buffer is empty, so
@@ -82,6 +83,7 @@ fread(void *buf, size_t size, size_t cou
  FUNLOCKFILE(fp);
  return ((total - resid) / size);
  }
+#endif
 
  while (resid > (r = fp->_r)) {
  (void)memcpy(p, fp->_p, r);

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Sebastien Marie-3
On Wed, Dec 12, 2018 at 01:51:34PM -0500, Ted Unangst wrote:

> Ted Unangst wrote:
> > Todd C. Miller wrote:
> > > On Wed, 12 Dec 2018 13:08:07 -0500, "Ted Unangst" wrote:
> > >
> > > > i don't want to go too far down the legacy rabbit hole, just float nicely
> > > > above it.
> > >
> > > Which is exactly how we got to this point.  What else is missing
> > > and how many years will it take before someone notices?  Is it
> > > really safe to completely ignore the stdio flags?
> >
> > One missed feature every five years doesn't sound all that bad. :)
> >
> > But sure, if we want to revert and regress before putting it back, ok.
>
> I went with a simpler #if 0 approach because I think it will make it easier to
> track the fixes.
>
> ok?

OK semarie@

it is the more simpler approch for now. but I agree that it should be
back if possible.

thanks.

> Index: fread.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 fread.c
> --- fread.c 21 Sep 2016 04:38:56 -0000 1.15
> +++ fread.c 12 Dec 2018 18:50:03 -0000
> @@ -69,6 +69,7 @@ fread(void *buf, size_t size, size_t cou
>   total = resid;
>   p = buf;
>  
> +#if 0
>   if ((fp->_flags & __SNBF) != 0) {
>   /*
>   * We know if we're unbuffered that our buffer is empty, so
> @@ -82,6 +83,7 @@ fread(void *buf, size_t size, size_t cou
>   FUNLOCKFILE(fp);
>   return ((total - resid) / size);
>   }
> +#endif
>  
>   while (resid > (r = fp->_r)) {
>   (void)memcpy(p, fp->_p, r);

--
Sebastien Marie

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Theo de Raadt-2
In reply to this post by Ted Unangst-6
Ted Unangst <[hidden email]> wrote:

> Todd C. Miller wrote:
> > On Wed, 12 Dec 2018 08:59:25 +0100, Sebastien Marie wrote:
> >
> > > The approch is to directly set __SEOF or __SERR. It is more simple than
> > > trying to reuse __srefill().
> >
> > My concern is that __srefill() does more than just that.  In particular,
> > it has the following:
> >
> >     /*
> >      * Before reading from a line buffered or unbuffered file,
> >      * flush all line buffered output files, per the ANSI C
> >      * standard.
> >      */
> >
> > However, a quick scan of the ISO C99 spec didn't reveal any such
> > requirement.  Perhaps someone else better versed in standardese
> > knows what the comment is referring to.
> >
> > I also worry about the lack of stdio initialization (__sinit()) and
> > logic for switching from reading to writing.
>
> isn't the point to avoid all this stuff because it's not necessary? srefill()
> has to be one of the worst functions around. it does a dozen different things,
> so whenever something is supposed to happen a call to srefill is inserted, but
> nobody knows what's supposed to happen *right here right now*. like how do we
> get here without previously calling sinit? srefill seems like a very strange
> place to stash the init call.
>
> i don't want to go too far down the legacy rabbit hole, just float nicely
> above it.

I don't understand what you are saying.  Are you saying Torek got it wrong?
I doubt it.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Theo de Raadt-2
In reply to this post by Ted Unangst-6
Ted Unangst <[hidden email]> wrote:

> Ted Unangst wrote:
> > Todd C. Miller wrote:
> > > On Wed, 12 Dec 2018 13:08:07 -0500, "Ted Unangst" wrote:
> > >
> > > > i don't want to go too far down the legacy rabbit hole, just float nicely
> > > > above it.
> > >
> > > Which is exactly how we got to this point.  What else is missing
> > > and how many years will it take before someone notices?  Is it
> > > really safe to completely ignore the stdio flags?
> >
> > One missed feature every five years doesn't sound all that bad. :)
> >
> > But sure, if we want to revert and regress before putting it back, ok.
>
> I went with a simpler #if 0 approach because I think it will make it easier to
> track the fixes.
>
> ok?
>
> Index: fread.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 fread.c
> --- fread.c 21 Sep 2016 04:38:56 -0000 1.15
> +++ fread.c 12 Dec 2018 18:50:03 -0000
> @@ -69,6 +69,7 @@ fread(void *buf, size_t size, size_t cou
>   total = resid;
>   p = buf;
>  
> +#if 0
>   if ((fp->_flags & __SNBF) != 0) {
>   /*
>   * We know if we're unbuffered that our buffer is empty, so
> @@ -82,6 +83,7 @@ fread(void *buf, size_t size, size_t cou
>   FUNLOCKFILE(fp);
>   return ((total - resid) / size);
>   }
> +#endif
>  
>   while (resid > (r = fp->_r)) {
>   (void)memcpy(p, fp->_p, r);
>

Well, I disagree.  I think the guy at google wanted to optimize this,
and he got it wrong.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Theo de Raadt-2
In reply to this post by Todd C. Miller-2
my point continues to be that enh@google optimized something, but
didn't handle all the cases

that should be fixed

but until it is fixed, the change should be backed out entirely.

#if 0 is lame.

blaming torek's stdio internals for this bug is makes no sense.

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Vadim Zhukov-2
In reply to this post by Sebastien Marie-3
ср, 12 дек. 2018 г. в 10:59, Sebastien Marie <[hidden email]>:

>
> On Tue, Dec 11, 2018 at 03:08:38PM -0700, Todd C. Miller wrote:
> > On Tue, 11 Dec 2018 20:35:05 +0100, Sebastien Marie wrote:
> >
> > > According to stdio.h, 6 = __SNBF | __SRD, so "unbuffered" and "OK to read".
> > >
> > > the feof() call returns false, the python code interpretes it as an error.
> > >
> > > When looking at fread(3) code in libc, I found that we doesn't set
> > > __SEOF when the FILE is unbuffered.
> >
> > Yes, that is a bug.  The code should probably be calling __srefill(),
> > though we'd need to set _bf._base and _bf._size appropriately so
> > it doesn't use _nbuf.  Some care is required...
> >
> >  - todd
>
> It seems it is the rediscover of a previously signaled bug by zhuk@ .
> see https://marc.info/?l=openbsd-tech&m=152554758706248&w=2
>
> I found it after setting up a patch, and it is mostly the same than the
> proposed one by zhuk@, so I take his diff (exactly, only the chunk that
> correct the bug. his diff also corrects something else, but I prefer to
> take care to only one thing a time).
>
> The approch is to directly set __SEOF or __SERR. It is more simple than
> trying to reuse __srefill().
>
> Index: stdio/fread.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 fread.c
> --- stdio/fread.c       21 Sep 2016 04:38:56 -0000      1.15
> +++ stdio/fread.c       12 Dec 2018 07:52:02 -0000
> @@ -79,6 +79,10 @@ fread(void *buf, size_t size, size_t cou
>                         p += r;
>                         resid -= r;
>                 }
> +               if (r == 0)
> +                       fp->_flags |= __SEOF;
> +               else if (r < 0)
> +                       fp->_flags |= __SERR;
>                 FUNLOCKFILE(fp);
>                 return ((total - resid) / size);
>         }
>
> I tested it with mercurial, and it corrects the problem.
>
> Thanks.

I've been running libc with this patch for months, and got not
problems yet, FWIW.

The four lines you're adding are similar to other stdio code, e.g.,
see vfprintf.c.

The hunk you omitted probably not needed in my case (buffered stdio
over IPv4 TCP socket), but it seemed to be logical one (EAGAIN
received shouldn't invalidate stream offset, it's still well-known),
and also seem to be more non-seekable-file-descriptor-friendly. And,
again, I've been running both hunks all that time.

I'm glad this issue finally got discussed - thanks, I always suck at
attracting attention. :)

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
In reply to this post by Todd C. Miller-2
Here's a diff that fakes up a FILE like we do in other parts of
stdio for unbuffered I/O.  This lets us call __srefill() normally.
If __srefill() returns an error, copy the flags back to the real
FILE *.

 - todd

Index: lib/libc/stdio/fread.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fread.c,v
retrieving revision 1.15
diff -u -p -u -r1.15 fread.c
--- lib/libc/stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
+++ lib/libc/stdio/fread.c 13 Dec 2018 17:24:34 -0000
@@ -69,18 +69,39 @@ fread(void *buf, size_t size, size_t cou
  total = resid;
  p = buf;
 
+ /*
+ * If we're unbuffered we know that the buffer in fp is empty so
+ * we can read directly into buf.  This is much faster than a
+ * series of one byte reads into fp->_nbuf.
+ */
  if ((fp->_flags & __SNBF) != 0) {
- /*
- * We know if we're unbuffered that our buffer is empty, so
- * we can just read directly. This is much faster than the
- * loop below which will perform a series of one byte reads.
- */
- while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
- p += r;
- resid -= r;
+ FILE fake;
+ struct __sfileext fakeext;
+
+ _FILEEXT_SETUP(&fake, &fakeext);
+ /* copy the important variables */
+ fake._flags = fp->_flags;
+ fake._file = fp->_file;
+ fake._cookie = fp->_cookie;
+ fake._read = fp->_read;
+
+ /* set up the buffer */
+ fake._bf._base = buf;
+ fake._bf._size = total;
+ fake._lbfsize = 0; /* not line buffered */
+
+ while (resid > 0) {
+ if (__srefill(&fake)) {
+ /* no more input: return partial result */
+ count = (total - resid) / size;
+ fp->_flags |= (fake._flags & (__SERR|__SEOF));
+ break;
+ }
+ fake._p += fake._r;
+ resid -= fake._r;
  }
  FUNLOCKFILE(fp);
- return ((total - resid) / size);
+ return (count);
  }
 
  while (resid > (r = fp->_r)) {

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Juan Francisco Cantero Hurtado
On Thu, Dec 13, 2018 at 11:01:10AM -0700, Todd C. Miller wrote:
> Here's a diff that fakes up a FILE like we do in other parts of
> stdio for unbuffered I/O.  This lets us call __srefill() normally.
> If __srefill() returns an error, copy the flags back to the real
> FILE *.

Your patch fixes the mercurial crashes. Many thanks.


>
>  - todd
>
> Index: lib/libc/stdio/fread.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.15
> diff -u -p -u -r1.15 fread.c
> --- lib/libc/stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
> +++ lib/libc/stdio/fread.c 13 Dec 2018 17:24:34 -0000
> @@ -69,18 +69,39 @@ fread(void *buf, size_t size, size_t cou
>   total = resid;
>   p = buf;
>  
> + /*
> + * If we're unbuffered we know that the buffer in fp is empty so
> + * we can read directly into buf.  This is much faster than a
> + * series of one byte reads into fp->_nbuf.
> + */
>   if ((fp->_flags & __SNBF) != 0) {
> - /*
> - * We know if we're unbuffered that our buffer is empty, so
> - * we can just read directly. This is much faster than the
> - * loop below which will perform a series of one byte reads.
> - */
> - while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
> - p += r;
> - resid -= r;
> + FILE fake;
> + struct __sfileext fakeext;
> +
> + _FILEEXT_SETUP(&fake, &fakeext);
> + /* copy the important variables */
> + fake._flags = fp->_flags;
> + fake._file = fp->_file;
> + fake._cookie = fp->_cookie;
> + fake._read = fp->_read;
> +
> + /* set up the buffer */
> + fake._bf._base = buf;
> + fake._bf._size = total;
> + fake._lbfsize = 0; /* not line buffered */
> +
> + while (resid > 0) {
> + if (__srefill(&fake)) {
> + /* no more input: return partial result */
> + count = (total - resid) / size;
> + fp->_flags |= (fake._flags & (__SERR|__SEOF));
> + break;
> + }
> + fake._p += fake._r;
> + resid -= fake._r;
>   }
>   FUNLOCKFILE(fp);
> - return ((total - resid) / size);
> + return (count);
>   }
>  
>   while (resid > (r = fp->_r)) {
>

--
Juan Francisco Cantero Hurtado http://juanfra.info

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Sebastien Marie-3
In reply to this post by Todd C. Miller-2
On Thu, Dec 13, 2018 at 11:01:10AM -0700, Todd C. Miller wrote:
> Here's a diff that fakes up a FILE like we do in other parts of
> stdio for unbuffered I/O.  This lets us call __srefill() normally.
> If __srefill() returns an error, copy the flags back to the real
> FILE *.
>
>  - todd

I like a lot the way it is done. Using a copy of FILE argument with
_bf._size changed to request read size is great.

and as juanfra@ said, it solves mercurial problem (I tested it too).

some comments inlined.

> Index: lib/libc/stdio/fread.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> retrieving revision 1.15
> diff -u -p -u -r1.15 fread.c
> --- lib/libc/stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
> +++ lib/libc/stdio/fread.c 13 Dec 2018 17:24:34 -0000
> @@ -69,18 +69,39 @@ fread(void *buf, size_t size, size_t cou
>   total = resid;
>   p = buf;
>  
> + /*
> + * If we're unbuffered we know that the buffer in fp is empty so
> + * we can read directly into buf.  This is much faster than a
> + * series of one byte reads into fp->_nbuf.
> + */
>   if ((fp->_flags & __SNBF) != 0) {
> - /*
> - * We know if we're unbuffered that our buffer is empty, so
> - * we can just read directly. This is much faster than the
> - * loop below which will perform a series of one byte reads.
> - */
> - while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
> - p += r;
> - resid -= r;
> + FILE fake;
> + struct __sfileext fakeext;
> +
> + _FILEEXT_SETUP(&fake, &fakeext);
> + /* copy the important variables */
> + fake._flags = fp->_flags;
> + fake._file = fp->_file;
> + fake._cookie = fp->_cookie;
> + fake._read = fp->_read;

in some cases (but it might be an ill argument passed to fread(): see
"if not already reading, have to be reading and writing"), __srefill()
could call __sflush(), and it will call _write handler (which is
uninitialized data). should we copying others handlers too ?

> +
> + /* set up the buffer */
> + fake._bf._base = buf;

does a test for ill arguments (like buf == NULL) should be done ?
else __srefill() will call __smakebuf(), and it will set:

        fp->_bf._base = fp->_p = fp->_nbuf;

using _nbuf which is uninitialized data.

> + fake._bf._size = total;
> + fake._lbfsize = 0; /* not line buffered */
> +

Globally, I think `fake' should be properly initialized by doing a
memcpy() with `fp' data. And next, override values to copte with buffer
size. This way, it will take care of all ill cases, and no uninitialized
data will be used.

> + while (resid > 0) {
> + if (__srefill(&fake)) {
> + /* no more input: return partial result */
> + count = (total - resid) / size;
> + fp->_flags |= (fake._flags & (__SERR|__SEOF));

Initially, I wonder why you copied back only __SERR|__SEOF to _flags
(__srefill could change few others flags), but as _flags is sole
parameter to be copied back it makes sens.

> + break;
> + }
> + fake._p += fake._r;
> + resid -= fake._r;
>   }
>   FUNLOCKFILE(fp);
> - return ((total - resid) / size);
> + return (count);
>   }
>  
>   while (resid > (r = fp->_r)) {

Thanks.
--
Sebastien Marie

Reply | Threaded
Open this post in threaded view
|

Re: fread() on unbuffered FILE doesn't set feof flag

Todd C. Miller-2
On Fri, 14 Dec 2018 08:45:31 +0100, Sebastien Marie wrote:

> I like a lot the way it is done. Using a copy of FILE argument with
> _bf._size changed to request read size is great.
>
> and as juanfra@ said, it solves mercurial problem (I tested it too).
>
> some comments inlined.
>
> > Index: lib/libc/stdio/fread.c
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/stdio/fread.c,v
> > retrieving revision 1.15
> > diff -u -p -u -r1.15 fread.c
> > --- lib/libc/stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
> > +++ lib/libc/stdio/fread.c 13 Dec 2018 17:24:34 -0000
> > @@ -69,18 +69,39 @@ fread(void *buf, size_t size, size_t cou
> >   total = resid;
> >   p = buf;
> >  
> > + /*
> > + * If we're unbuffered we know that the buffer in fp is empty so
> > + * we can read directly into buf.  This is much faster than a
> > + * series of one byte reads into fp->_nbuf.
> > + */
> >   if ((fp->_flags & __SNBF) != 0) {
> > - /*
> > - * We know if we're unbuffered that our buffer is empty, so
> > - * we can just read directly. This is much faster than the
> > - * loop below which will perform a series of one byte reads.
> > - */
> > - while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) >
>  0) {
> > - p += r;
> > - resid -= r;
> > + FILE fake;
> > + struct __sfileext fakeext;
> > +
> > + _FILEEXT_SETUP(&fake, &fakeext);
> > + /* copy the important variables */
> > + fake._flags = fp->_flags;
> > + fake._file = fp->_file;
> > + fake._cookie = fp->_cookie;
> > + fake._read = fp->_read;
>
> in some cases (but it might be an ill argument passed to fread(): see
> "if not already reading, have to be reading and writing"), __srefill()
> could call __sflush(), and it will call _write handler (which is
> uninitialized data). should we copying others handlers too ?

Perhaps.  For that code path to be safe we'd also need to initialize
fp->_p to the same value as fp->_bf._base.  It's not really possible
to flush any data since we've replaced the buffer but since we are
unbuffered anyway there is nothing to flush.

> > +
> > + /* set up the buffer */
> > + fake._bf._base = buf;
>
> does a test for ill arguments (like buf == NULL) should be done ?
> else __srefill() will call __smakebuf(), and it will set:
>
> fp->_bf._base = fp->_p = fp->_nbuf;
>
> using _nbuf which is uninitialized data.

If buf is NULL then we should just skip this optimization and let
fread(3) deference NULL in the memcpy() call.

> > + fake._bf._size = total;
> > + fake._lbfsize = 0; /* not line buffered */
> > +
>
> Globally, I think `fake' should be properly initialized by doing a
> memcpy() with `fp' data. And next, override values to copte with buffer
> size. This way, it will take care of all ill cases, and no uninitialized
> data will be used.

I was initially going to do that but decided that since vfprintf.c
just initialized things manually to follow suit.  Another option
is to just use fp directly and reset fp->bf and fp->p when we are
done.

> > + while (resid > 0) {
> > + if (__srefill(&fake)) {
> > + /* no more input: return partial result */
> > + count = (total - resid) / size;
> > + fp->_flags |= (fake._flags & (__SERR|__SEOF));
>
> Initially, I wonder why you copied back only __SERR|__SEOF to _flags
> (__srefill could change few others flags), but as _flags is sole
> parameter to be copied back it makes sens.
>
> > + break;
> > + }
> > + fake._p += fake._r;
> > + resid -= fake._r;
> >   }
> >   FUNLOCKFILE(fp);
> > - return ((total - resid) / size);
> > + return (count);
> >   }
> >  
> >   while (resid > (r = fp->_r)) {

After thinking about this a bit more I believe it best to just use
the existing FILE * and swap in the buffer temporarily.  There's
no need to store the value of the old buffer; since we are unbuffered
it can only point to _nbuf.

This is simpler and should address all of your concerns.

 - todd

Index: lib/libc/stdio/fread.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fread.c,v
retrieving revision 1.15
diff -u -p -u -r1.15 fread.c
--- lib/libc/stdio/fread.c 21 Sep 2016 04:38:56 -0000 1.15
+++ lib/libc/stdio/fread.c 14 Dec 2018 17:48:46 -0000
@@ -69,18 +69,33 @@ fread(void *buf, size_t size, size_t cou
  total = resid;
  p = buf;
 
- if ((fp->_flags & __SNBF) != 0) {
- /*
- * We know if we're unbuffered that our buffer is empty, so
- * we can just read directly. This is much faster than the
- * loop below which will perform a series of one byte reads.
- */
- while (resid > 0 && (r = (*fp->_read)(fp->_cookie, p, resid)) > 0) {
- p += r;
- resid -= r;
+ /*
+ * If we're unbuffered we know that the buffer in fp is empty so
+ * we can read directly into buf.  This is much faster than a
+ * series of one byte reads into fp->_nbuf.
+ */
+ if ((fp->_flags & __SNBF) != 0 && buf != NULL) {
+ /* set up the buffer */
+ fp->_bf._base = fp->_p = buf;
+ fp->_bf._size = total;
+
+ while (resid > 0) {
+ if (__srefill(fp)) {
+ /* no more input: return partial result */
+ count = (total - resid) / size;
+ break;
+ }
+ fp->_p += fp->_r;
+ resid -= fp->_r;
  }
+
+ /* restore the old buffer (see __smakebuf) */
+ fp->_bf._base = fp->_p = fp->_nbuf;
+ fp->_bf._size = 1;
+ fp->_r = 0;
+
  FUNLOCKFILE(fp);
- return ((total - resid) / size);
+ return (count);
  }
 
  while (resid > (r = fp->_r)) {

12