getdelim and O_NONBLOCK

getdelim and O_NONBLOCK

Martijn van Duren-8
Running getdelim on a nonblocking socket results in data loss of the
first part of the message if the said message is sent in chunks.
Code below shows how to repeat.

POSIX states the following:
For the conditions under which the getdelim() and getline() functions
shall fail and may fail, refer to fgetc.

And for fgetc:
[EAGAIN]
    The O_NONBLOCK flag is set for the file descriptor underlying stream
    and the thread would be delayed in the fgetc() operation

I read this to mean that the first call should retain the data in the
buffer and the second call should return the entire sentence.
I also ran the code on Alpine Linux (musl libc) and Linux Mint (glibc).
Musl behaves just like us, while glibc returns the first part of the
sentence without the delimiter (returning a positive value, indicating
there's no error so far). That violates the spec, which states that
the returned data contains the newline.
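
To make the expected behaviour concrete, here is a minimal caller-side
sketch that does its own buffering on top of read(2) instead of relying
on getline(3). The names struct linebuf, lb_init() and lb_getline() are
hypothetical and not part of any libc; the only point is that a partial
line survives an EAGAIN and is completed by a later call.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of any libc: per-descriptor line state. */
struct linebuf {
        int      fd;
        char    *buf;   /* bytes read so far but not yet returned */
        size_t   len;   /* number of valid bytes in buf */
        size_t   cap;   /* allocated size of buf */
};

/* Zero the state and switch fd to non-blocking mode. */
int
lb_init(struct linebuf *lb, int fd)
{
        int flags;

        memset(lb, 0, sizeof(*lb));
        lb->fd = fd;
        if ((flags = fcntl(fd, F_GETFL)) == -1)
                return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Store the next line (delimiter included, NUL-terminated) in a freshly
 * allocated *out and return its length.  Return -1 with errno set on
 * error; on EAGAIN the partial line stays in lb for the next call.
 * On EOF return -1 with errno == 0 (a fuller version would hand back
 * any unterminated tail first).
 */
ssize_t
lb_getline(struct linebuf *lb, char **out)
{
        char *nl, *tmp;
        size_t linelen, ncap;
        ssize_t nr;

        for (;;) {
                assert(lb->len <= lb->cap);
                /* A complete line may already be buffered. */
                nl = lb->len ? memchr(lb->buf, '\n', lb->len) : NULL;
                if (nl != NULL) {
                        linelen = (size_t)(nl - lb->buf) + 1;
                        if ((*out = malloc(linelen + 1)) == NULL)
                                return -1;
                        memcpy(*out, lb->buf, linelen);
                        (*out)[linelen] = '\0';
                        /* Keep whatever follows the delimiter. */
                        lb->len -= linelen;
                        memmove(lb->buf, lb->buf + linelen, lb->len);
                        return (ssize_t)linelen;
                }
                /* Grow the buffer when it is full. */
                if (lb->len == lb->cap) {
                        ncap = lb->cap ? lb->cap * 2 : 64;
                        if ((tmp = realloc(lb->buf, ncap)) == NULL)
                                return -1;
                        lb->buf = tmp;
                        lb->cap = ncap;
                }
                nr = read(lb->fd, lb->buf + lb->len, lb->cap - lb->len);
                if (nr == -1)
                        return -1;      /* e.g. EAGAIN: nothing is discarded */
                if (nr == 0) {
                        errno = 0;      /* EOF */
                        return -1;
                }
                lb->len += (size_t)nr;
        }
}
```

With the two writes from the test program below, the first lb_getline()
call would fail with EAGAIN after buffering "don't ", and a call made
after "fail\n" arrives would return the whole line.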

I also looked at the getdelim.c code, but I don't have the knowledge/
time to send a diff at this time. But maybe someone has some useful
input on this.

martijn@

#include <sys/select.h>
#include <sys/socket.h>

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void parent(int);
void child(int);

int
main(int argc, char *argv[])
{
        int sp[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) == -1)
                err(1, "socketpair");

        switch (fork()) {
        case -1:
                err(1, "fork");

        case 0:
                close(sp[1]);
                child(sp[0]);
                break;
        default:
                close(sp[0]);
                parent(sp[1]);
                break;
        }
        return 0;
}

void
parent(int fd)
{
        char msg1[] = "don't ";
        char msg2[] = "fail\n";
        if (write(fd, msg1, sizeof(msg1) - 1) == -1)
                err(1, "write");
        sleep(1);
        if (write(fd, msg2, sizeof(msg2) - 1) == -1)
                err(1, "write");
}

void
child(int fd)
{
        FILE *f;
        char *line = NULL;
        size_t n = 0;
        ssize_t len;

        if ((f = fdopen(fd, "r+")) == NULL)
                err(1, "fdopen");
        if (fcntl(fd, F_SETFL, O_NONBLOCK) == -1)
                err(1, "fcntl");

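        /* Retry while the read would block; clear errno first so a
         * stale EAGAIN is not mistaken for a new one. */
        do {
                errno = 0;
                len = getline(&line, &n, f);
        } while (len == -1 && errno == EAGAIN);

        /* getline() returns -1, never 0, on error or EOF. */
        if (len == -1) {
                if (ferror(f))
                        err(1, "getline");
                errx(1, "getline: unexpected EOF");
        }
        printf("%s\n", line);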
}


Re: getdelim and O_NONBLOCK

Scott Cheloha
On Fri, Mar 29, 2019 at 08:19:12PM +0100, Martijn van Duren wrote:

> Running getdelim on a nonblocking socket results in data loss of the
> first part of the message if the said message is sent in chunks.
> Code below shows how to repeat.
>
> POSIX states the following:
> For the conditions under which the getdelim() and getline() functions
> shall fail and may fail, refer to fgetc.
>
> And for fgetc:
> [EAGAIN]
>     The O_NONBLOCK flag is set for the file descriptor underlying stream
>     and the thread would be delayed in the fgetc() operation
>
> I read this to mean that the first call should retain the data in the
> buffer and the second call should return the entire sentence.
> I also ran the code on Alpine Linux (musl libc) and Linux Mint (glibc).
> Musl behaves just like us, while glibc returns the first part of the
> sentence without the delimiter (returning a positive value, indicating
> there's no error so far). That violates the spec, which states that
> the returned data contains the newline.
>
> I also looked at the getdelim.c code, but I don't have the knowledge/
> time to send a diff at this time. But maybe someone has some useful
> input on this.

Hmmm, interesting corner case.  Clobbering the data without ever
telling the caller about it is bad.

One solution is something like: check in the error case if we read
anything, and if so then "put it all back" a la ungetc(3).

I don't think we can just store what we read in the buffer across
calls.  The only state we're given when we call getdelim(3) is the
size of the buffer, which is at least buflen.  But the spec says
nothing about the caller using the same buffer between calls: even
if that's idiomatic we can't rely on the application to do that for us.

I'm beat, so this isn't happening immediately, but I think if we
refactored __srefill() and added something like __sappend() to append
new data to the FILE's buffer (growing it if necessary) and then
changed the getdelim(3) logic to __sappend() if fp->_r > 0 and the
delimiter is not in the buffer, we'd... have done what I just
described.  But I need to look more closely at stdio to figure out
how... and to make sure I'm not talking nonsense.

We sort of do something similar in fgetln(3) already, but we use an
auxiliary buffer.
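
As a rough illustration of the refill-versus-append distinction
discussed above, here is a toy model in plain C. It is not OpenBSD's
stdio code: struct toybuf, toy_refill() and toy_append() are made-up
names, and the off and r fields only loosely mimic _p and _r. The
refill variant reads fresh data over the old contents, while the append
variant keeps the unread bytes and reads after them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Toy stand-in for a stream's read buffer; all names are illustrative. */
struct toybuf {
        char    *base;  /* start of storage */
        size_t   size;  /* allocated bytes */
        size_t   off;   /* offset of the first unread byte */
        size_t   r;     /* number of unread bytes */
};

/* Make sure there is at least one free byte after the unread data. */
static int
toy_reserve(struct toybuf *b)
{
        size_t nsize;
        char *tmp;

        assert(b->off + b->r <= b->size);
        if (b->off + b->r < b->size)
                return 0;
        nsize = b->size ? b->size * 2 : 64;
        if ((tmp = realloc(b->base, nsize)) == NULL)
                return -1;
        b->base = tmp;
        b->size = nsize;
        return 0;
}

/* Refill: forget the old contents and read new data over them. */
ssize_t
toy_refill(struct toybuf *b, int fd)
{
        ssize_t nr;

        b->off = 0;
        b->r = 0;
        if (toy_reserve(b) == -1)
                return -1;
        nr = read(fd, b->base, b->size);
        if (nr > 0)
                b->r = (size_t)nr;
        return nr;
}

/* Append: keep the unread bytes and read new data right after them. */
ssize_t
toy_append(struct toybuf *b, int fd)
{
        ssize_t nr;

        /* Slide unread bytes to the front so the tail is free. */
        if (b->off > 0) {
                memmove(b->base, b->base + b->off, b->r);
                b->off = 0;
        }
        if (toy_reserve(b) == -1)
                return -1;
        nr = read(fd, b->base + b->r, b->size - b->r);
        if (nr > 0)
                b->r += (size_t)nr;
        return nr;
}
```

In this model a failed read leaves off and r untouched in toy_append(),
so the bytes already buffered remain available to a later call.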

Maybe of note is that our fgetc(3) docs claim C89 behavior, not all
the additional stuff POSIX.1-2008 specifies.  Which doesn't make
our getdelim(3) non-conforming per se, but it makes it difficult
for the application developer to even know that this case is
possible.