High performance IO (sendfile(), caching, and libev(ent))

Jean-Philippe Ouellet
Hello,

I'm trying to learn about writing high performance servers, and I have a
few questions not clearly answered by any documentation I can find. I'm
comfortable with select(), poll(), and kqueue(), but that only goes so
far. I'm currently looking into how to send static files (over a
network) with the least amount of overhead.

There was a post [1] on misc@ asking about the status of a sendfile()
call, but nobody replied (and it seems that splice(2) and tee(2) are
just GNUisms). It appears that there's been some work on socket splicing
(see sosplice() in [2]), but there's still no sendfile (or if it's
there, I must not be looking in the right place [3]).

If I want to serve a bunch of files often, is it fine to rely on the
kernel's filesystem caching? or should I mmap() them into my address
space and madvise() them to not be swapped out? Is it reasonable to
stat() the file each time it is served (from my cached copy) to compare
the file's modification time to the time it was cached? Would this
actually hit the disk each time? or does the kernel keep that cached?

It seems obvious to me that it should be cached, but I can't actually
find the relevant code. I spent a while digging through the kernel, but
I don't really know where to look, and I'm not sure I'd recognize what
I'm looking for if I found it anyway. The closest thing I found to
something I think might be relevant was some cryptic vfs stuff. :( I'm
no kernel dev, I don't pretend to understand OpenBSD internals nearly as
well as I'd like to.

Lastly, what's the OpenBSD community's current opinion on libevent /
libev? Are they secure / stable enough that they should be considered
for new code in base? Are they worth using instead of just using
select/poll/kqueue/event(3) directly?

[1] http://marc.info/?l=openbsd-misc&m=112690025715479&w=2
[2]
http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/kern/uipc_socket.c
[3] http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/kern/syscalls.c

Many thanks for any and all advice,
Jean-Philippe Ouellet

Re: High performance IO (sendfile(), caching, and libev(ent))

Jean-Philippe Ouellet
On 12/20/12 3:53 AM, Jean-Philippe Ouellet wrote:
> and madvise() them to not be swapped out?

Oops, I think I might have misinterpreted the meaning of MADV_WILLNEED.
I think I meant mlock().

Re: High performance IO (sendfile(), caching, and libev(ent))

Otto Moerbeek
On Thu, Dec 20, 2012 at 04:06:52AM -0500, Jean-Philippe Ouellet wrote:

> On 12/20/12 3:53 AM, Jean-Philippe Ouellet wrote:
> > and madvise() them to not be swapped out?
>
> Oops, I think I might have misinterpreted the meaning of MADV_WILLNEED.
> I think I meant mlock().

Why try to be smarter than the kernel? Mlocking pages will kill you
if there's a memory shortage.

The kernel will try to keep frequently used pages in memory anyway.

        -Otto

Re: High performance IO (sendfile(), caching, and libev(ent))

Otto Moerbeek
In reply to this post by Jean-Philippe Ouellet
On Thu, Dec 20, 2012 at 03:53:44AM -0500, Jean-Philippe Ouellet wrote:

> Hello,
>
> I'm trying to learn about writing high performance servers, and I have a
> few questions not clearly answered by any documentation I can find. I'm
> comfortable with select(), poll(), and kqueue(), but that only goes so
> far. I'm currently looking into how to send static files (over a
> network) with the least amount of overhead.
>
> There was a post [1] on misc@ asking about the status of a sendfile()
> call, but nobody replied (and it seems that splice(2) and tee(2) are
> just GNUisms). It appears that there's been some work on socket splicing
> (see sosplice() in [2]), but there's still no sendfile (or if it's
> there, I must not be looking in the right place [3]).

I do not know of any effort to introduce sendfile(2).

>
> If I want to serve a bunch of files often, is it fine to rely on the
> kernel's filesystem caching? or should I mmap() them into my address
> space and madvise() them to not be swapped out? Is it reasonable to
> stat() the file each time it is served (from my cached copy) to compare
> the file's modification time to the time it was cached? Would this
> actually hit the disk each time? or does the kernel keep that cached?

Filesystem caching should be fine, a lot of effort went into this
lately.  File metadata is cached by the kernel vfs layer.

        -Otto

>
> It seems obvious to me that it should be cached, but I can't actually
> find the relevant code. I spent a while digging through the kernel, but
> I don't really know where to look, and I'm not sure I'd recognize what
> I'm looking for if I found it anyway. The closest thing I found to
> something I think might be relevant was some cryptic vfs stuff. :( I'm
> no kernel dev, I don't pretend to understand OpenBSD internals nearly as
> well as I'd like to.
>
> Lastly, What's the OpenBSD community's current opinion on libevent /
> libev. Are they secure / stable enough that they should be considered
> for new code in base? Are they worth using instead of just using
> select/poll/kqueue/event(3) directly?
>
> [1] http://marc.info/?l=openbsd-misc&m=112690025715479&w=2
> [2]
> http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/kern/uipc_socket.c
> [3] http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/kern/syscalls.c
>
> Many thanks for any and all advice,
> Jean-Philippe Ouellet

Re: High performance IO (sendfile(), caching, and libev(ent))

Jean-Philippe Ouellet
In reply to this post by Otto Moerbeek
On 12/20/12 4:20 AM, Otto Moerbeek wrote:

> On Thu, Dec 20, 2012 at 04:06:52AM -0500, Jean-Philippe Ouellet wrote:
>
>> On 12/20/12 3:53 AM, Jean-Philippe Ouellet wrote:
>>> and madvise() them to not be swapped out?
>>
>> Oops, I think I might have misinterpreted the meaning of MADV_WILLNEED.
>> I think I meant mlock().
>
> Why trying to be smarter than the kernel? Mlocking pages will kill you
> if there's memory shortage.
>
> The kernel will try to keep much used pages in mem anyway.
>
> -Otto
>

Okay, yeah. That's a terrible idea. But the question of direct
file-to-socket sending vs. keeping copies in my address space and
write()ing those to the socket still remains.

Normally I would just write both and profile them, but I can't figure
out how to do the first on OpenBSD.

Re: High performance IO (sendfile(), caching, and libev(ent))

Andres Perera-4
In reply to this post by Jean-Philippe Ouellet
On Thu, Dec 20, 2012 at 4:23 AM, Jean-Philippe Ouellet
<[hidden email]> wrote:
> Hello,
>
> I'm trying to learn about writing high performance servers, and I have a
> few questions not clearly answered by any documentation I can find. I'm
> comfortable with select(), poll(), and kqueue(), but that only goes so
> far. I'm currently looking into how to send static files (over a
> network) with the least amount of overhead.
>

rest assured this is not the right avenue

a few moons ago there was a discussion involving a converse client issue

certain people did not understand why firefox opens so many sockets

since select() and cousins are used in situations where concurrent
socket io is desirable (if only because of inadequacies of http), you
want to go to a place where they write high throughput servers. you
won't find that in openbsd outside of non-tcp servers and contribs
like nginx

Re: High performance IO (sendfile(), caching, and libev(ent))

Tobias Ulmer
In reply to this post by Jean-Philippe Ouellet
On Thu, Dec 20, 2012 at 04:26:48AM -0500, Jean-Philippe Ouellet wrote:

> On 12/20/12 4:20 AM, Otto Moerbeek wrote:
> > On Thu, Dec 20, 2012 at 04:06:52AM -0500, Jean-Philippe Ouellet wrote:
> >
> >> On 12/20/12 3:53 AM, Jean-Philippe Ouellet wrote:
> >>> and madvise() them to not be swapped out?
> >>
> >> Oops, I think I might have misinterpreted the meaning of MADV_WILLNEED.
> >> I think I meant mlock().
> >
> > Why trying to be smarter than the kernel? Mlocking pages will kill you
> > if there's memory shortage.
> >
> > The kernel will try to keep much used pages in mem anyway.
> >
> > -Otto
> >
>
> Okay, yeah. That's a terrible idea. But still, the question of direct
> file-to-socket sending vs. keeping copies in my address space and
> write()ing those to the socket still remains.

The file will be in the buffer cache. While it still takes a few
in-memory copies (which is what sendfile saves you), this should be fast
enough for most cases.

If you keep the data in your address space, you save one
memory-to-memory copy, but you give up all the advantages the buffer
cache has over you (namely knowing how much free memory really is
available at runtime, not forcing buffers into swap, and more). You will
probably end up shooting yourself in the foot for a speed gain that
probably can't be realized, because the network is the real bottleneck.

Taking memory away from the kernel to duplicate functionality in
user-space is almost never a good idea.
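
For reference, the approach that leans on the buffer cache instead of a
private copy can be sketched as a plain read()/write() loop. This is an
illustration only, assuming blocking descriptors; copy_fd() and
write_all() are hypothetical helper names and the 64 KB buffer size is
an arbitrary choice, not a tuned value.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
int
write_all(int fd, const char *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t n = write(fd, buf + off, len - off);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += (size_t)n;
	}
	return 0;
}

/*
 * Copy everything from in_fd (e.g. an open file) to out_fd (e.g. a
 * connected socket). Returns the number of bytes copied, or -1 on error.
 * Each chunk goes kernel -> user buffer -> kernel; those are the copies
 * a sendfile-style call would avoid.
 */
off_t
copy_fd(int in_fd, int out_fd)
{
	char buf[65536];
	off_t total = 0;

	assert(in_fd >= 0 && out_fd >= 0);
	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			return total;		/* end of file */
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (write_all(out_fd, buf, (size_t)n) == -1)
			return -1;
		total += n;
	}
}
```

With non-blocking sockets the loop would instead have to stop on EAGAIN
and resume when the event loop reports the socket writable.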

>
> Normally I would just write both and profile them, but I can't figure
> out how to do the first on OpenBSD.

Re: High performance IO (sendfile(), caching, and libev(ent))

Andres Perera-4
On Thu, Dec 20, 2012 at 6:06 AM, Tobias Ulmer <[hidden email]> wrote:
>
> The file will be in the buffer cache. While it still takes a few
> in-memory copies (which is what sendfile saves you), this should be fast
> enough for most cases.
>
> If you keep the data in your address space, you save one m-to-m copy,
> but ignore all the benefits that the bc has compared to you (namely
> knowing how much free memory really is available at runtime, not forcing
> buffers into swap and more).

this is called optimizing for the worst case of resource starvation
over disk usage. clearly a question of priorities

1. the kernel file buffer cache having knowledge about free mem alone
is irrelevant because sendfile() has the capability of returning
ENOMEM

2. if you are hitting swap, buy more memory or stop sending so many/as big files

> You will probably end up shooting yourself
> in the leg for a speed gain that probably can't be realized because the
> network is the real bottleneck
>
> Taking memory away from the kernel to duplicate functionality in
> user-space is almost never a good idea.
>
>>
>> Normally I would just write both and profile them, but I can't figure
>> out how to do the first on OpenBSD.

Re: High performance IO (sendfile(), caching, and libev(ent))

William Ahern-2
In reply to this post by Jean-Philippe Ouellet
On Thu, Dec 20, 2012 at 03:53:44AM -0500, Jean-Philippe Ouellet wrote:
> Hello,
>
> I'm trying to learn about writing high performance servers, and I have a
> few questions not clearly answered by any documentation I can find. I'm
> comfortable with select(), poll(), and kqueue(), but that only goes so
> far. I'm currently looking into how to send static files (over a
> network) with the least amount of overhead.

If you want the least amount of overhead conceivably possible, take a look
at this project:

        http://info.iet.unipi.it/~luigi/netmap/

It's FreeBSD and Linux only, however.

> There was a post [1] on misc@ asking about the status of a sendfile()
> call, but nobody replied (and it seems that splice(2) and tee(2) are
> just GNUisms). It appears that there's been some work on socket splicing
> (see sosplice() in [2]), but there's still no sendfile (or if it's
> there, I must not be looking in the right place [3]).

AFAIK only FreeBSD, Solaris, Linux, and OS X implement sendfile, albeit
slightly differently in some cases. There is no sendfile on OpenBSD, or
NetBSD for that matter.

On Linux, few if any programs use splice, because it requires an
intermediate pipe. You can't move directly between sockets, or between a
socket and file, which is an enormous PITA.

Also note that sendfile--all implementations--cannot do non-blocking I/O on
the file side. If your files won't fit in memory, you need to utilize
multiple threads or processes if you want low latency and high concurrency.

> If I want to serve a bunch of files often, is it fine to rely on the
> kernel's filesystem caching? or should I mmap() them into my address
> space and madvise() them to not be swapped out? Is it reasonable to
> stat() the file each time it is served (from my cached copy) to compare
> the file's modification time to the time it was cached? Would this
> actually hit the disk each time? or does the kernel keep that cached?
>
> It seems obvious to me that it should be cached, but I can't actually
> find the relevant code. I spent a while digging through the kernel, but
> I don't really know where to look, and I'm not sure I'd recognize what
> I'm looking for if I found it anyway. The closest thing I found to
> something I think might be relevant was some cryptic vfs stuff. :( I'm
> no kernel dev, I don't pretend to understand OpenBSD internals nearly as
> well as I'd like to.

If you want to know which methodology is preferable, test, test, test.
Reading the implementation and forming conjectures isn't going to get you
very far. And results will vary, so it's best to design a small module or
library to abstract the details and implement your preferable interface
given the constraints. Here's where people might suggest just using libevent
2.x or Boost or whatever. The problem there is that the first, second, and
even third iterations of such implementations usually suck. And if you're
trying to be an external library you're stuck with the leaky interface,
which limits how much you can hack on the implementation.

> Lastly, What's the OpenBSD community's current opinion on libevent /
> libev. Are they secure / stable enough that they should be considered
> for new code in base? Are they worth using instead of just using
> select/poll/kqueue/event(3) directly?
>

event(3) is the original libevent, basically libevent 1.4.x. If you're
trying to write something portable, or if you would benefit from the timer
functionality, then go ahead and use it. Otherwise, it's superfluous. FWIW,
you can write a tiny wrapper around kqueue/epoll/ports(Solaris) in a small
amount of code. It's really the timer functionality that takes effort, and
both libevent and libev have good timer functionality, if a little
complicated after accumulating years of patches.
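
To give a rough idea of how small such a wrapper can be, here is a
sketch covering read-readiness only. The evq_* names and the EVQ_MAX
limit are made up for illustration. It assumes that any non-Linux build
target provides kqueue(2), leaves out Solaris ports, and has no timer
support, which is the part that takes real effort.

```c
#include <assert.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/epoll.h>
#else	/* assumption: a BSD-style system with kqueue(2) */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

#define EVQ_MAX 64	/* most events returned by one evq_wait() call */

/* Create an event queue descriptor. Returns -1 on error. */
int
evq_open(void)
{
#if defined(__linux__)
	return epoll_create1(0);
#else
	return kqueue();
#endif
}

/* Ask to be notified when fd is readable. Returns 0, or -1 on error. */
int
evq_add_read(int q, int fd)
{
#if defined(__linux__)
	struct epoll_event ev = { 0 };

	ev.events = EPOLLIN;
	ev.data.fd = fd;
	return epoll_ctl(q, EPOLL_CTL_ADD, fd, &ev);
#else
	struct kevent kev;

	EV_SET(&kev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
	return kevent(q, &kev, 1, NULL, 0, NULL);
#endif
}

/*
 * Wait up to timeout_ms milliseconds (-1 waits forever) and store the
 * readable descriptors in ready[]. Returns how many were stored, 0 on
 * timeout, or -1 on error.
 */
int
evq_wait(int q, int *ready, int max, int timeout_ms)
{
	int i, n;

	assert(ready != NULL && max > 0);
	if (max > EVQ_MAX)
		max = EVQ_MAX;
#if defined(__linux__)
	struct epoll_event evs[EVQ_MAX];

	n = epoll_wait(q, evs, max, timeout_ms);
	for (i = 0; i < n; i++)
		ready[i] = evs[i].data.fd;
#else
	struct kevent evs[EVQ_MAX];
	struct timespec ts, *tsp = NULL;

	if (timeout_ms >= 0) {
		ts.tv_sec = timeout_ms / 1000;
		ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
		tsp = &ts;
	}
	n = kevent(q, NULL, 0, evs, max, tsp);
	for (i = 0; i < n; i++)
		ready[i] = (int)evs[i].ident;
#endif
	return n;
}
```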

One piece of advice: avoid callback interfaces. They're of course necessary
with an event loop, but when other modules--buffered I/O, DNS, HTTP,
etc--also utilize callbacks, things become horrendously complicated and
difficult to debug. They're the modern incarnation of goto, spaghetti code
hell. The alternatives, ironically, often rely on using goto in small state
machines ;)
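
One possible reading of "goto in small state machines", sketched under
assumptions: a reply is sent in two parts by a step function that
returns whenever the socket would block, and the event loop calls it
again later. No callback is registered by this module. The struct
reply, reply_step(), and push() names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum { S_HEADER, S_BODY, S_DONE };

struct reply {
	int state;		/* which part is being sent */
	const char *hdr;  size_t hdrlen;
	const char *body; size_t bodylen;
	size_t off;		/* progress within the current part */
};

/* Write p[*off..len); returns 0 when finished, else the errno value. */
static int
push(int fd, const char *p, size_t len, size_t *off)
{
	while (*off < len) {
		ssize_t n = write(fd, p + *off, len - *off);
		if (n == -1)
			return errno;	/* e.g. EAGAIN: try again later */
		*off += (size_t)n;
	}
	return 0;
}

/*
 * Advance the reply as far as possible. Returns 0 once everything has
 * been sent, or an errno value; on EAGAIN or EINTR the caller simply
 * calls again when fd is writable, and the switch jumps back to the
 * part that was interrupted.
 */
int
reply_step(struct reply *r, int fd)
{
	int error;

	assert(r != NULL);
	switch (r->state) {
	case S_HEADER:	goto header;
	case S_BODY:	goto body;
	default:	return 0;
	}
header:
	if ((error = push(fd, r->hdr, r->hdrlen, &r->off)) != 0)
		return error;
	r->off = 0;
	r->state = S_BODY;
body:
	if ((error = push(fd, r->body, r->bodylen, &r->off)) != 0)
		return error;
	r->state = S_DONE;
	return 0;
}
```

The control flow reads top to bottom like blocking code, and all state
lives in one struct that can be inspected in a debugger.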