NSD sendto issue

Joerg Jung
Hi,

I run a few busy (~800 req/s) NSD servers which I upgraded
to 6.5, all stock/default OpenBSD, i.e. I’ve not tweaked any
sysctl values and nsd.conf matches the default as well; I just
added a few hundred zones.

Now, when I increase the server count from the default of 1 to 2 in nsd.conf:
        server-count: 2
it starts spamming my log with:
        nsd[62723]: sendto 1.2.3.4 failed: Resource temporarily unavailable

Checking the source, server.c does not seem to handle EAGAIN
after sendto(); it does not recover or retry, it just increments
the txerr statistics counter, so the answer really seems to be lost :(
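For illustration only, here is a minimal C sketch (not NSD's actual server.c code) of a send path that treats EAGAIN/EWOULDBLOCK as transient: it waits briefly with poll() for the socket to become writable and retries once before giving up. The function `send_udp_answer`, its timeout parameter, and the `txerr` counter are hypothetical stand-ins.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical counter standing in for NSD's txerr statistic. */
static unsigned long txerr = 0;

/*
 * Send one datagram. If sendto() fails with EAGAIN/EWOULDBLOCK, wait up to
 * timeout_ms for POLLOUT and retry once. Any other error, a poll timeout,
 * or a second failure counts as a transmit error and the answer is dropped.
 * Returns 0 on success, -1 if the answer was dropped.
 */
static int
send_udp_answer(int fd, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen, int timeout_ms)
{
	for (int attempt = 0; ; attempt++) {
		if (sendto(fd, buf, len, 0, to, tolen) >= 0)
			return 0;
		/* Hard error, or already retried once: give up. */
		if ((errno != EAGAIN && errno != EWOULDBLOCK) || attempt >= 1)
			break;
		/* Transient: wait briefly until the socket is writable. */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		if (poll(&pfd, 1, timeout_ms) <= 0)
			break;	/* timed out or poll() failed */
	}
	txerr++;
	return -1;
}
```

A single bounded retry keeps one slow socket from stalling a busy server for long; whether that trade-off suits NSD's event loop is an open question, not something established in this thread.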

I tried a higher debug level, as well as increasing the socket buffers to:
        net.inet.udp.recvspace=65536
        net.inet.udp.sendspace=65636
but neither helped, and netstat -s -p udp shows
        0 dropped due to full socket buffers
anyway. So, I don’t believe this is a socket buffer issue.
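On OpenBSD, net.inet.udp.sendspace and net.inet.udp.recvspace are the default buffer sizes a UDP socket receives when it is created; an application can still change them with setsockopt(), and sockets that already exist keep their old size until the daemon reopens them. A minimal sketch for checking what a socket actually has, using the hypothetical helper name `udp_sndbuf`:

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Return the send buffer size (SO_SNDBUF) of socket fd, or -1 on error.
 * A new UDP socket starts with the system default (net.inet.udp.sendspace
 * on OpenBSD) unless the application overrides it with setsockopt().
 */
static int
udp_sndbuf(int fd)
{
	int size;
	socklen_t len = sizeof(size);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) == -1)
		return -1;
	return size;
}
```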

The same server-count: 2 setting worked fine with 6.3.

Any hints, insights, or pointers?
Does anyone else experience the same?

Thanks,
Regards,
Joerg

Re: NSD sendto issue

Otto Moerbeek
On Thu, Sep 26, 2019 at 11:16:21AM +0200, Joerg Jung wrote:

> Now, when I increase servers from default 1 to 2 in nsd.conf:
> server-count: 2
> it starts spamming my log with:
> nsd[62723]: sendto 1.2.3.4 failed: Resource temporarily unavailable

This is likely an fd limit issue. Try:

nsd:\
        :openfiles=512:\
        :tc=daemon:

in login.conf, followed by a restart of nsd.
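The openfiles capability in login.conf(5) is applied as the process's RLIMIT_NOFILE when a daemon is started under that login class. A minimal C sketch for reading the limits a process actually ended up with; `nofile_limits` is a hypothetical helper name:

```c
#include <sys/types.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard open-file limits. */
static int
nofile_limits(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	*soft = rl.rlim_cur;	/* limit currently enforced */
	*hard = rl.rlim_max;	/* ceiling the soft limit may be raised to */
	return 0;
}
```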

        -Otto

Re: NSD sendto issue

Joerg Jung


> On 26. Sep 2019, at 12:40, Otto Moerbeek <[hidden email]> wrote:
>
> On Thu, Sep 26, 2019 at 11:16:21AM +0200, Joerg Jung wrote:
>
>> it starts spamming my log with:
>> nsd[62723]: sendto 1.2.3.4 failed: Resource temporarily unavailable
>
> This is likely an fd limit issue. Try:
>
> nsd:\
>        :openfiles=512:\
>        :tc=daemon:
>
> in login.conf, followed by a restart of nsd.

Thanks for the reply. I tried that, but unfortunately it did not help.

I checked with fstat, and each of the NSD processes has not much
more than 16 fds open, while complaining loudly in the log about
sendto() EAGAIN.
From what I understand, TCP for (A|I)XFR transfers is also limited to
a maximum of 100 TCP connections by default (and exhausting file
descriptors would show up as “too many open files” or similar).
So everything should more or less fit into the daemon class defaults.
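A rough in-process analogue of counting descriptors with fstat(1) is to probe every descriptor number below the soft RLIMIT_NOFILE with fcntl(F_GETFD), which fails for unused slots. This is a minimal sketch with a hypothetical helper name, `count_open_fds`; it only sees the calling process, and the scan is capped at an arbitrary 65536 to stay cheap.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/resource.h>

/* Count descriptors currently open in this process by probing each slot. */
static int
count_open_fds(void)
{
	struct rlimit rl;
	int open_fds = 0;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	/* Cap the scan so RLIM_INFINITY or very large limits stay cheap. */
	rlim_t max = rl.rlim_cur;
	if (max == RLIM_INFINITY || max > 65536)
		max = 65536;
	for (rlim_t fd = 0; fd < max; fd++)
		if (fcntl((int)fd, F_GETFD) != -1)	/* -1/EBADF: slot unused */
			open_fds++;
	return open_fds;
}
```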

Also, the sendto() part of the server.c code does not seem to have
changed much between 6.3 and 6.5, so it must be something else. Might
be changes in libevent or thread mutex handling, or something…

Re: NSD sendto issue

Stuart Henderson
In reply to this post by Joerg Jung
On 2019/09/26 11:16, Joerg Jung wrote:

> The same server-count: 2 setting worked fine with 6.3.

Maybe it's worth trying to track down further whether this is due to an
NSD change or something else in the OS - cvs up -r OPENBSD_6_3 .. (be sure
to use "make -f Makefile.bsd-wrapper [..]" when building).

Re: NSD sendto issue

Stuart Henderson
On 2019/09/26 13:45, Stuart Henderson wrote:

> On 2019/09/26 11:16, Joerg Jung wrote:
> > The same server-count: 2 setting worked fine with 6.3.
>
> Maybe it's worth trying to track down further whether this is due to an
> NSD change or something else in the OS - cvs up -r OPENBSD_6_3 .. (be sure
> to use "make -f Makefile.bsd-wrapper [..]" when building).
>

Or, following a comment from claudio@, try a kernel built with this:

Index: syscalls.master
===================================================================
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.189
diff -u -p -r1.189 syscalls.master
--- syscalls.master 11 Jan 2019 18:46:30 -0000 1.189
+++ syscalls.master 26 Sep 2019 13:01:46 -0000
@@ -261,7 +261,7 @@
 130 OBSOL oftruncate
 131 STD { int sys_flock(int fd, int how); }
 132 STD { int sys_mkfifo(const char *path, mode_t mode); }
-133 STD NOLOCK { ssize_t sys_sendto(int s, const void *buf, \
+133 STD { ssize_t sys_sendto(int s, const void *buf, \
     size_t len, int flags, const struct sockaddr *to, \
     socklen_t tolen); }
 134 STD { int sys_shutdown(int s, int how); }


Run "make syscalls" in sys/kern before building.

Re: NSD sendto issue

Stuart Henderson
In reply to this post by Stuart Henderson
> > I run a few busy (~800 req/s) NSD servers which I upgraded
> > to 6.5, all stock/default OpenBSD, e.g. I’ve not tweaked any

Did you jump over 6.4 when you updated? (i.e. did you update
directly from 6.3 to 6.5?)

Re: NSD sendto issue

Joerg Jung
On Thu, Sep 26, 2019 at 02:51:52PM +0100, Stuart Henderson wrote:
> > > I run a few busy (~800 req/s) NSD servers which I upgraded
> > > to 6.5, all stock/default OpenBSD, e.g. I’ve not tweaked any
>
> Did you jump over 6.4 when you updated? (i.e. did you update
> directly from 6.3 to 6.5?)

Of course not: 6.3->6.4->6.5. (I just skipped syspatch'ing 6.4.)
Since I upgraded fully automatically via some Ansible playbooks (strictly
following upgradeXX.html), I have not really tested/checked 6.4. However,
I have looked at the logs just now, and it seems this was already an issue
in 6.4, as I see a few instances of the sendto() EAGAIN there as well
(in the short timeframe while it was downloading the 6.5 tarballs).