[OpenBGPD] Problem with many (fast connecting) Peers

Daniel Seidenstücker
Dear OpenBGPD Community,



To measure the performance of OpenBGPD, I need to connect it to a large
number of peers (simulated with ExaBGP). OpenBGPD 5.8 works well with
100 peers, but if I increase that number to 250 I get the same error on
every attempt (debug mode):



handle_pollfd: imsg_read error: Resource temporarily unavailable
SE: Lost connection to RDE
handle_pollfd: poll fd: Undefined error: 0
RDE: Lost connection to SE
handle_pollfd: poll fd: Undefined error: 0
RDE: Lost connection to SE control
handle_pollfd: poll fd: No such file or directory
main: Lost connection to SE
route decision engine exiting
Segmentation fault (core dumped)



I guess it’s caused by the large number of peers or by the short interval
in which they connect. I also tested 5.7 and saw the same behavior, with
slightly different error messages:



fatal in SE: session_dispatch_imsg: imsg_read error: Resource temporarily unavailable
Lost child: session engine exited
fatal in RDE: rde_dispatch_imsg_session: pipe closed
Lost child: route decision engine exited
Terminating



If I split the peers into batches of 100, 50, and 100 with a 10-second
pause between batches, OpenBGPD fails with the same error when the batch
of 50 peers moves to the Established state.



It would be nice if you could help me.



Regards,

Daniel Seidenstuecker

Re: [OpenBGPD] Problem with many (fast connecting) Peers

Adam Wolk-2
On Tue, 26 Jan 2016 15:41:31 +0100
Daniel Seidenstücker <[hidden email]> wrote:

> Dear OpenBGPD Community,
>
>
>
> in order of measuring the performance of OpenBGPD I need to connect
> it with a huge amount of peers (realized by ExaBGP). OpenBGPD 5.8
> works well with 100 Peers but if I increase that number to 250 I got
> every try the same error (debug mode):
>
>
>
> handle_pollfd: imsg_read error: Resource temporarily unavailable
>
> SE: Lost connection to RDE
>
> handle_pollfd: poll fd: Undefined error: 0
>
> RDE: Lost connection to SE
>
> handle_pollfd: poll fd: Undefined error: 0
>
> RDE: Lost connection to SE control
>
> handle_pollfd: poll fd: No such file or directory
>
> main: Lost connection to SE
>
> route decision engine exiting
>
> Segmentation fault (core dumped)
>
>

Load the core file in gdb and see what the error is. I have a hunch
that it might be related to resource limits (like max open files).

I'm not a bgpd expert but checking /etc/login.conf might be worthwhile.
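
A minimal sketch of the kind of check suggested here, written in Python
rather than taken from bgpd. It reads the open-files limit of the current
process and compares it with a rough estimate of the descriptors needed.
The function name, the one-descriptor-per-peer figure, and the reserve of
32 are assumptions for illustration, not numbers from bgpd. Note that it
reports the limits of the process running the script; a daemon started
under a different login class may see different values.

```python
import resource


def open_file_headroom(expected_peers, fds_per_peer=1, reserved=32):
    """Compare the soft RLIMIT_NOFILE with a rough descriptor estimate.

    fds_per_peer and reserved are illustrative guesses, not bgpd values.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = expected_peers * fds_per_peer + reserved
    unlimited = soft == resource.RLIM_INFINITY
    return {
        "soft": soft,
        "hard": hard,
        "needed": needed,
        "enough": bool(unlimited or soft >= needed),
    }


if __name__ == "__main__":
    info = open_file_headroom(250)
    print("soft=%(soft)s hard=%(hard)s needed=%(needed)s enough=%(enough)s" % info)
```

If "enough" comes back False for the peer count under test, raising the
limit is worth trying before digging further.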

Regards,
Adam

Re: [OpenBGPD] Problem with many (fast connecting) Peers

Gregory Edigarov-5
In reply to this post by Daniel Seidenstücker
On 26.01.16 16:41, Daniel Seidenstücker wrote:

> Dear OpenBGPD Community,
>
>
>
> in order of measuring the performance of OpenBGPD I need to connect it with
> a huge amount of peers (realized by ExaBGP). OpenBGPD 5.8 works well with
> 100 Peers but if I increase that number to 250 I got every try the same
> error (debug mode):
>
>
>
> handle_pollfd: imsg_read error: Resource temporarily unavailable
>
> SE: Lost connection to RDE
>
> handle_pollfd: poll fd: Undefined error: 0
>
> RDE: Lost connection to SE
>
> handle_pollfd: poll fd: Undefined error: 0
>
> RDE: Lost connection to SE control
>
> handle_pollfd: poll fd: No such file or directory
>
> main: Lost connection to SE
>
> route decision engine exiting
>
> Segmentation fault (core dumped)
>
>
>
> I guess it’s caused by the big number of peers or the short time interval
> they connect. I also checked 5.7 but same behavior with slightly other error
> msgs:
>
>
>
> fatal in SE: session_dispatch_imsg: imsg_read error: Resource temporarily
> unavailable
>
> Lost child: session engine exited
>
> fatal in RDE: rde_dispatch_imsg_session: pipe closed
>
> Lost child: route decision engine exited
>
> Terminating
>
>
>
> If I split the Peers to 100, 50, 100 with 10 Seconds pause between arrival,
> OpenBGPD breaks with same error when the 50 Peers are changing to
> established.
>
>
>
> Would be nice if you can help me.
Try bumping up the max open files limit in login.conf.
That seems to be the problem here.

Re: [OpenBGPD] Problem with many (fast connecting) Peers

phessler
In reply to this post by Daniel Seidenstücker
Good news: this is already fixed in -current (and the upcoming 5.9
release).

Bad news: this requires changes to libutil, so it isn't trivial to
backport to 5.8.

Upgrading to a snapshot newer than Nov 28 should fix your problem.

I can now connect with 1000 exabgp sessions at once.  Not all succeed on
the first connection, but they eventually all connect without crashing
the server instance.

(BTW, ExaBGP runs into problems when you try to have more than 2000
sessions.  Just run more ExaBGP instances then.)


On 2016 Jan 26 (Tue) at 15:41:31 +0100 (+0100), Daniel Seidenstücker wrote:
:Dear OpenBGPD Community,
:
:
:
:in order of measuring the performance of OpenBGPD I need to connect it with
:a huge amount of peers (realized by ExaBGP). OpenBGPD 5.8 works well with
:100 Peers but if I increase that number to 250 I got every try the same
:error (debug mode):
:
:
:
:handle_pollfd: imsg_read error: Resource temporarily unavailable
:
:SE: Lost connection to RDE
:
:handle_pollfd: poll fd: Undefined error: 0
:
:RDE: Lost connection to SE
:
:handle_pollfd: poll fd: Undefined error: 0
:
:RDE: Lost connection to SE control
:
:handle_pollfd: poll fd: No such file or directory
:
:main: Lost connection to SE
:
:route decision engine exiting
:
:Segmentation fault (core dumped)
:
:
:
:I guess it’s caused by the big number of peers or the short time interval
:they connect. I also checked 5.7 but same behavior with slightly other error
:msgs:
:
:
:
:fatal in SE: session_dispatch_imsg: imsg_read error: Resource temporarily
:unavailable
:
:Lost child: session engine exited
:
:fatal in RDE: rde_dispatch_imsg_session: pipe closed
:
:Lost child: route decision engine exited
:
:Terminating
:
:
:
:If I split the Peers to 100, 50, 100 with 10 Seconds pause between arrival,
:OpenBGPD breaks with same error when the 50 Peers are changing to
:established.
:
:
:
:Would be nice if you can help me.
:
:
:
:Regards,
:
:Daniel Seidenstuecker
:

--
Save energy: be apathetic.

Re: [OpenBGPD] Problem with many (fast connecting) Peers

Chris Cappuccio
In reply to this post by Daniel Seidenstücker
Daniel Seidenstücker [[hidden email]] wrote:

> Dear OpenBGPD Community,
>
>
>
> in order of measuring the performance of OpenBGPD I need to connect it with
> a huge amount of peers (realized by ExaBGP). OpenBGPD 5.8 works well with
> 100 Peers but if I increase that number to 250 I got every try the same
> error (debug mode):
>
>
>
> handle_pollfd: imsg_read error: Resource temporarily unavailable
>
>

I don't think increasing the file limits is the right answer. There's
a bug here. This is EAGAIN.

bgpd should handle EAGAIN internally, not segfault. It's to be expected.
There is another bug that needs to be found. I'll stare at it some more
if someone else who really knows this stuff doesn't do it first...
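
To illustrate what "handle EAGAIN internally" means in general, here is a
small Python sketch. It is not bgpd's C code and not the libutil change
mentioned elsewhere in this thread; the function names are hypothetical.
On a non-blocking socket, a read that finds no data reports EAGAIN
(surfaced in Python as BlockingIOError), and the usual response is to go
back to poll() and try again later rather than treat it as fatal.

```python
import select
import socket


def read_available(sock, bufsize=4096):
    """Try one read on a non-blocking socket.

    Returns the bytes read, b"" if the peer closed the connection,
    or None if no data is ready yet (EAGAIN/EWOULDBLOCK).
    """
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # EAGAIN is not a failure: return to poll() and retry later.
        return None


def drain(sock, timeout_ms=1000):
    """Collect data until the peer closes or poll() times out."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    chunks = []
    while poller.poll(timeout_ms):
        data = read_available(sock)
        if data is None:
            continue  # woke up with nothing to read; poll again
        if data == b"":
            break  # orderly shutdown by the peer
        chunks.append(data)
    return b"".join(chunks)
```

The point of the sketch is only the distinction between the three
outcomes of a read: data, end of stream, and "try again".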

Chris

Re: [OpenBGPD] Problem with many (fast connecting) Peers

Chris Cappuccio
In reply to this post by phessler
Peter Hessler [[hidden email]] wrote:

> Good news: this is already fixed in -current (and the upcoming 5.9
> release).
>
> Bad news: this requires changes to libutil, so it isn't trivial to
> backport to 5.8.
>
> Upgrading to a snapshot newer than Nov 28 should fix your problem.
>
> I can now connect with 1000 exabgp sessions at once.  Not all succeed on
> the first connection, but they eventually all connect without crashing
> the server instance.
>
> (BTW, ExaBGP runs into problems when you try to have more than 2000
> sessions.  Just run more ExaBGP's then.)
>

Oh, even better!!

Re: [OpenBGPD] Problem with many (fast connecting) Peers

Stuart Henderson
In reply to this post by phessler
On 2016-01-26, Peter Hessler <[hidden email]> wrote:
> (BTW, ExaBGP runs into problems when you try to have more than 2000
> sessions.  Just run more ExaBGP's then.)

1024 wasn't it? Or, should I say, FD_SETSIZE? (hey, it's 2016, can't the 80's have
their bugs back yet?)
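
For readers unfamiliar with the FD_SETSIZE remark: select() can only
watch descriptor numbers below a compile-time constant (1024 on typical
Unix systems), while poll() has no such ceiling. The Python sketch below
shows the difference, assuming a Unix build of CPython; whether ExaBGP's
session limit really comes from select() is the guess made above, not
something this example confirms. Both function names are hypothetical.

```python
import select
import socket


def fits_in_select(fd):
    """Return True if select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython rejects descriptor numbers >= FD_SETSIZE up front.
        return False
    return True


def wait_readable(socks, timeout_ms=1000):
    """poll()-based wait; not bounded by FD_SETSIZE."""
    poller = select.poll()
    by_fd = {s.fileno(): s for s in socks}
    for fd in by_fd:
        poller.register(fd, select.POLLIN)
    return [by_fd[fd] for fd, _ in poller.poll(timeout_ms)]
```

A program that multiplexes with poll() (or kqueue/epoll) is then limited
by the open-files limit rather than by FD_SETSIZE.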