Important bgpd fix

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Important bgpd fix

Claudio Jeker
There is a somewhat critical bug in bgpd which got hit by local friends
a few weeks ago. The problem is that on session with the graceful restart
capability stale routes are not properly flushed. This can lead to bad
FIB entries and black holes. This happens when a router does not reconnect
before the timeout fires.

As a quick fix disabling the graceful reload capability with
"announce restart no" will skip the probelmatic code and no stale routes
will remain. The session must be cleared first before the change becomes
active.

The proper fix is attached. in short this removes a check for the
CAPA_GR_FORWARD flag in the timeout and IMSG_SESSION_RESTARTED handler.
CAPA_GR_RESTARTING is indicating that bgpd is currently doing a graceful
restart for this neighbor and it is set for any session that must issue
a flush of stale routes (either after a successful restart or because of
hitting a timeout or change of configuration). So whenever the SE issues
an IMSG_SESSION_FLUSH or IMSG_SESSION_RESTARTED message the
CAPA_GR_RESTARTING flag needs to be cleared.

Please test and/or verify that my logic is now correct.
--
:wq Claudio

PS: I'm away for a bit more then a week so I will not be able to answer
any questions anytime soon.

Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.326
diff -u -p -r1.326 rde.c
--- rde.c 13 Nov 2013 20:41:01 -0000 1.326
+++ rde.c 1 Jan 2014 02:30:04 -0000
@@ -3282,6 +3282,7 @@ peer_stale(u_int32_t id, u_int8_t aid)
  return;
  }
 
+ /* flush the now even staler routes out */
  if (peer->staletime[aid])
  peer_flush(peer, aid);
  peer->staletime[aid] = now = time(NULL);
@@ -3317,7 +3318,14 @@ peer_recv_eor(struct rde_peer *peer, u_i
 {
  peer->prefix_rcvd_eor++;
 
- /* First notify SE to remove possible race with the timeout. */
+ /*
+ * First notify SE to avert a possible race with the restart timeout.
+ * If the timeout fires before this imsg is processed by the SE it will
+ * result in the same operation since the timeout issues a FLUSH which
+ * does the same as the RESTARTED action (flushing stale routes).
+ * The logic in the SE is so that only one of FLUSH or RESTARTED will
+ * be sent back to the RDE and so peer_flush is only called once.
+ */
  if (imsg_compose(ibuf_se, IMSG_SESSION_RESTARTED, peer->conf.id,
     0, -1, &aid, sizeof(aid)) == -1)
  fatal("imsg_compose error");
Index: session.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.333
diff -u -p -r1.333 session.c
--- session.c 13 Nov 2013 20:41:01 -0000 1.333
+++ session.c 1 Jan 2014 02:36:31 -0000
@@ -1729,12 +1729,11 @@ session_graceful_stop(struct peer *p)
 
  for (i = 0; i < AID_MAX; i++) {
  /*
- * Only flush if the peer is restarting and the peer indicated
- * it hold the forwarding state. In all other cases the
- * session was already flushed when the session came up.
+ * Only flush if the peer is restarting and the timeout fired.
+ * In all other cases the session was already flushed when the
+ * session went down or when the new open message was parsed.
  */
- if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING &&
-    p->capa.neg.grestart.flags[i] & CAPA_GR_FORWARD) {
+ if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
  log_peer_warnx(&p->conf, "graceful restart of %s, "
     "time-out, flushing", aid2str(i));
  if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
@@ -2520,12 +2519,16 @@ capa_neg_calc(struct peer *p)
  */
 
  for (i = 0; i < AID_MAX; i++) {
+ int8_t negflags;
+
  /* disable GR if the AFI/SAFI is not present */
  if (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
     p->capa.neg.mp[i] == 0)
  p->capa.peer.grestart.flags[i] = 0; /* disable */
  /* look at current GR state and decide what to do */
- if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
+ negflags = p->capa.neg.grestart.flags[i];
+ p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
+ if (negflags & CAPA_GR_RESTARTING) {
  if (!(p->capa.peer.grestart.flags[i] &
     CAPA_GR_FORWARD)) {
  if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
@@ -2533,12 +2536,10 @@ capa_neg_calc(struct peer *p)
  return (-1);
  log_peer_warnx(&p->conf, "graceful restart of "
     "%s, not restarted, flushing", aid2str(i));
- }
- p->capa.neg.grestart.flags[i] =
-    p->capa.peer.grestart.flags[i] | CAPA_GR_RESTARTING;
- } else
- p->capa.neg.grestart.flags[i] =
-    p->capa.peer.grestart.flags[i];
+ } else
+ p->capa.neg.grestart.flags[i] |=
+    CAPA_GR_RESTARTING;
+ }
  }
  p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;
  p->capa.neg.grestart.restart = p->capa.peer.grestart.restart;
@@ -2911,9 +2912,7 @@ session_dispatch_imsg(struct imsgbuf *ib
  if (aid >= AID_MAX)
  fatalx("IMSG_SESSION_RESTARTED: bad AID");
  if (p->capa.neg.grestart.flags[aid] &
-    CAPA_GR_RESTARTING &&
-    p->capa.neg.grestart.flags[aid] &
-    CAPA_GR_FORWARD) {
+    CAPA_GR_RESTARTING) {
  log_peer_warnx(&p->conf,
     "graceful restart of %s finished",
     aid2str(aid));

Reply | Threaded
Open this post in threaded view
|

Re: Important bgpd fix

Stuart Henderson-6
I'm running this on one router without seeing any problems yet,
however it does not have any graceful-restart peers so it's not exactly
a great test.

Has anyone else tried this at all yet?


On 2014/01/02 00:03, Claudio Jeker wrote:

> There is a somewhat critical bug in bgpd which got hit by local friends
> a few weeks ago. The problem is that on session with the graceful restart
> capability stale routes are not properly flushed. This can lead to bad
> FIB entries and black holes. This happens when a router does not reconnect
> before the timeout fires.
>
> As a quick fix disabling the graceful reload capability with
> "announce restart no" will skip the probelmatic code and no stale routes
> will remain. The session must be cleared first before the change becomes
> active.
>
> The proper fix is attached. in short this removes a check for the
> CAPA_GR_FORWARD flag in the timeout and IMSG_SESSION_RESTARTED handler.
> CAPA_GR_RESTARTING is indicating that bgpd is currently doing a graceful
> restart for this neighbor and it is set for any session that must issue
> a flush of stale routes (either after a successful restart or because of
> hitting a timeout or change of configuration). So whenever the SE issues
> an IMSG_SESSION_FLUSH or IMSG_SESSION_RESTARTED message the
> CAPA_GR_RESTARTING flag needs to be cleared.
>
> Please test and/or verify that my logic is now correct.
> --
> :wq Claudio
>
> PS: I'm away for a bit more then a week so I will not be able to answer
> any questions anytime soon.
>
> Index: rde.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.326
> diff -u -p -r1.326 rde.c
> --- rde.c 13 Nov 2013 20:41:01 -0000 1.326
> +++ rde.c 1 Jan 2014 02:30:04 -0000
> @@ -3282,6 +3282,7 @@ peer_stale(u_int32_t id, u_int8_t aid)
>   return;
>   }
>  
> + /* flush the now even staler routes out */
>   if (peer->staletime[aid])
>   peer_flush(peer, aid);
>   peer->staletime[aid] = now = time(NULL);
> @@ -3317,7 +3318,14 @@ peer_recv_eor(struct rde_peer *peer, u_i
>  {
>   peer->prefix_rcvd_eor++;
>  
> - /* First notify SE to remove possible race with the timeout. */
> + /*
> + * First notify SE to avert a possible race with the restart timeout.
> + * If the timeout fires before this imsg is processed by the SE it will
> + * result in the same operation since the timeout issues a FLUSH which
> + * does the same as the RESTARTED action (flushing stale routes).
> + * The logic in the SE is so that only one of FLUSH or RESTARTED will
> + * be sent back to the RDE and so peer_flush is only called once.
> + */
>   if (imsg_compose(ibuf_se, IMSG_SESSION_RESTARTED, peer->conf.id,
>      0, -1, &aid, sizeof(aid)) == -1)
>   fatal("imsg_compose error");
> Index: session.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
> retrieving revision 1.333
> diff -u -p -r1.333 session.c
> --- session.c 13 Nov 2013 20:41:01 -0000 1.333
> +++ session.c 1 Jan 2014 02:36:31 -0000
> @@ -1729,12 +1729,11 @@ session_graceful_stop(struct peer *p)
>  
>   for (i = 0; i < AID_MAX; i++) {
>   /*
> - * Only flush if the peer is restarting and the peer indicated
> - * it hold the forwarding state. In all other cases the
> - * session was already flushed when the session came up.
> + * Only flush if the peer is restarting and the timeout fired.
> + * In all other cases the session was already flushed when the
> + * session went down or when the new open message was parsed.
>   */
> - if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING &&
> -    p->capa.neg.grestart.flags[i] & CAPA_GR_FORWARD) {
> + if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
>   log_peer_warnx(&p->conf, "graceful restart of %s, "
>      "time-out, flushing", aid2str(i));
>   if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
> @@ -2520,12 +2519,16 @@ capa_neg_calc(struct peer *p)
>   */
>  
>   for (i = 0; i < AID_MAX; i++) {
> + int8_t negflags;
> +
>   /* disable GR if the AFI/SAFI is not present */
>   if (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
>      p->capa.neg.mp[i] == 0)
>   p->capa.peer.grestart.flags[i] = 0; /* disable */
>   /* look at current GR state and decide what to do */
> - if (p->capa.neg.grestart.flags[i] & CAPA_GR_RESTARTING) {
> + negflags = p->capa.neg.grestart.flags[i];
> + p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
> + if (negflags & CAPA_GR_RESTARTING) {
>   if (!(p->capa.peer.grestart.flags[i] &
>      CAPA_GR_FORWARD)) {
>   if (imsg_compose(ibuf_rde, IMSG_SESSION_FLUSH,
> @@ -2533,12 +2536,10 @@ capa_neg_calc(struct peer *p)
>   return (-1);
>   log_peer_warnx(&p->conf, "graceful restart of "
>      "%s, not restarted, flushing", aid2str(i));
> - }
> - p->capa.neg.grestart.flags[i] =
> -    p->capa.peer.grestart.flags[i] | CAPA_GR_RESTARTING;
> - } else
> - p->capa.neg.grestart.flags[i] =
> -    p->capa.peer.grestart.flags[i];
> + } else
> + p->capa.neg.grestart.flags[i] |=
> +    CAPA_GR_RESTARTING;
> + }
>   }
>   p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;
>   p->capa.neg.grestart.restart = p->capa.peer.grestart.restart;
> @@ -2911,9 +2912,7 @@ session_dispatch_imsg(struct imsgbuf *ib
>   if (aid >= AID_MAX)
>   fatalx("IMSG_SESSION_RESTARTED: bad AID");
>   if (p->capa.neg.grestart.flags[aid] &
> -    CAPA_GR_RESTARTING &&
> -    p->capa.neg.grestart.flags[aid] &
> -    CAPA_GR_FORWARD) {
> +    CAPA_GR_RESTARTING) {
>   log_peer_warnx(&p->conf,
>      "graceful restart of %s finished",
>      aid2str(aid));
>

Reply | Threaded
Open this post in threaded view
|

Re: Important bgpd fix

Florian Obser-2
On Sat, Jan 11, 2014 at 03:07:37PM +0000, Stuart Henderson wrote:
> I'm running this on one router without seeing any problems yet,
> however it does not have any graceful-restart peers so it's not exactly
> a great test.
>
> Has anyone else tried this at all yet?

Benno is running it on one of our routers; works fine so far, but it
has the same problem - no graceful restart peers.
I have a Juniper SRX240 which I think can do graceful restart in the
lab (alternatively I can move a MX80 to the lab which definitly can do
graceful restart). I plan to test and review the diff in time for the
.nz hackathon.

--
I'm not entirely sure you are real.

Reply | Threaded
Open this post in threaded view
|

Re: Important bgpd fix

Theo de Raadt
In reply to this post by Claudio Jeker
By the way, that bgpd diff is in the snaps.