bgpd set nexthop 198.51.100.42 clarifications

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

bgpd set nexthop 198.51.100.42 clarifications

Claudio Jeker
When using a rule forcing the nexthop to a specific address bgpd
currently does not mark that nexthop as no-modify. In other words
the default rules for nexthop propagation applies. This means that
for ebgp it only sends out the set nexthop when this nexthop is connected
and on the same network as the peer. So while the Adj-RIB-Out shows the
right nexthop it is actually not on the wire.

This diff changes set nexthop 198.51.100.42 to also imply set nexthop
no-modify. This way the set nexthop is always on the wire.
The problem with that is that it will hand you a nice footgun ready to
blow of your big toe (but in the end the current behaviour is doing the
same just with a different angle of attack) .

The set nexthop section in bgpd.conf.5 needs to be adjusted once a
decision is made on how to handle this.
--
:wq Claudio

Index: rde_rib.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_rib.c,v
retrieving revision 1.190
diff -u -p -r1.190 rde_rib.c
--- rde_rib.c 7 Mar 2019 07:42:36 -0000 1.190
+++ rde_rib.c 13 May 2019 17:32:14 -0000
@@ -1491,7 +1491,7 @@ nexthop_modify(struct nexthop *setnh, en
  break;
  nexthop_put(*nexthop);
  *nexthop = nexthop_ref(setnh);
- *flags = 0;
+ *flags = NEXTHOP_NOMODIFY;
  break;
  default:
  break;

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Denis Fondras
On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:

> When using a rule forcing the nexthop to a specific address bgpd
> currently does not mark that nexthop as no-modify. In other words
> the default rules for nexthop propagation applies. This means that
> for ebgp it only sends out the set nexthop when this nexthop is connected
> and on the same network as the peer. So while the Adj-RIB-Out shows the
> right nexthop it is actually not on the wire.
>
> This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> no-modify. This way the set nexthop is always on the wire.
> The problem with that is that it will hand you a nice footgun ready to
> blow of your big toe (but in the end the current behaviour is doing the
> same just with a different angle of attack) .
>

It does not seem right to show a NH which does not reflect reality.

OK denis@

> The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> decision is made on how to handle this.
> --
> :wq Claudio
>
> Index: rde_rib.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/rde_rib.c,v
> retrieving revision 1.190
> diff -u -p -r1.190 rde_rib.c
> --- rde_rib.c 7 Mar 2019 07:42:36 -0000 1.190
> +++ rde_rib.c 13 May 2019 17:32:14 -0000
> @@ -1491,7 +1491,7 @@ nexthop_modify(struct nexthop *setnh, en
>   break;
>   nexthop_put(*nexthop);
>   *nexthop = nexthop_ref(setnh);
> - *flags = 0;
> + *flags = NEXTHOP_NOMODIFY;
>   break;
>   default:
>   break;
>

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Job Snijders
In reply to this post by Claudio Jeker
On Mon, May 13, 2019 at 21:11 Claudio Jeker <[hidden email]>
wrote:

> When using a rule forcing the nexthop to a specific address bgpd
> currently does not mark that nexthop as no-modify. In other words
> the default rules for nexthop propagation applies. This means that
> for ebgp it only sends out the set nexthop when this nexthop is connected
> and on the same network as the peer. So while the Adj-RIB-Out shows the
> right nexthop it is actually not on the wire.
>
> This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> no-modify. This way the set nexthop is always on the wire.
> The problem with that is that it will hand you a nice footgun ready to
> blow of your big toe (but in the end the current behaviour is doing the
> same just with a different angle of attack) .
>
> The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> decision is made on how to handle this.
> --
> :wq Claudio
>
> Index: rde_rib.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/rde_rib.c,v
> retrieving revision 1.190
> diff -u -p -r1.190 rde_rib.c
> --- rde_rib.c   7 Mar 2019 07:42:36 -0000       1.190
> +++ rde_rib.c   13 May 2019 17:32:14 -0000
> @@ -1491,7 +1491,7 @@ nexthop_modify(struct nexthop *setnh, en
>                         break;
>                 nexthop_put(*nexthop);
>                 *nexthop = nexthop_ref(setnh);
> -               *flags = 0;
> +               *flags = NEXTHOP_NOMODIFY;
>                 break;
>         default:
>                 break;
>
>
Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Job Snijders-3
In reply to this post by Claudio Jeker
Hi,

On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:

> When using a rule forcing the nexthop to a specific address bgpd
> currently does not mark that nexthop as no-modify. In other words
> the default rules for nexthop propagation applies. This means that
> for ebgp it only sends out the set nexthop when this nexthop is connected
> and on the same network as the peer. So while the Adj-RIB-Out shows the
> right nexthop it is actually not on the wire.
>
> This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> no-modify. This way the set nexthop is always on the wire.
> The problem with that is that it will hand you a nice footgun ready to
> blow of your big toe (but in the end the current behaviour is doing the
> same just with a different angle of attack) .
>
> The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> decision is made on how to handle this.

I think I'm not a big fan of this change.

Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on what
NEXT_HOPs are valid to send over a non-multihop BGP session. Only
addresses that are part of the linknet between the two routers are valid
NEXT_HOP addresses on the wire. This changes makes it trivial to send
not-valid NEXT_HOPs to a neighbor, this may result in very hard to debug
troublecases. Feels like we'll be handing way too much rope to users,
especially since it facilitates protocol violations.

I am not aware of a real world BGP implementation that would allow you
to send completely arbitrary NEXT_HOPs.

Kind regards,

Job

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Claudio Jeker
On Tue, May 28, 2019 at 01:28:32PM +0200, Job Snijders wrote:

> Hi,
>
> On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:
> > When using a rule forcing the nexthop to a specific address bgpd
> > currently does not mark that nexthop as no-modify. In other words
> > the default rules for nexthop propagation applies. This means that
> > for ebgp it only sends out the set nexthop when this nexthop is connected
> > and on the same network as the peer. So while the Adj-RIB-Out shows the
> > right nexthop it is actually not on the wire.
> >
> > This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> > no-modify. This way the set nexthop is always on the wire.
> > The problem with that is that it will hand you a nice footgun ready to
> > blow of your big toe (but in the end the current behaviour is doing the
> > same just with a different angle of attack) .
> >
> > The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> > decision is made on how to handle this.
>
> I think I'm not a big fan of this change.
>
> Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on what
> NEXT_HOPs are valid to send over a non-multihop BGP session. Only
> addresses that are part of the linknet between the two routers are valid
> NEXT_HOP addresses on the wire. This changes makes it trivial to send
> not-valid NEXT_HOPs to a neighbor, this may result in very hard to debug
> troublecases. Feels like we'll be handing way too much rope to users,
> especially since it facilitates protocol violations.
>
> I am not aware of a real world BGP implementation that would allow you
> to send completely arbitrary NEXT_HOPs.
>

I came to a similar conclusion and will send out a better diff. The idea
is that for the non-multihop BGP sessions we should require the nexthop to
be in the same network. The ebgp multihop case is currently always
sticking to the local address and should probably respect the nexthop if
set explicitly (that is also what the RFC suggests). ibgp seems to do the
right thing.

The same rules need to be applied to "set nexthop no-modify" since
that is currently on massive hammer.

--
:wq Claudio

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Job Snijders-3
On Tue, May 28, 2019 at 05:17:08PM +0200, Claudio Jeker wrote:

> On Tue, May 28, 2019 at 01:28:32PM +0200, Job Snijders wrote:
> > On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:
> > > When using a rule forcing the nexthop to a specific address bgpd
> > > currently does not mark that nexthop as no-modify. In other words
> > > the default rules for nexthop propagation applies. This means that
> > > for ebgp it only sends out the set nexthop when this nexthop is connected
> > > and on the same network as the peer. So while the Adj-RIB-Out shows the
> > > right nexthop it is actually not on the wire.
> > >
> > > This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> > > no-modify. This way the set nexthop is always on the wire.
> > > The problem with that is that it will hand you a nice footgun ready to
> > > blow of your big toe (but in the end the current behaviour is doing the
> > > same just with a different angle of attack) .
> > >
> > > The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> > > decision is made on how to handle this.
> >
> > I think I'm not a big fan of this change.
> >
> > Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on
> > what NEXT_HOPs are valid to send over a non-multihop BGP session.
> > Only addresses that are part of the linknet between the two routers
> > are valid NEXT_HOP addresses on the wire. This changes makes it
> > trivial to send not-valid NEXT_HOPs to a neighbor, this may result
> > in very hard to debug troublecases. Feels like we'll be handing way
> > too much rope to users, especially since it facilitates protocol
> > violations.
> >
> > I am not aware of a real world BGP implementation that would allow
> > you to send completely arbitrary NEXT_HOPs.
>
> I came to a similar conclusion and will send out a better diff. The
> idea is that for the non-multihop BGP sessions we should require the
> nexthop to be in the same network.

With 'same network', you mean the NEXT_HOP IP address must be part of
the router-to-router linknet, right?

> The ebgp multihop case is currently always sticking to the local
> address and should probably respect the nexthop if set explicitly
> (that is also what the RFC suggests).

Right.

> ibgp seems to do the right thing.

yup, in IBGP the NEXT_HOP can be anything.

> The same rules need to be applied to "set nexthop no-modify" since
> that is currently on massive hammer.

Right.

Thanks!

Kind regards,

Job

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Claudio Jeker
On Tue, May 28, 2019 at 07:09:00PM +0200, Job Snijders wrote:

> On Tue, May 28, 2019 at 05:17:08PM +0200, Claudio Jeker wrote:
> > On Tue, May 28, 2019 at 01:28:32PM +0200, Job Snijders wrote:
> > > On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:
> > > > When using a rule forcing the nexthop to a specific address bgpd
> > > > currently does not mark that nexthop as no-modify. In other words
> > > > the default rules for nexthop propagation applies. This means that
> > > > for ebgp it only sends out the set nexthop when this nexthop is connected
> > > > and on the same network as the peer. So while the Adj-RIB-Out shows the
> > > > right nexthop it is actually not on the wire.
> > > >
> > > > This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> > > > no-modify. This way the set nexthop is always on the wire.
> > > > The problem with that is that it will hand you a nice footgun ready to
> > > > blow of your big toe (but in the end the current behaviour is doing the
> > > > same just with a different angle of attack) .
> > > >
> > > > The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> > > > decision is made on how to handle this.
> > >
> > > I think I'm not a big fan of this change.
> > >
> > > Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on
> > > what NEXT_HOPs are valid to send over a non-multihop BGP session.
> > > Only addresses that are part of the linknet between the two routers
> > > are valid NEXT_HOP addresses on the wire. This changes makes it
> > > trivial to send not-valid NEXT_HOPs to a neighbor, this may result
> > > in very hard to debug troublecases. Feels like we'll be handing way
> > > too much rope to users, especially since it facilitates protocol
> > > violations.
> > >
> > > I am not aware of a real world BGP implementation that would allow
> > > you to send completely arbitrary NEXT_HOPs.
> >
> > I came to a similar conclusion and will send out a better diff. The
> > idea is that for the non-multihop BGP sessions we should require the
> > nexthop to be in the same network.
>
> With 'same network', you mean the NEXT_HOP IP address must be part of
> the router-to-router linknet, right?
>
> > The ebgp multihop case is currently always sticking to the local
> > address and should probably respect the nexthop if set explicitly
> > (that is also what the RFC suggests).
>
> Right.
>
> > ibgp seems to do the right thing.
>
> yup, in IBGP the NEXT_HOP can be anything.
>
> > The same rules need to be applied to "set nexthop no-modify" since
> > that is currently on massive hammer.
>
> Right.
>

So here is the diff for this change. In short nexthop no-modify has only
relevance for ebgp multihop links now (for the other cases it is either
the default or ignored). set nexthop self is still overriding all
decisions. ebgp just does the same net check and decides based on that
data. I refactored up_get_nexthop in a way that allows it to be used
for all AFI/SAFI cases and so 3 of the 4 almost identical code versions
did go away and the resulting up_get_nexthop() is a lot nicer now.

I decided to drop the implied NEXTHOP_NOMODIFY on set nexthop 198.51.100.42.
It makes more sense to set this on a per ebgp multihop session if that is
requested. The reason is that a NEXTHOP_NOMODIFY can not be toggled off
again and so it may cause nasty config hacks in complex setups. People
should just use 'set nexthop no-modify' on the ebgp multihop neighbor
block or in the filters.

Tested with IPv4 and IPv6. OK?
--
:wq Claudio

Index: rde_update.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.111
diff -u -p -r1.111 rde_update.c
--- rde_update.c 13 May 2019 21:13:04 -0000 1.111
+++ rde_update.c 29 May 2019 13:07:31 -0000
@@ -238,59 +238,84 @@ up_generate_default(struct filter_head *
 }
 
 /* only for IPv4 */
-static in_addr_t
-up_get_nexthop(struct rde_peer *peer, struct filterstate *state)
+static struct bgpd_addr *
+up_get_nexthop(struct rde_peer *peer, struct filterstate *state, u_int8_t aid)
 {
- in_addr_t mask;
+ struct bgpd_addr *peer_local;
 
- /* nexthop, already network byte order */
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- return (peer->local_v4_addr.v4.s_addr);
- else
- return (state->nexthop->exit_nexthop.v4.s_addr);
- } else if (state->nhflags & NEXTHOP_SELF)
- return (peer->local_v4_addr.v4.s_addr);
- else if (!peer->conf.ebgp) {
+ switch (aid) {
+ case AID_INET:
+ case AID_VPN_IPv4:
+ peer_local = &peer->local_v4_addr;
+ break;
+ case AID_INET6:
+ case AID_VPN_IPv6:
+ peer_local = &peer->local_v6_addr;
+ break;
+ default:
+ fatalx("%s, bad AID %s", __func__, aid2str(aid));
+ }
+
+ if (state->nhflags & NEXTHOP_SELF) {
  /*
- * If directly connected use peer->local_v4_addr
- * this is only true for announced networks.
+ * Forcing the nexthop to self is always possible
+ * and has precedence over other flags.
  */
- if (state->nexthop == NULL)
- return (peer->local_v4_addr.v4.s_addr);
- else if (state->nexthop->exit_nexthop.v4.s_addr ==
-    peer->remote_addr.v4.s_addr)
- /*
- * per RFC: if remote peer address is equal to
- * the nexthop set the nexthop to our local address.
- * This reduces the risk of routing loops.
- */
- return (peer->local_v4_addr.v4.s_addr);
- else
- return (state->nexthop->exit_nexthop.v4.s_addr);
+ return (peer_local);
+ } else if (!peer->conf.ebgp) {
+ /*
+ * in the ibgp case the nexthop is normally not
+ * modified unless it points at the peer itself.
+ */
+ if (state->nexthop == NULL) {
+ /* announced networks without explicit nexthop set */
+ return (peer_local);
+ }
+ /*
+ * per RFC: if remote peer address is equal to the nexthop set
+ * the nexthop to our local address. This reduces the risk of
+ * routing loops. This overrides NEXTHOP_NOMODIFY.
+ */
+ if (memcmp(&state->nexthop->exit_nexthop,
+    &peer->remote_addr, sizeof(peer->remote_addr)) == 0) {
+ return (peer_local);
+ }
+ return (&state->nexthop->exit_nexthop);
  } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
+ /*
+ * In the ebgp directly connected case never send
+ * out a nexthop that is outside of the connected
+ * network of the peer. No matter what flags are
+ * set. This follows section 5.1.3 of RFC 4271.
+ * So just check if the nexthop is in the same net
+ * is enough here.
+ */
  if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED) {
- mask = htonl(
-    prefixlen2mask(state->nexthop->nexthop_netlen));
- if ((peer->remote_addr.v4.s_addr & mask) ==
-    (state->nexthop->nexthop_net.v4.s_addr & mask))
- /* nexthop and peer are in the same net */
- return (state->nexthop->exit_nexthop.v4.s_addr);
- else
- return (peer->local_v4_addr.v4.s_addr);
- } else
- return (peer->local_v4_addr.v4.s_addr);
- } else
- /* ebgp multihop */
+    state->nexthop->flags & NEXTHOP_CONNECTED &&
+    prefix_compare(&peer->remote_addr,
+    &state->nexthop->nexthop_net,
+    state->nexthop->nexthop_netlen) == 0) {
+ /* nexthop and peer are in the same net */
+ return (&state->nexthop->exit_nexthop);
+ }
+ return (peer_local);
+ } else {
  /*
- * For ebgp multihop nh->flags should never have
- * NEXTHOP_CONNECTED set so it should be possible to unify the
- * two ebgp cases. But this is safe and RFC compliant.
+ * For ebgp multihop make it possible to overrule
+ * the sent nexthop by setting NEXTHOP_NOMODIFY.
+ * Similar to the ibgp case there is no same net check
+ * needed but still ensure that the nexthop is not
+ * pointing to the peer itself.
  */
- return (peer->local_v4_addr.v4.s_addr);
+ if (state->nhflags & NEXTHOP_NOMODIFY &&
+    state->nexthop == NULL &&
+    memcmp(&state->nexthop->exit_nexthop,
+    &peer->remote_addr, sizeof(peer->remote_addr)) != 0) {
+ /* no modify flag set and nexthop not peer addr */
+ return (&state->nexthop->exit_nexthop);
+ }
+ return (peer_local);
+ }
 }
 
 static int
@@ -350,7 +375,8 @@ up_generate_attr(u_char *buf, int len, s
  case ATTR_NEXTHOP:
  switch (aid) {
  case AID_INET:
- nexthop = up_get_nexthop(peer, state);
+ nexthop =
+    up_get_nexthop(peer, state, aid)->v4.s_addr;
  if ((r = attr_write(buf + wlen, len,
     ATTR_WELL_KNOWN, ATTR_NEXTHOP, &nexthop,
     4)) == -1)
@@ -725,10 +751,10 @@ static int
 up_generate_mp_reach(u_char *buf, int len, struct rde_peer *peer,
     struct filterstate *state, u_int8_t aid)
 {
- u_char *attrbuf;
- int r;
- int wpos, attrlen;
- u_int16_t tmp;
+ struct bgpd_addr *nexthop;
+ u_char *attrbuf;
+ int r, wpos, attrlen;
+ u_int16_t tmp;
 
  if (len < 4)
  return (-1);
@@ -751,54 +777,10 @@ up_generate_mp_reach(u_char *buf, int le
  attrbuf[3] = sizeof(struct in6_addr);
  attrbuf[20] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 4;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (state->nhflags & NEXTHOP_SELF)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-    !memcmp(&state->nexthop->exit_nexthop.v6,
-    &peer->remote_addr.v6, sizeof(struct in6_addr))))
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
  break;
  case AID_VPN_IPv4:
  attrlen = 17; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -813,55 +795,10 @@ up_generate_mp_reach(u_char *buf, int le
  bzero(attrbuf + 4, sizeof(u_int64_t));
  attrbuf[16] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 12;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- else
- /* nexthops are stored as IPv4 addrs */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- } else if (state->nhflags & NEXTHOP_SELF) {
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- } else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET &&
-    !memcmp(&state->nexthop->exit_nexthop.v4,
-    &peer->remote_addr.v4, sizeof(struct in_addr))))
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v4, sizeof(struct in_addr));
  break;
  case AID_VPN_IPv6:
  attrlen = 29; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -876,54 +813,10 @@ up_generate_mp_reach(u_char *buf, int le
  bzero(attrbuf + 4, sizeof(u_int64_t));
  attrbuf[28] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 12;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (state->nhflags & NEXTHOP_SELF)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-    !memcmp(&state->nexthop->exit_nexthop.v6,
-    &peer->remote_addr.v6, sizeof(struct in6_addr))))
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
  break;
  default:
  fatalx("up_generate_mp_reach: unknown AID");

Reply | Threaded
Open this post in threaded view
|

Re: bgpd set nexthop 198.51.100.42 clarifications

Claudio Jeker
On Wed, May 29, 2019 at 03:21:34PM +0200, Claudio Jeker wrote:

> On Tue, May 28, 2019 at 07:09:00PM +0200, Job Snijders wrote:
> > On Tue, May 28, 2019 at 05:17:08PM +0200, Claudio Jeker wrote:
> > > On Tue, May 28, 2019 at 01:28:32PM +0200, Job Snijders wrote:
> > > > On Mon, May 13, 2019 at 09:03:41PM +0200, Claudio Jeker wrote:
> > > > > When using a rule forcing the nexthop to a specific address bgpd
> > > > > currently does not mark that nexthop as no-modify. In other words
> > > > > the default rules for nexthop propagation applies. This means that
> > > > > for ebgp it only sends out the set nexthop when this nexthop is connected
> > > > > and on the same network as the peer. So while the Adj-RIB-Out shows the
> > > > > right nexthop it is actually not on the wire.
> > > > >
> > > > > This diff changes set nexthop 198.51.100.42 to also imply set nexthop
> > > > > no-modify. This way the set nexthop is always on the wire.
> > > > > The problem with that is that it will hand you a nice footgun ready to
> > > > > blow of your big toe (but in the end the current behaviour is doing the
> > > > > same just with a different angle of attack) .
> > > > >
> > > > > The set nexthop section in bgpd.conf.5 needs to be adjusted once a
> > > > > decision is made on how to handle this.
> > > >
> > > > I think I'm not a big fan of this change.
> > > >
> > > > Section 5.1.3 of RFC 4271 (the core BGP spec) is very explicit on
> > > > what NEXT_HOPs are valid to send over a non-multihop BGP session.
> > > > Only addresses that are part of the linknet between the two routers
> > > > are valid NEXT_HOP addresses on the wire. This changes makes it
> > > > trivial to send not-valid NEXT_HOPs to a neighbor, this may result
> > > > in very hard to debug troublecases. Feels like we'll be handing way
> > > > too much rope to users, especially since it facilitates protocol
> > > > violations.
> > > >
> > > > I am not aware of a real world BGP implementation that would allow
> > > > you to send completely arbitrary NEXT_HOPs.
> > >
> > > I came to a similar conclusion and will send out a better diff. The
> > > idea is that for the non-multihop BGP sessions we should require the
> > > nexthop to be in the same network.
> >
> > With 'same network', you mean the NEXT_HOP IP address must be part of
> > the router-to-router linknet, right?
> >
> > > The ebgp multihop case is currently always sticking to the local
> > > address and should probably respect the nexthop if set explicitly
> > > (that is also what the RFC suggests).
> >
> > Right.
> >
> > > ibgp seems to do the right thing.
> >
> > yup, in IBGP the NEXT_HOP can be anything.
> >
> > > The same rules need to be applied to "set nexthop no-modify" since
> > > that is currently on massive hammer.
> >
> > Right.
> >
>
> So here is the diff for this change. In short nexthop no-modify has only
> relevance for ebgp multihop links now (for the other cases it is either
> the default or ignored). set nexthop self is still overriding all
> decisions. ebgp just does the same net check and decides based on that
> data. I refactored up_get_nexthop in a way that allows it to be used
> for all AFI/SAFI cases and so 3 of the 4 almost identical code versions
> did go away and the resulting up_get_nexthop() is a lot nicer now.
>
> I decided to drop the implied NEXTHOP_NOMODIFY on set nexthop 198.51.100.42.
> It makes more sense to set this on a per ebgp multihop session if that is
> requested. The reason is that a NEXTHOP_NOMODIFY can not be toggled off
> again and so it may cause nasty config hacks in complex setups. People
> should just use 'set nexthop no-modify' on the ebgp multihop neighbor
> block or in the filters.
>
> Tested with IPv4 and IPv6. OK?

Ping

--
:wq Claudio

Index: rde_update.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.111
diff -u -p -r1.111 rde_update.c
--- rde_update.c 13 May 2019 21:13:04 -0000 1.111
+++ rde_update.c 29 May 2019 13:07:31 -0000
@@ -238,59 +238,84 @@ up_generate_default(struct filter_head *
 }
 
 /* only for IPv4 */
-static in_addr_t
-up_get_nexthop(struct rde_peer *peer, struct filterstate *state)
+static struct bgpd_addr *
+up_get_nexthop(struct rde_peer *peer, struct filterstate *state, u_int8_t aid)
 {
- in_addr_t mask;
+ struct bgpd_addr *peer_local;
 
- /* nexthop, already network byte order */
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- return (peer->local_v4_addr.v4.s_addr);
- else
- return (state->nexthop->exit_nexthop.v4.s_addr);
- } else if (state->nhflags & NEXTHOP_SELF)
- return (peer->local_v4_addr.v4.s_addr);
- else if (!peer->conf.ebgp) {
+ switch (aid) {
+ case AID_INET:
+ case AID_VPN_IPv4:
+ peer_local = &peer->local_v4_addr;
+ break;
+ case AID_INET6:
+ case AID_VPN_IPv6:
+ peer_local = &peer->local_v6_addr;
+ break;
+ default:
+ fatalx("%s, bad AID %s", __func__, aid2str(aid));
+ }
+
+ if (state->nhflags & NEXTHOP_SELF) {
  /*
- * If directly connected use peer->local_v4_addr
- * this is only true for announced networks.
+ * Forcing the nexthop to self is always possible
+ * and has precedence over other flags.
  */
- if (state->nexthop == NULL)
- return (peer->local_v4_addr.v4.s_addr);
- else if (state->nexthop->exit_nexthop.v4.s_addr ==
-    peer->remote_addr.v4.s_addr)
- /*
- * per RFC: if remote peer address is equal to
- * the nexthop set the nexthop to our local address.
- * This reduces the risk of routing loops.
- */
- return (peer->local_v4_addr.v4.s_addr);
- else
- return (state->nexthop->exit_nexthop.v4.s_addr);
+ return (peer_local);
+ } else if (!peer->conf.ebgp) {
+ /*
+ * in the ibgp case the nexthop is normally not
+ * modified unless it points at the peer itself.
+ */
+ if (state->nexthop == NULL) {
+ /* announced networks without explicit nexthop set */
+ return (peer_local);
+ }
+ /*
+ * per RFC: if remote peer address is equal to the nexthop set
+ * the nexthop to our local address. This reduces the risk of
+ * routing loops. This overrides NEXTHOP_NOMODIFY.
+ */
+ if (memcmp(&state->nexthop->exit_nexthop,
+    &peer->remote_addr, sizeof(peer->remote_addr)) == 0) {
+ return (peer_local);
+ }
+ return (&state->nexthop->exit_nexthop);
  } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
+ /*
+ * In the ebgp directly connected case never send
+ * out a nexthop that is outside of the connected
+ * network of the peer. No matter what flags are
+ * set. This follows section 5.1.3 of RFC 4271.
+ * So just check if the nexthop is in the same net
+ * is enough here.
+ */
  if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED) {
- mask = htonl(
-    prefixlen2mask(state->nexthop->nexthop_netlen));
- if ((peer->remote_addr.v4.s_addr & mask) ==
-    (state->nexthop->nexthop_net.v4.s_addr & mask))
- /* nexthop and peer are in the same net */
- return (state->nexthop->exit_nexthop.v4.s_addr);
- else
- return (peer->local_v4_addr.v4.s_addr);
- } else
- return (peer->local_v4_addr.v4.s_addr);
- } else
- /* ebgp multihop */
+    state->nexthop->flags & NEXTHOP_CONNECTED &&
+    prefix_compare(&peer->remote_addr,
+    &state->nexthop->nexthop_net,
+    state->nexthop->nexthop_netlen) == 0) {
+ /* nexthop and peer are in the same net */
+ return (&state->nexthop->exit_nexthop);
+ }
+ return (peer_local);
+ } else {
  /*
- * For ebgp multihop nh->flags should never have
- * NEXTHOP_CONNECTED set so it should be possible to unify the
- * two ebgp cases. But this is safe and RFC compliant.
+ * For ebgp multihop make it possible to overrule
+ * the sent nexthop by setting NEXTHOP_NOMODIFY.
+ * Similar to the ibgp case there is no same net check
+ * needed but still ensure that the nexthop is not
+ * pointing to the peer itself.
  */
- return (peer->local_v4_addr.v4.s_addr);
+ if (state->nhflags & NEXTHOP_NOMODIFY &&
+    state->nexthop == NULL &&
+    memcmp(&state->nexthop->exit_nexthop,
+    &peer->remote_addr, sizeof(peer->remote_addr)) != 0) {
+ /* no modify flag set and nexthop not peer addr */
+ return (&state->nexthop->exit_nexthop);
+ }
+ return (peer_local);
+ }
 }
 
 static int
@@ -350,7 +375,8 @@ up_generate_attr(u_char *buf, int len, s
  case ATTR_NEXTHOP:
  switch (aid) {
  case AID_INET:
- nexthop = up_get_nexthop(peer, state);
+ nexthop =
+    up_get_nexthop(peer, state, aid)->v4.s_addr;
  if ((r = attr_write(buf + wlen, len,
     ATTR_WELL_KNOWN, ATTR_NEXTHOP, &nexthop,
     4)) == -1)
@@ -725,10 +751,10 @@ static int
 up_generate_mp_reach(u_char *buf, int len, struct rde_peer *peer,
     struct filterstate *state, u_int8_t aid)
 {
- u_char *attrbuf;
- int r;
- int wpos, attrlen;
- u_int16_t tmp;
+ struct bgpd_addr *nexthop;
+ u_char *attrbuf;
+ int r, wpos, attrlen;
+ u_int16_t tmp;
 
  if (len < 4)
  return (-1);
@@ -751,54 +777,10 @@ up_generate_mp_reach(u_char *buf, int le
  attrbuf[3] = sizeof(struct in6_addr);
  attrbuf[20] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 4;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (state->nhflags & NEXTHOP_SELF)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-    !memcmp(&state->nexthop->exit_nexthop.v6,
-    &peer->remote_addr.v6, sizeof(struct in6_addr))))
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
  break;
  case AID_VPN_IPv4:
  attrlen = 17; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -813,55 +795,10 @@ up_generate_mp_reach(u_char *buf, int le
  bzero(attrbuf + 4, sizeof(u_int64_t));
  attrbuf[16] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 12;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- else
- /* nexthops are stored as IPv4 addrs */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- } else if (state->nhflags & NEXTHOP_SELF) {
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- } else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET &&
-    !memcmp(&state->nexthop->exit_nexthop.v4,
-    &peer->remote_addr.v4, sizeof(struct in_addr))))
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v4,
-    sizeof(struct in_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v4_addr.v4,
-    sizeof(struct in_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v4, sizeof(struct in_addr));
  break;
  case AID_VPN_IPv6:
  attrlen = 29; /* AFI + SAFI + NH LEN + NH + Reserved */
@@ -876,54 +813,10 @@ up_generate_mp_reach(u_char *buf, int le
  bzero(attrbuf + 4, sizeof(u_int64_t));
  attrbuf[28] = 0; /* Reserved must be 0 */
 
- /* nexthop dance see also up_get_nexthop() */
+ /* write nexthop */
  attrbuf += 12;
- if (state->nhflags & NEXTHOP_NOMODIFY) {
- /* no modify flag set */
- if (state->nexthop == NULL)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (state->nhflags & NEXTHOP_SELF)
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else if (!peer->conf.ebgp) {
- /* ibgp */
- if (state->nexthop == NULL ||
-    (state->nexthop->exit_nexthop.aid == AID_INET6 &&
-    !memcmp(&state->nexthop->exit_nexthop.v6,
-    &peer->remote_addr.v6, sizeof(struct in6_addr))))
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- else
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- } else if (peer->conf.distance == 1) {
- /* ebgp directly connected */
- if (state->nexthop != NULL &&
-    state->nexthop->flags & NEXTHOP_CONNECTED)
- if (prefix_compare(&peer->remote_addr,
-    &state->nexthop->nexthop_net,
-    state->nexthop->nexthop_netlen) == 0) {
- /*
- * nexthop and peer are in the same
- * subnet
- */
- memcpy(attrbuf,
-    &state->nexthop->exit_nexthop.v6,
-    sizeof(struct in6_addr));
- break;
- }
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
- } else
- /* ebgp multihop */
- memcpy(attrbuf, &peer->local_v6_addr.v6,
-    sizeof(struct in6_addr));
+ nexthop = up_get_nexthop(peer, state, aid);
+ memcpy(attrbuf, &nexthop->v6, sizeof(struct in6_addr));
  break;
  default:
  fatalx("up_generate_mp_reach: unknown AID");