ospfd getting confused about who is DR

Claudio Jeker
After a netsplit it can happen that, once the network rejoins, multiple
ospfd instances end up as DR. In my case with 3 routers, the one that was
cut off stays DR even though the rest of the network already has a DR and BDR.

Looking into this, it seems that in some cases we don't issue an
IF_EVT_NBR_CHNG, so the re-evaluation of DR/BDR does not happen.
Comparing hello.c with the RFC suggests that the following case is
currently not handled:

            o   Bidirectional communication has been established to a
                neighbor.  In other words, the state of the neighbor has
                transitioned to 2-Way or higher.

The other cases in the RFC seem to be covered.
The following diff fixes this and seems to solve the problem I'm seeing.

Since this is one of those bits that has always caused trouble, I would like
more testing, and maybe someone is brave enough to OK the diff.
--
:wq Claudio

Index: hello.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospfd/hello.c,v
retrieving revision 1.21
diff -u -p -r1.21 hello.c
--- hello.c 18 Nov 2014 20:54:29 -0000 1.21
+++ hello.c 9 Feb 2018 02:11:55 -0000
@@ -188,7 +188,6 @@ recv_hello(struct iface *iface, struct i
 		nbr->dr.s_addr = hello.d_rtr;
 		nbr->bdr.s_addr = hello.bd_rtr;
 		nbr->priority = hello.rtr_priority;
-		nbr_change = 1;
 	}
 
 	/* actually the neighbor address shouldn't be stored on virtual links */
@@ -201,8 +200,10 @@ recv_hello(struct iface *iface, struct i
 		memcpy(&nbr_id, buf, sizeof(nbr_id));
 		if (nbr_id == ospfe_router_id()) {
 			/* seen myself */
-			if (nbr->state & NBR_STA_PRELIM)
+			if (nbr->state & NBR_STA_PRELIM) {
 				nbr_fsm(nbr, NBR_EVT_2_WAY_RCVD);
+				nbr_change = 1;
+			}
 			break;
 		}
 		buf += sizeof(nbr_id);
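
For readers without the ospfd source at hand, the effect of the diff above can be sketched in isolation. This is a toy model, not the real code: `toy_state`, `toy_nbr` and `toy_recv_hello` are hypothetical names, the three states stand in for the real neighbor state bits, and the return value plays the role of the `nbr_change` flag that, as described above, leads to IF_EVT_NBR_CHNG. It only shows the one idea of the fix: flag a neighbor change at the moment we first see our own router ID in the neighbor's hello, i.e. when the neighbor reaches 2-Way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified neighbor states; not the real ospfd definitions. */
enum toy_state { TOY_DOWN, TOY_INIT, TOY_2WAY };

struct toy_nbr {
	enum toy_state state;
};

/*
 * Process the neighbor list of one received hello (buf/len hold the
 * router IDs listed by the sender). Returns 1 if the DR/BDR election
 * should be re-run, 0 otherwise.
 */
static int
toy_recv_hello(struct toy_nbr *nbr, uint32_t my_id, const uint8_t *buf,
    size_t len)
{
	uint32_t nbr_id;
	int nbr_change = 0;

	assert(nbr != NULL);

	/* Any hello moves a previously unknown neighbor to Init. */
	if (nbr->state == TOY_DOWN)
		nbr->state = TOY_INIT;

	while (len >= sizeof(nbr_id)) {
		memcpy(&nbr_id, buf, sizeof(nbr_id));
		if (nbr_id == my_id) {
			/* seen myself: communication is now bidirectional */
			if (nbr->state < TOY_2WAY) {
				nbr->state = TOY_2WAY;
				nbr_change = 1;
			}
			break;
		}
		buf += sizeof(nbr_id);
		len -= sizeof(nbr_id);
	}
	return nbr_change;
}
```

In this model a hello that does not list us leaves the flag at 0, the first hello that does list us returns 1, and later hellos from the same 2-Way neighbor return 0 again.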

Re: ospfd getting confused about who is DR

Claudio Jeker
On Fri, Feb 09, 2018 at 03:39:43AM +0100, Claudio Jeker wrote:

> On netsplits it can happen that on join multiple ospfd end up as DR.
> In my case with 3 routers the one cut off stays DR even though the rest of
> the network already has a DR and BDR.
>
> Looking into this it seems that in some cases we don't issue an
> IF_EVT_NBR_CHNG and so the re-evaluation of DR/BDR does not happen.
> Looking at hello.c and the rfc seems to suggest that the following case is
> currently not handled:
>
>             o   Bidirectional communication has been established to a
>                 neighbor.  In other words, the state of the neighbor has
>                 transitioned to 2-Way or higher.
>
> The other cases in the RFC seem to be covered.
> The following diff fixes this and seems to solve the problem I'm seeing.
>
> Since this is one of those bits that always caused trouble I would like
> more tests and maybe someone is brave enough to OK the diff.

Here is the ospf6d diff for the same issue.

--
:wq Claudio

Index: hello.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospf6d/hello.c,v
retrieving revision 1.17
diff -u -p -r1.17 hello.c
--- hello.c 18 Nov 2014 20:54:28 -0000 1.17
+++ hello.c 9 Feb 2018 03:21:01 -0000
@@ -173,7 +173,6 @@ recv_hello(struct iface *iface, struct i
 		nbr->dr.s_addr = hello.d_rtr;
 		nbr->bdr.s_addr = hello.bd_rtr;
 		nbr->priority = LSA_24_GETHI(ntohl(hello.opts));
-		nbr_change = 1;
 	}
 
 	/* actually the neighbor address shouldn't be stored on virtual links */
@@ -186,8 +185,10 @@ recv_hello(struct iface *iface, struct i
 		memcpy(&nbr_id, buf, sizeof(nbr_id));
 		if (nbr_id == ospfe_router_id()) {
 			/* seen myself */
-			if (nbr->state & NBR_STA_PRELIM)
+			if (nbr->state & NBR_STA_PRELIM) {
 				nbr_fsm(nbr, NBR_EVT_2_WAY_RCVD);
+				nbr_change = 1;
+			}
 			break;
 		}
 		buf += sizeof(nbr_id);

Re: ospfd getting confused about who is DR

Stuart Henderson
In reply to this post by Claudio Jeker
On 2018/02/09 03:39, Claudio Jeker wrote:
> On netsplits it can happen that on join multiple ospfd end up as DR.
> In my case with 3 routers the one cut off stays DR even though the rest of
> the network already has a DR and BDR.

Very likely this is what I've seen. My layout has been roughly
like this,

site a router 1  -------------  site b router 3
  |                                        |
  |                                        |
site a router 2  -------------  site b router 4

and it's usually one of the site a<>b links that drops out and
later comes back, followed by the multiple DR confusion.
It's hard to say which is the "cut off" router in that case as they
all have alternative links.
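
To illustrate why a re-run election should clear up the multiple-DR state, here is a rough sketch of the tie-break that RFC 2328 (section 9.4, step 3) applies when several routers on a segment declare themselves DR: highest router priority wins, with the highest router ID breaking ties. The struct and function names are hypothetical and not taken from ospfd, and the BDR fallback for the case where nobody claims DR is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one router on the segment, as learned from hellos. */
struct toy_router {
	uint32_t id;		/* router ID, host byte order */
	uint8_t priority;	/* 0 means not eligible to become DR */
	int claims_dr;		/* lists itself as DR in its hellos */
};

/*
 * Among eligible routers that declare themselves DR, pick the one with
 * the highest priority, breaking ties by highest router ID. Returns
 * NULL if no eligible router claims DR (the RFC then falls back to the
 * newly elected BDR, which is not modeled here).
 */
static const struct toy_router *
toy_pick_dr(const struct toy_router *r, size_t n)
{
	const struct toy_router *best = NULL;
	size_t i;

	assert(r != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (r[i].priority == 0 || !r[i].claims_dr)
			continue;
		if (best == NULL ||
		    r[i].priority > best->priority ||
		    (r[i].priority == best->priority && r[i].id > best->id))
			best = &r[i];
	}
	return best;
}
```

With two routers both claiming DR at equal priority, the one with the higher router ID is chosen, so the confusion only persists if the election is never re-run on the router that still believes it is DR.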

> Looking into this it seems that in some cases we don't issue an
> IF_EVT_NBR_CHNG and so the re-evaluation of DR/BDR does not happen.
> Looking at hello.c and the rfc seems to suggest that the following case is
> currently not handled:
>
>             o   Bidirectional communication has been established to a
>                 neighbor.  In other words, the state of the neighbor has
>                 transitioned to 2-Way or higher.
>
> The other cases in the RFC seem to be covered.
> The following diff fixes this and seems to solve the problem I'm seeing.
>
> Since this is one of those bits that always caused trouble I would like
> more tests and maybe someone is brave enough to OK the diff.

I'm running this on a handful of routers. It's too early to say whether
it fixes things for me, but I've not seen problems yet. I'm not quite
feeling brave enough for an OK until I've seen it running for longer,
but the diff certainly makes sense to me.


> :wq Claudio
>
> Index: hello.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/ospfd/hello.c,v
> retrieving revision 1.21
> diff -u -p -r1.21 hello.c
> --- hello.c 18 Nov 2014 20:54:29 -0000 1.21
> +++ hello.c 9 Feb 2018 02:11:55 -0000
> @@ -188,7 +188,6 @@ recv_hello(struct iface *iface, struct i
>   nbr->dr.s_addr = hello.d_rtr;
>   nbr->bdr.s_addr = hello.bd_rtr;
>   nbr->priority = hello.rtr_priority;
> - nbr_change = 1;
>   }
>  
>   /* actually the neighbor address shouldn't be stored on virtual links */
> @@ -201,8 +200,10 @@ recv_hello(struct iface *iface, struct i
>   memcpy(&nbr_id, buf, sizeof(nbr_id));
>   if (nbr_id == ospfe_router_id()) {
>   /* seen myself */
> - if (nbr->state & NBR_STA_PRELIM)
> + if (nbr->state & NBR_STA_PRELIM) {
>   nbr_fsm(nbr, NBR_EVT_2_WAY_RCVD);
> + nbr_change = 1;
> + }
>   break;
>   }
>   buf += sizeof(nbr_id);
>

Re: ospfd getting confused about who is DR

Remi Locherer
In reply to this post by Claudio Jeker
On Fri, Feb 09, 2018 at 03:39:43AM +0100, Claudio Jeker wrote:

> On netsplits it can happen that on join multiple ospfd end up as DR.
> In my case with 3 routers the one cut off stays DR even though the rest of
> the network already has a DR and BDR.
>
> Looking into this it seems that in some cases we don't issue an
> IF_EVT_NBR_CHNG and so the re-evaluation of DR/BDR does not happen.
> Looking at hello.c and the rfc seems to suggest that the following case is
> currently not handled:
>
>             o   Bidirectional communication has been established to a
>                 neighbor.  In other words, the state of the neighbor has
>                 transitioned to 2-Way or higher.
>
> The other cases in the RFC seem to be covered.
> The following diff fixes this and seems to solve the problem I'm seeing.
>
> Since this is one of those bits that always caused trouble I would like
> more tests and maybe someone is brave enough to OK the diff.
> --
> :wq Claudio

I reproduced the issue you describe with VMs in vmm. To simulate a netsplit
I just removed the tap interface from the bridge. The vio interface in the VM
then still has link but obviously cannot reach the other routers anymore.

I verified that with your patch applied this problem is solved.
No unwanted side effects happened in my testing setup.

OK remi@


>
> Index: hello.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/ospfd/hello.c,v
> retrieving revision 1.21
> diff -u -p -r1.21 hello.c
> --- hello.c 18 Nov 2014 20:54:29 -0000 1.21
> +++ hello.c 9 Feb 2018 02:11:55 -0000
> @@ -188,7 +188,6 @@ recv_hello(struct iface *iface, struct i
>   nbr->dr.s_addr = hello.d_rtr;
>   nbr->bdr.s_addr = hello.bd_rtr;
>   nbr->priority = hello.rtr_priority;
> - nbr_change = 1;
>   }
>  
>   /* actually the neighbor address shouldn't be stored on virtual links */
> @@ -201,8 +200,10 @@ recv_hello(struct iface *iface, struct i
>   memcpy(&nbr_id, buf, sizeof(nbr_id));
>   if (nbr_id == ospfe_router_id()) {
>   /* seen myself */
> - if (nbr->state & NBR_STA_PRELIM)
> + if (nbr->state & NBR_STA_PRELIM) {
>   nbr_fsm(nbr, NBR_EVT_2_WAY_RCVD);
> + nbr_change = 1;
> + }
>   break;
>   }
>   buf += sizeof(nbr_id);