Problem with CARP interfaces not responding until VHID is changed.

Rudy Baker
Hello,

This is my first time posting here so be gentle.


It seems that random CARP interfaces on our systems will just die: they stop
replying to any requests, OR only 1 request out of ~50 will make it through,
slowly.

tcpdump also shows no traffic reaching the interface. Only when that 1 request
makes it through do we see traffic arrive at the system.

We've tried everything we could think of to bring the carp interface back
to life, such as rebooting, running sh /etc/netstart, and even going as far as
rebuilding the system server from scratch with maven and dropping the
site55.tgz file in there, but none of these things fix the issue.

When we change the VHID to anything else and restart the interface, it
fixes everything and the interface is smoking fast again. When we change
the VHID back to what it originally was, we're dead in the water. Again,
change it back to any random VHID and the issue goes away. So I have
narrowed it down to VHID. Whenever we run into this problem I just tell
people to change it to anything else.

I know the CARP interface's MAC address is generated from the VHID, so I am
sort of leaning towards it being an ARP issue and possibly not an issue with
the OBSD system. But I am hoping for some hints or ideas from you guys.

Thanks in advance for any help!

RZ
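
The link between VHID and MAC address mentioned above can be made concrete with a minimal sketch. It assumes the IPv4 virtual MAC layout that CARP shares with VRRP (prefix 00:00:5e:00:01, last octet equal to the VHID); the function name is made up for illustration.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the IPv4 virtual MAC address that corresponds to a VHID.

    Assumption: prefix 00:00:5e:00:01 followed by the VHID as one octet.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


if __name__ == "__main__":
    # Any two groups configured with the same VHID end up with the same MAC.
    for vhid in (1, 10, 255):
        print(vhid, carp_virtual_mac(vhid))
```

Under that assumption, changing the VHID also changes the MAC address that neighbours and switches have to learn, which would fit the behaviour described in the post.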


Re: Problem with CARP interfaces not responding until VHID is changed.

Alexander Salmin
Hey,

Welcome to the OpenBSD community mailing list. I'm also using CARP for
lots of HA setups and yes, I will be gentle. I have never had issues
like yours, but my setup seems very different. The virtual host ID (vhid)
and its IP address together define a carp group, so I don't understand why
you are changing the vhid back and forth.

  - Try to isolate this to 2 simple test machines with as simple a setup
as possible. Be simple.
  - Make those machines run either the release version or -current. State
which.
  - Then continue:
   - Post your interface configurations.
   - Post your dmesg.
   - Post your pf.conf.
   - Post your tcpdump (where you observed this, make it as small as
possible).
   - Also, some information about why you are changing vhid would be
interesting.
   - vhid needs to be the same on all hosts participating in the same
carp group.
   - If you change vhid, the other host(s) on the other side also need
to change.
   - Are you using carp on top of any non-hardware interface,
like a vlan, with carpdev?
   - Also, many people forget this, but if you type "man 4 carp" you
will find a lot of good stuff to read about carp, vhids, carpdev and
such.

Best of luck,
Alexander







Re: Problem with CARP interfaces not responding until VHID is changed.

Adam Thompson
In reply to this post by Rudy Baker
On 16-01-21 04:02 PM, rizz2pro . wrote:
> I know the CARP interface's MAC address is generated by the VHID so I am
> sort of leaning towards it be an ARP issue and possibly not an issue with
> the OBSD system. But I am hoping for some hints or ideas from you guys.
I have a suspicion... what kind of switches are the OpenBSD hosts in
question connected to?  If they are managed switches, have you looked at
the logs on the switch?

Some switches interpret CARP (and VRRP, for that matter) as either an
attack or a misconfigured system, and will temporarily block traffic to
one of the two ports.  Essentially, they think it's "MAC flapping"...
which it sort of is, in fact.

-Adam
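
The "MAC flapping" idea can be illustrated with a toy model of a switch's address table. This is only a sketch: the class, the port numbers and the counting rule are invented for illustration and do not describe any vendor's actual detection logic. It assumes frames sourced from the same virtual MAC arrive alternately on two ports.

```python
from collections import defaultdict


class LearningSwitch:
    """Toy MAC learning table that counts how often a MAC changes port."""

    def __init__(self):
        self.table = {}                # MAC -> port where it was last seen
        self.moves = defaultdict(int)  # MAC -> number of port changes

    def frame_seen(self, src_mac, port):
        previous = self.table.get(src_mac)
        if previous is not None and previous != port:
            self.moves[src_mac] += 1   # the MAC "moved" to another port
        self.table[src_mac] = port


def simulate(rounds=5, ports=(1, 7)):
    """Feed the same source MAC in from each port in turn; return the moves."""
    switch = LearningSwitch()
    shared_mac = "00:00:5e:00:01:01"   # hypothetical virtual MAC for VHID 1
    for _ in range(rounds):
        for port in ports:
            switch.frame_seen(shared_mac, port)
    return switch.moves[shared_mac]


if __name__ == "__main__":
    print("port changes with two sources:", simulate(5, ports=(1, 7)))
    print("port changes with one source:", simulate(5, ports=(1,)))
```

A switch that treats a high move count as an attack or a loop could reasonably rate-limit or block one of the ports, which is the kind of entry worth looking for in the switch logs.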


Re: Problem with CARP interfaces not responding until VHID is changed.

Rudy Baker
I like that!

They are indeed managed switches. Although I was thinking it was related to
ARP, I never actually dug into the switches' logs. This sounds like it might
just fit. I will get back to you shortly. Thanks for all your help.

On Jan 22, 2016 11:00 AM, "Adam Thompson" <[hidden email]> wrote:

>
> On 16-01-21 04:02 PM, rizz2pro . wrote:
>
>> I know the CARP interface's MAC address is generated by the VHID so I am
>> sort of leaning towards it be an ARP issue and possibly not an issue with
>> the OBSD system. But I am hoping for some hints or ideas from you guys.
>>
> I have a suspicion... what kind of switches are the OpenBSD hosts in
> question connected to?  If they are managed switches, have you looked at
> the logs on the switch?
>
> Some switches interpret CARP (and VRRP, for that matter) as either an
> attack or a misconfigured system, and will temporarily block traffic to one
> of the two ports.  Essentially, they think it's "MAC flapping"... which it
> sort of is, in fact.
>
> -Adam


Re: Problem with CARP interfaces not responding until VHID is changed.

Rudy Baker
In reply to this post by Adam Thompson
Ok we've figured it out.

We have 3 identical environments all attached to one switch, and
they are all advertising the same VHIDs to each other, which looks to
be causing some ARP problems. (Environment A was getting CARP
advertisements from Environment B and vice versa.)

After specifying a "carppeer" on each CARP interface attached to that
switch in all 3 environments, the issue went away. All 3 environments
are using the same VHID.

Thanks for the help
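
For readers who want to see what such a change might look like, here is a minimal sketch that assembles an ifconfig command line with the carppeer option. The interface names (carp0, em0) and the addresses from the 192.0.2.0/24 documentation range are hypothetical, and the helper function is not part of any tool; check carp(4) and ifconfig(8) before using anything like it.

```python
import ipaddress


def carp_ifconfig_command(ifname, vhid, carpdev, address, netmask, carppeer=None):
    """Build an ifconfig command string for a carp interface (illustrative only)."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    ipaddress.ip_address(address)       # raises ValueError on a malformed address
    args = ["ifconfig", ifname, "inet", address, "netmask", netmask,
            "vhid", str(vhid), "carpdev", carpdev]
    if carppeer is not None:
        ipaddress.ip_address(carppeer)
        # With carppeer set, advertisements go to this address instead of
        # the CARP multicast group.
        args += ["carppeer", carppeer]
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical pair: each host names the other's real address as its peer.
    print(carp_ifconfig_command("carp0", 1, "em0", "192.0.2.10",
                                "255.255.255.0", carppeer="192.0.2.12"))
```

Each member of a pair would point carppeer at the real address of the other member, so the advertisements are no longer seen by the other environments on the switch.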


[PATCH INCLUDED] Re: Problem with CARP interfaces not responding until VHID is changed.

Adam Thompson
On 16-01-25 03:43 PM, rizz2pro . wrote:

> Ok we've figured it out.
>
> We have a couple identical environments all attached to one switch and
> they are all advertising the same VHIDs to each other and it looks to
> be causing some arp problems. (Environment A was getting CARP
> advertisements from Environment B and vice versa)
>
> After specifying a "carppeer" on each CARP interface attached to that
> switch in all 3 environments the issue went away. All 3 environments
> are using the same VHID.
>
> Thanks for the help

Umm... per http://www.openbsd.org/faq/pf/carp.html

> /vhid/
>     The Virtual Host ID. This is a unique number that is used to
>     identify the redundancy group to other nodes in the group, and to
>     distinguish between groups on the same network. Acceptable values
>     are from 1 to 255. This must be the same on all members of the group.

Note the word "unique".  Basically, by having multiple clusters with the
same VHID, you deliberately broke carp(4).  VRRP would have broken in
exactly the same way, and I think HSRP would have broken in the same
way, too.

However, I see that neither carp(4) nor ifconfig(8) indicates that VHID
should be unique and not shared, except by very weak implication.
If specifying a common VHID is not only contraindicated, but causes
actual breakage on the network (as seen here), then the manpage(s)
should say so, IMO.  (IIRC, the manpages are canonical, not the FAQ...?)

Drat, I don't have a copy of -current on this system... following is a
proposed diff against 5.8-RELEASE.  Sorry for not doing a proper diff
against -current but if I wait, I'll forget.  At least this way I might
remember later.

I know in VRRP, the VHID is used to generate the MAC address, but I
don't recall if carp(4) works the same way.  If it does, then the
language I suggest for carp(4) below may be too permissive, in that use
of carppeer will stop each system from complaining, but external clients
will still encounter difficulties with ARP.

-Adam

--- carp.4.dist 2016-01-25 18:04:39.152065865 -0600
+++ carp.4 2016-01-25 18:08:14.326975564 -0600
@@ -58,6 +58,10 @@
  a common virtual host ID (VHID) and
  virtual host IP address on each machine which is to take part in the virtual
  group.
+The VHID uniquely identifies a cluster locally within a broadcast domain
+(network segment), but may be reused on other networks.  The
+.Cm carppeer
+option may also be used to avoid conflicting VHID multicasts.
  Additional parameters can also be set on a per-interface basis:
  .Cm advbase
  and

--- ifconfig.8.dist 2016-01-25 18:15:07.089080002 -0600
+++ ifconfig.8 2016-01-25 18:14:35.745168361 -0600
@@ -849,6 +849,7 @@
  Set the virtual host ID to
  .Ar n .
  Acceptable values are 1 to 255.
+Clusters on the same network should use unique IDs.
  .El
  .Pp
  Taken together, the
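
The rule proposed in the patch (a VHID should identify only one cluster per broadcast domain) is easy to check mechanically. The following is a minimal sketch that assumes you keep an inventory of clusters as (name, segment, vhid) tuples; the function name and the sample data are invented for illustration.

```python
from collections import defaultdict


def find_vhid_conflicts(clusters):
    """Return {(segment, vhid): [cluster names]} for every VHID that is
    used by more than one cluster on the same network segment."""
    by_key = defaultdict(list)
    for name, segment, vhid in clusters:
        by_key[(segment, vhid)].append(name)
    return {key: names for key, names in by_key.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: three environments share one switch and one
    # VHID, while a fourth reuses the VHID on a separate segment.
    inventory = [
        ("env-a", "switch1", 1),
        ("env-b", "switch1", 1),
        ("env-c", "switch1", 1),
        ("env-d", "switch2", 1),
    ]
    for (segment, vhid), names in find_vhid_conflicts(inventory).items():
        print(f"VHID {vhid} on {segment} is shared by: {', '.join(names)}")
```

Reuse of a VHID on a different segment is not reported, which matches the wording of the proposed carp(4) text.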