Weirdness with ARP on an IBM HS20 blade

Sean Dogar
I've installed OpenBSD 3.8 on an IBM HS20 blade (model 8678).
Everything generally works OK (even multiprocessor support!), except for
some weirdness with the network interface, which is the onboard Broadcom
BCM57xx (bge) interface.  The kernel does correctly enumerate and bring
up the network interfaces, but after that point, I start having trouble.

What usually happens is that I can't get to the host from machines on
the local network.  When I ssh or ping from hosts *outside* of the local
network (and therefore the traffic comes from a gateway address), then I
can ping or ssh the OpenBSD box just fine.

From the OpenBSD blade, I can successfully ping or ssh to any host,
both on the local network and outside of it.

What's interesting is that this is a problem only after the OpenBSD
machine has run for a few minutes.  Right after a reboot, hosts on the
local network can ping or ssh to the OpenBSD box, but eventually, that
ability goes away.

I'm pretty sure that this weirdness is ARP related.  When I look at the
ARP table on some of the machines that I try to ping and ssh from, the
MAC address is always "(incomplete)" for the OpenBSD box, which
explains why the connection never gets made. But when I go to the
Cisco Catalyst that is my gateway and do a "sh arp", it has the
correct MAC address for the OpenBSD box.  When I look at the ARP table
on the OpenBSD box, it has MAC addresses associated with some of my
network infrastructure (both the gateway address and the address of
another router), as well as any other host on the network I've pinged or
connected to.

It's as if the OpenBSD machine just quits responding to ARP requests
from other machines after a while.  What could cause this?  I've looked
at /var/log/messages and the like, but I don't see any errors.
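For reference, "responding to ARP requests" means roughly this: a host inspects each broadcast who-has request and, only if the target protocol address is its own IP, sends back a reply carrying its MAC address. The sketch below illustrates that logic using the RFC 826 payload layout for Ethernet/IPv4. It is a protocol illustration, not OpenBSD's kernel code, and the function names and MAC addresses are hypothetical.

```python
import socket
import struct

# RFC 826 ARP payload for Ethernet/IPv4 (28 bytes, network byte order):
# htype, ptype, hlen, plen, op, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"
OP_REQUEST, OP_REPLY = 1, 2

def build_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build a 'who-has target_ip tell sender_ip' payload."""
    return struct.pack(
        ARP_FMT,
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware / protocol address lengths
        OP_REQUEST,
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,  # target MAC is unknown in a request
        socket.inet_aton(target_ip),
    )

def answer(payload: bytes, my_mac: bytes, my_ip: str):
    """Return a reply payload if the request asks for my_ip, else None."""
    htype, ptype, hlen, plen, op, sha, spa, tha, tpa = struct.unpack(ARP_FMT, payload)
    if (htype, ptype, hlen, plen, op) != (1, 0x0800, 6, 4, OP_REQUEST):
        return None  # not an Ethernet/IPv4 ARP request
    if tpa != socket.inet_aton(my_ip):
        return None  # asking about someone else: stay silent
    # Reply: we become the sender, the original requester becomes the target.
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, OP_REPLY, my_mac, tpa, sha, spa)
```

A request built by 172.16.1.144 for 172.16.1.22 yields a 28-byte reply when checked with `my_ip="172.16.1.22"` and `None` for any other address. The reply goes back to the requester's MAC address, so the asking host should see it in its own capture if it is ever sent.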

Is there anything dumb that I'm missing?

PF is turned off (with pfctl -d).  Would pf even affect ARP?

What else would help?  Should I include the output of something or my
network configs?

-Sean

Re: Weirdness with ARP on an IBM HS20 blade

Melameth, Daniel D.
Sean Dogar wrote:

> I've installed OpenBSD 3.8 on an IBM HS20 blade (model 8678).
> Everything generally works OK (even multiprocessor support!), except
> for some weirdness with the network interface, which is the onboard
> Broadcom BCM57xx (bge) interface.  The kernel does correctly
> enumerate and bring up the network interfaces, but after that point,
> I start having trouble.
>
> What usually happens is that I can't get to the host from machines on
> the local network.  When I ssh or ping from hosts *outside* of the
> local network (and therefore the traffic comes from a gateway
> address), then I can ping or ssh the OpenBSD box just fine.
>
> From the OpenBSD blade, I can successfully ping or ssh to any host,
> both on the local network and outside of it.
>
> What's interesting is that this is a problem only after the OpenBSD
> machine has run for a few minutes.  Right after a reboot, hosts on the
> local network can ping or ssh to the OpenBSD box, but eventually, that
> ability goes away.
>
> I'm pretty sure that this weirdness is ARP related.  When I look at
> the ARP table on some of the machines that I try to ping and ssh
> from, the MAC address is always "(incomplete)" for the OpenBSD box.
> Which explains why the connection never gets made. But then again,
> when I go to the Cisco Catalyst which is my gateway and do a "sh
> arp", it has the correct MAC address for the OpenBSD box.  When I
> look at the ARP table on the OpenBSD box, it has MAC addresses
> associated with some of my network infrastructure (both the gateway
> address and the address of another router), as well as any other host
> on the network I've pinged or connected to.
>
> It's as if the OpenBSD machine just quits responding to ARP requests
> from other machines after a while.  What could cause this?  I've
> looked at /var/log/messages and the like, but I don't see any errors.
>
> Is there anything dumb that I'm missing?
>
> PF is turned off (with pfctl -d).  Would pf even affect ARP?
>
> What else would help?  Should I include the output of something or my
> network configs?

How about an ifconfig -a from both systems, clearing the arp cache of
both hosts and capturing tcpdumps on both ends during an entire
connection attempt?

Re: Weirdness with ARP on an IBM HS20 blade

Sean Dogar
> How about an ifconfig -a from both systems

Done.  Submitted to the list in a previous message.

> clearing the arp cache of
> both hosts

Done.

> and capturing tcpdumps on both ends during an entire
> connection attempt?

I ran tcpdump on both hosts while attempting to secure shell from the
Linux box.

From the OpenBSD box, I ran:

tcpdump -n host not 10.10.1.130 > bge1.dump

and got nothing back in bge1.dump at all.  tcpdump reported:

tcpdump: listening on bge1, link-type EN10MB
^C
23 packets received by filter
0 packets dropped by kernel

From the Linux box, I got a little more information:

I ran:  tcpdump -n host not 10.10.1.130 > eth0.dump

I'm not going to kill the list with the whole tcpdump (email me off-list
if you want it), but I grepped for the IP address of the OpenBSD box:

cat eth0.dump | grep 172.16.1.22
17:16:47.766317 arp who-has 172.16.1.22 tell 172.16.1.144
17:16:48.766068 arp who-has 172.16.1.22 tell 172.16.1.144
17:16:49.765827 arp who-has 172.16.1.22 tell 172.16.1.144
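One way to answer that question mechanically is to pair each who-has request in the capture with any reply for the same address. A minimal sketch, assuming the old-style tcpdump line format shown above; the `arp reply <ip> is-at <mac>` pattern for replies is an assumption about that same format, and newer tcpdump releases print ARP lines differently, so the patterns would need adjusting. The function name is hypothetical.

```python
import re
from collections import Counter

# Old-style tcpdump ARP lines, as in the capture above:
#   17:16:47.766317 arp who-has 172.16.1.22 tell 172.16.1.144
# Reply lines are assumed to look like: arp reply <ip> is-at <mac>
WHO_HAS = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")

def unanswered(lines):
    """Count who-has requests per (target, asker) whose target never replied.

    Any reply for the target anywhere in the capture counts as an answer;
    this is a simplification that ignores timing.
    """
    asked = Counter()
    answered = set()
    for line in lines:
        if m := WHO_HAS.search(line):
            asked[(m.group(1), m.group(2))] += 1
        elif m := REPLY.search(line):
            answered.add(m.group(1))
    return {pair: n for pair, n in asked.items() if pair[0] not in answered}
```

Fed the three lines above, `unanswered()` returns `{('172.16.1.22', '172.16.1.144'): 3}`: three requests from the Linux box and no reply for 172.16.1.22 anywhere in the capture.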


Does it seem as if the OpenBSD box isn't responding to the ARP requests?

-Sean

Re: Weirdness with ARP on an IBM HS20 blade

Melameth, Daniel D.
Sean Dogar wrote:

> I ran tcpdump on both hosts while attempting to secure shell from the
> Linux box.
>
> From the OpenBSD box, I ran:
>
> tcpdump -n host not 10.10.1.130 > bge1.dump
>
> and got nothing back in bge1.dump at all.  tcpdump reported:
>
> tcpdump: listening on bge1, link-type EN10MB
> ^C
> 23 packets received by filter
> 0 packets dropped by kernel
>
>
>
> From the Linux box, I got a little more information:
>
> I ran:  tcpdump -n host not 10.10.1.130 > eth0.dump
>
> I'm not going to kill the list with the whole tcpdump (email me
> off-list if you want it), but I grepped for the IP address of the
> OpenBSD box:
>
>   cat eth0.dump  |grep 172.16.1.22
> 17:16:47.766317 arp who-has 172.16.1.22 tell 172.16.1.144
> 17:16:48.766068 arp who-has 172.16.1.22 tell 172.16.1.144
> 17:16:49.765827 arp who-has 172.16.1.22 tell 172.16.1.144
>
>
> Does it seem as if the OpenBSD box isn't responding to the ARP
> requests?

Appears either the switch is not broadcasting these arps or OpenBSD is
not seeing them for some reason.  Any chance the OpenBSD box is in a
different VLAN or some kind of filtering is being done between it and
the Linux box?  Do you have some kind of special switchport,
port-security, storm-control or other neat settings configured on the
Catalyst and associated ports that might be "playing games?"  Maybe a
bad port on the Catalyst?  Outside of this, I'm not certain what to
think at this time.  Perhaps someone else can chime in with other
thoughts or other things to try on the OpenBSD box...

Re: Weirdness with ARP on an IBM HS20 blade

Sean Dogar
> Appears either the switch is not broadcasting these arps or OpenBSD is
> not seeing them for some reason.  Any chance the OpenBSD box is in a
> different VLAN or some kind of filtering is being done between it and
> the Linux box?  Do you have some kind of special switchport,
> port-security, storm-control or other neat settings configured on the
> Catalyst and associated ports that might be "playing games?"  Maybe a
> bad port on the Catalyst?  Outside of this, I'm not certain what to
> think at this time.  Perhaps someone else can chime in with other
> thoughts or other things to try on the OpenBSD box...
>

The OpenBSD box and Linux machine are in the same VLAN.  We do use
VLANs, but each VLAN gets its own /24.  No special security or storm
control is enabled.  This blade has previously run both Windows and
Linux, and network behavior has been perfectly normal.

One of my thoughts was to enable CARP and see if that changes
things.  I read somewhere that ARP handling moves to the CARP level
(instead of the bge driver) when CARP is enabled; am I wrong there?

The other thing worth mentioning is that the support page doesn't
specifically list the BCM57xx in the HS20 as supported.  Maybe there
is some minor difference between it and the other, supported Broadcom
implementations?

If anybody else can think of something to try I'm certainly willing to
try it.  This was more of an experiment than anything else; it *was*
pretty cool to see SMP functioning under OpenBSD.

-Sean