On Fri, Feb 09, 2018 at 11:11:18AM +0100, Matthieu Herrb wrote:
> I've recently set up a new pair of OpenBSD 6.2 pf firewalls (with carp)
> in my lab, and that's not performing very well.
> tcp-based NFS v3 and v4 traffic (between Linux clients and a NetApp
> server) through it is struggling, and some SSH or HTTPS transfers are
> stalling, with their states disappearing from the state table.
> I'm trying to figure out what's going on to fix the issue.
Thanks to all who answered in private.
With their advice and a bit of personal research, it looks like this
firewall pair is now working as expected.
One of the main issues was caused by a server having 2 interfaces in 2
different vlans that are routed through this firewall. This generated
asymmetric routing, so the reply packets weren't traversing the firewall
and weren't updating the state, which stayed half-open for 30s before
expiring and cutting the connection. A tad of source-routing on the
Linux side now forces the traffic to stay symmetric and everything's
Another issue seems to come from the fact that the new firewalls are
faster than the previous Cisco router. That apparently triggered bugs
in the vmxnet3 driver of CentOS 6 virtual machines. Upgrading to the
driver from open-vm-tools seems to have fixed the reset of the NFS
The last point is that there seems to be a bug in the half-open
accounting code. The huge number I'm seeing here is in fact pretty
> The main anomaly I see is the huge number (and it keeps growing) of
> half-open tcp states, after 24h of uptime. See pfctl -vsi output
> half-open tcp 4294375902
This is 0xfff6f9de
So it seems that, either because of the asymmetric route issue or
something else, the number of half-open connections is decremented
more often than it's incremented, which leads to this unsigned wraparound.
But as Henning@ mentioned, this is only accounting and not
actually used anywhere, so it shouldn't cause any real-life issue.