pf half-open tcp in state table



Matthieu Herrb-7
Hi,

I've recently set up a new pair of OpenBSD 6.2 pf firewalls (with carp)
in my lab, and they are not performing very well.

TCP-based NFS v3 and v4 traffic (between Linux clients and a NetApp
server) through them is struggling, and some SSH or HTTPS transfers are
stalling, with their states disappearing from the state table.

I'm trying to figure out what's going on to fix the issue.


The main anomaly I see is the huge number (and it keeps growing) of
half-open tcp states, after 24h of uptime. See pfctl -vsi output
below.

Any clues on how to diagnose this and hopefully fix my firewalls?


Below are the limits and timeouts from my pf.conf, plus the pfctl -vsi
and pfctl -st output.

Thanks in advance,


set limit states 80000
set timeout { adaptive.start 0, adaptive.end 0 }
set timeout { tcp.closing 1800, tcp.finwait 90, tcp.closed 180 }


Status: Enabled for 0 days 00:18:58              Debug: err

Hostid:   0xbe01b86e
Checksum: 0x489fc22aa9e7d141eb93cb12375c7c55

Interface Stats for vlan4             IPv4             IPv6
  Bytes In                      5766746796       3274270260
  Bytes Out                     3158075114       4781634462
  Packets In
    Passed                        18928091         11038798
    Blocked                        2378976           124784
  Packets Out
    Passed                         2911529          3575695
    Blocked                            108               29

State Table                          Total             Rate
  current entries                    58485              
  half-open tcp                 4294375902              
  searches                       715176441       628450.3/s
  inserts                         42749792        37565.7/s
  removals                        42691307        37514.3/s
Source Tracking Table
  current entries                        0              
  searches                               0            0.0/s
  inserts                                0            0.0/s
  removals                               0            0.0/s
Counters
  match                            6889153         6053.7/s
  bad-offset                             0            0.0/s
  fragment                            4338            3.8/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                          6297            5.5/s
  ip-option                         258621          227.3/s
  proto-cksum                            0            0.0/s
  state-mismatch                       190            0.2/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  translate                              0            0.0/s
  no-route                               0            0.0/s
Limit Counters
  max states per rule                    0            0.0/s
  max-src-states                         0            0.0/s
  max-src-nodes                          0            0.0/s
  max-src-conn                           0            0.0/s
  max-src-conn-rate                      0            0.0/s
  overload table insertion               0            0.0/s
  overload flush states                  0            0.0/s


tcp.first                   120s
tcp.opening                  30s
tcp.established           86400s
tcp.closing                1800s
tcp.finwait                  90s
tcp.closed                  180s
tcp.tsdiff                   30s
udp.first                    60s
udp.single                   30s
udp.multiple                 60s
icmp.first                   20s
icmp.error                   10s
other.first                  60s
other.single                 30s
other.multiple               60s
frag                         60s
interval                     10s
adaptive.start                0 states
adaptive.end                  0 states
src.track                     0s




--
Matthieu Herrb


Re: pf half-open tcp in state table

Matthieu Herrb-7
On Fri, Feb 09, 2018 at 11:11:18AM +0100, Matthieu Herrb wrote:

> Hi,
>
> I've recently set up a new pair of OpenBSD 6.2 pf firewalls (with carp)
> in my lab, and they are not performing very well.
>
> TCP-based NFS v3 and v4 traffic (between Linux clients and a NetApp
> server) through them is struggling, and some SSH or HTTPS transfers are
> stalling, with their states disappearing from the state table.
>
> I'm trying to figure out what's going on to fix the issue.
>

Thanks to all who answered in private.

With their advice and a bit of personal research, it looks like this
firewall pair is now working as expected.

One of the main issues was caused by a server having 2 interfaces in 2
different vlans that are routed through this firewall. This generated
asymmetric routing, so the reply packets weren't traversing the firewall
and weren't updating the state, which stayed half-open for 30s (the
tcp.opening timeout) before expiring and cutting the connection. A bit
of source routing on the Linux side now forces the traffic to stay
symmetric and everything's fine.

Another issue seems to come from the fact that the new firewalls are
faster than the previous Cisco router. That apparently triggered bugs
in the vmxnet3 driver of CentOS 6 virtual machines. Upgrading to the
driver from open-vm-tools seems to have fixed the rest of the NFS
traffic issues.

The last point is that there seems to be a bug in the half-open
accounting code. The huge number I'm seeing here is almost certainly
a negative value:
>
> The main anomaly I see is the huge number (and it keeps growing) of
> half-open tcp states, after 24h of uptime. See pfctl -vsi output
> below.
>
>   half-open tcp                 4294375902

This is 0xfff6f9de
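
As a quick sanity check of that reading, here is a minimal Python sketch.
It assumes the counter is a 32-bit unsigned integer (suggested by the
value sitting just under 2^32, but not something I checked in the pf
source); the helper name as_signed32 is made up for illustration.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# "half-open tcp" value copied from the pfctl -vsi output quoted above.
reported = 4294375902

print(hex(reported))          # 0xfff6f9de
print(as_signed32(reported))  # -591394
```

Under that assumption, the counter would have been decremented about
591394 times more than it was incremented.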

So it seems that, either because of the asymmetric routing issue or
something else, the number of half-open connections is decremented
more often than it's incremented, which leads to this unsigned
underflow (wraparound).
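
To illustrate the effect, here is a toy Python model of a counter that
wraps at 32 bits. It is only a sketch: the Counter32 class and the
32-bit width are assumptions for illustration, not the actual pf
accounting code.

```python
MASK32 = 0xFFFFFFFF  # assumed counter width: 32 bits, unsigned


class Counter32:
    """Toy unsigned 32-bit counter that wraps around like a C uint32_t."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self) -> None:
        self.value = (self.value + 1) & MASK32

    def dec(self) -> None:
        self.value = (self.value - 1) & MASK32


half_open = Counter32()
half_open.inc()  # a state becomes half-open
half_open.dec()  # ... and leaves the half-open stage
half_open.dec()  # one extra decrement with no matching increment

print(half_open.value)  # 4294967295, i.e. -1 seen as unsigned
```

A single unmatched decrement is enough to turn a counter near zero into
a number close to 2^32, which matches the kind of value pfctl reports.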

But as Henning@ mentioned, this is only accounting and not
actually used anywhere, so it shouldn't cause any real-life issue.

--
Matthieu Herrb