IPsec and MTU / fragmentation


Lucas-2
Hi misc@,

I've set up an IPsec tunnel for serving my website from my home. The
tunnel works quite well most of the time, but if I try to deliver big
files over it, the HTTP client never gets a response. After some
testing, I found that if I run this on the HTTP server end

        perl -e 'print "a" x 1386;' | doas nc -l 10.200.0.80 80

the client receives 1386 "a"s, but with any bigger size the client sees no
response at all.

This smells of MTU / fragmentation issues, but I don't know enough about
networks to configure it properly. Is this the case? Any recommendations
on how to configure a sensible value? Any clue sticks? I can bang
different MTUs until it works, but that solution doesn't seem to scale.
You can find my iked and pf configs below.

I would also like to understand why it happens, so pointers to docs are
more than welcome.

Thanks in advance,
-Lucas

Initiator /etc/iked.conf:

        initiator_www = 10.200.0.80
        initiator_peer = 192.0.2.1
        responder = 198.51.100.1

        ikev2 "www" active proto tcp \
            from $initiator_www port 80 to $responder \
            peer $responder \
            srcid initiator dstid responder \
            tag IPSECWWW

Initiator /etc/pf.conf:

        set block-policy drop
        set loginterface egress
        set skip on lo0

        block all

        pass out quick on { egress enc0 }

        pass in quick on enc0 tagged IPSECWWW
        pass in on egress proto tcp to port ssh
        pass in on egress inet proto icmp all
        pass in on egress inet6 proto ipv6-icmp all

Responder /etc/iked.conf:

        initiator_www = 10.200.0.80
        initiator_peer = 192.0.2.1
        responder = 198.51.100.1

        ikev2 "www" passive proto tcp \
            from $responder to $initiator_www port 80 \
            peer $initiator_peer \
            srcid responder dstid initiator \
            tag IPSECWWW

Responder /etc/pf.conf:

        set block-policy drop
        set loginterface egress
        set skip on lo0

        block log all

        pass out quick on egress

        pass in log on egress proto udp from any to (egress) \
            port { isakmp ipsec-nat-t }
        pass in log on egress proto esp from any to (egress)
        pass in log on enc0 tagged IPSECWWW
        pass out log on enc0

        pass in on egress proto tcp to port { ssh http https }
        pass in on egress inet proto icmp all
        pass in on egress inet6 proto icmp6 all


Re: IPsec and MTU / fragmentation

Denis Lapshin-2
It could be a re-keying issue. You can check this by adding the following
to iked.conf on both ends:

Initiator:
...
ikelifetime 120m lifetime 180m bytes 200m \
tag IPSECWWW

Receiver:
...
ikelifetime 100m lifetime 160m bytes 250m \
tag IPSECWWW

The test result can be used for further investigations.

By the way, can you let us know the exact size of the "big files"?

Denis



Re: IPsec and MTU / fragmentation

Lucas-2
Hi Denis,

Denis <[hidden email]> wrote:
> It could be a re-keying issue. You can check this by adding the following
> to iked.conf on both ends:

I removed this line from the mail while cleaning up the config. I have

        ikelifetime 3h lifetime 1h

on both ends.

> By the way, can you let us know the exact size of the "big files"?

> > perl -e 'print "a" x 1386;' | doas nc -l 10.200.0.80 80
> >
> > client receives 1386 "a"s, but with any bigger size the client sees no
> > response at all.

Anything bigger than 1386 bytes.

-Lucas


Re: IPsec and MTU / fragmentation

Simen Stavdal
In reply to this post by Lucas-2
Hi Lucas,

Have you tried to manipulate the MSS during conversation setup?
This is done with the max-mss directive in pf.conf.

Basically, pf takes the three-way handshake and overrides the MSS value in
the handshake with something lower than the default.

Client (1500 bytes) -> pf (change to 1300 bytes) -> Server
Server (1500 bytes) -> pf (change to 1300 bytes) -> Client

Now both the server and the client think that the remote conversation
partner is only able to receive 1300 bytes, and will package the data
accordingly.

When a normal conversation is set up, it is becoming more and more common
to set DF=1 (don't fragment = true).
When the router/firewall receives packets that are too big, they are
discarded, but max-mss should take care of this, i.e. it does not allow
fragmentation, but forces smaller packets in the first place.

The three-way handshake will almost always succeed, because the packets
are very small.

Cheers,
Simon.
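
The clamping described in the message above can be sketched as plain arithmetic. This is only an illustration, not pf's code: the function names are made up, the 20-byte header sizes assume IPv4 and TCP without options, and the 1300 limit is the example value from the message. Under those assumptions, a 1500-byte interface MTU corresponds to an advertised MSS of 1460.

```python
# Illustrative arithmetic only; names are hypothetical, not pf internals.
IPV4_HEADER = 20  # assumes an IPv4 header without options
TCP_HEADER = 20   # assumes a TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_mss(advertised_mss: int, max_mss: int) -> int:
    """Effect of an MSS clamp on the MSS option of a SYN or SYN/ACK:
    lower the value if it exceeds the limit, never raise it."""
    return min(advertised_mss, max_mss)


if __name__ == "__main__":
    advertised = mss_for_mtu(1500)
    print(advertised, clamp_mss(advertised, 1300))  # 1460 1300
    print(clamp_mss(536, 1300))                     # 536, already small enough
```

Because the clamp only touches the TCP MSS option, it has no effect on UDP, ICMP or ESP traffic crossing the same tunnel.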


Re: IPsec and MTU / fragmentation

Janne Johansson-3
On Mon, 10 Feb 2020 at 11:58, Simen Stavdal <[hidden email]> wrote:

> Hi Lucas,
> Have you tried to manipulate the mss during conversation setup?
> This is done with the max-mss directive in pf.conf.
> Basically, it takes the three way handshake, and overrides the MSS value in
> the handshake to something lower than the default.
>

This might fix the http/ssh issues one might see, because both of those run
over TCP, but MSS fixups will not correct large UDP or icmp packets, or any
other non-TCP protocol one might run over that ipsec, so making sure the
traffic is below the MTU should be the end goal, not fixing 90% with pf.

--
May the most significant bit of your life be positive.

Re: IPsec and MTU / fragmentation

Simen Stavdal
True, but the issue was related to downloading over http, which is over tcp.
So, if http is your only concern I would go for this option.

Most clients are configured with the MTU of their physical NIC capabilities,
and sometimes even with jumbo support.
MTU is a property of the OS in both ends, while MSS is a property of the
packets that can be adjusted in-flight.

So, if you want to fix the MTU, you will have to configure that on the
conversation partners and not in pf.
So, while we agree on the principles, how do you suggest MTU is changed?

Statically configured on each host? DHCP option?

Cheers,
Simon.


Re: IPsec and MTU / fragmentation

Paul de Weerd
On Mon, Feb 10, 2020 at 12:15:37PM +0100, Simen Stavdal wrote:
| True, but issue was related to downloading over http, which is over tcp.
| So, if http is your only concern I would go for this option.
|
| Most clients are configured with an MTU of their physical NIC capabilities,
| and sometimes even with jumbo support.
| MTU is a property of the OS in both ends, while MSS is a property of the
| packets that can be adjusted in-flight.
|
| So, if you want to fix the MTU, you will have to configure that on the
| conversation partners and not in pf.
| So, while we agree on the principles, how do you suggest MTU is changed?

One interesting option that I recently discovered thanks to florian@
is the 'mtu'[1] setting in /etc/rad.conf on your IPv6 router.  By
lowering the MTU, packets had a smaller MSS, which aligned with the
MTU of the IPv6 tunnel I was using in that situation.  This, in turn,
allowed me to use software my bank has provided for my mobile device
over IPv6 without a problem.

Admittedly, after learning that this worked, I switched back to
scrubbing the MSS in pf.conf for this particular bank, and I've told
them to either stop filtering ICMPv6 Packet Too Large errors or
restrict the MSS to a lower value on their end (as they said they were
doing) to fix this for all their users.  The effect of using 'mtu' in
rad(8) is a lower configured MTU on your SLAAC-enabled clients,
also affecting IPv4 (and local IPv6) traffic.

Cheers,

Paul 'WEiRD' de Weerd

[1]: http://man.openbsd.org/rad.conf#mtu


--
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/                 


Re: IPsec and MTU / fragmentation

Janne Johansson-3
In reply to this post by Simen Stavdal
On Mon, 10 Feb 2020 at 12:15, Simen Stavdal <[hidden email]> wrote:

> True, but issue was related to downloading over http, which is over tcp.
> So, if http is your only concern I would go for this option.
>

To me, it sounds just a bit like "let this person notice the other errors
later".


> Most clients are configured with an MTU of their physical NIC
> capabilities, and sometimes even with jumbo support.
> MTU is a property of the OS in both ends, while MSS is a property of the
> packets that can be adjusted in-flight.
>
>
MTU is strictly a property of each and every interface in all the hops
between you and your endpoint and, equally strictly, MSS is a property of
_tcp_ packets that can be adjusted. If you run another ipsec inside this
first ipsec tunnel-with-mss-fixed, that second one would break, since ESP/AH
is not tcp and will not do the 3-way handshake where PF can fix the mss for
it. The same goes for mosh, wireguard, or http/3, since they run over UDP.

Not trying to nitpick everything, but the internet wasn't built on 1500 MTU
ethernet everywhere. In the bad old days you might go over PPP (576) or
SLIP (296) links at times and it still worked, so if your setups today
break when someone in your path limits you to 1476 or so, then we have
regressed quite a bit since the crap internet days.


> So, if you want to fix the MTU, you will have to configure that on the
> conversation partners and not in pf.
> So, while we agree on the principles, how do you suggest MTU is changed?
>

PMTU discovery would be one method, yes. Middle boxes that do not drop
icmp are part of this, of course.


> Statically configured on each host? DHCP option?
>

This depends a bit on where you place your ipsec gw of course, but if you
can't set it on the tunnel (since ipsec on obsd isn't like openvpn or
gif/gre) you might need to set it on the interface where you take in the
traffic, if you can't set it on all clients going via the gw, which is a
believable scenario.


> This might fix the http/ssh issues one might see, because both of those
>> run over TCP, but MSS fixups will not correct large UDP or icmp packets, or
>> any other non-TCP protocol one might run over that ipsec, so making sure
>> the traffic is below the MTU should be the end goal, not fixing 90% with
>> pf.
>>
>

--
May the most significant bit of your life be positive.
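
Path MTU discovery, named above as one method, can be illustrated with a toy simulation. This is a sketch of the general idea only (a sender with DF set shrinks its packets each time a router reports a smaller next-hop MTU); the function name, the `icmp_delivered` flag and the hop MTUs are all made up for illustration and are not taken from any real stack.

```python
from typing import List, Optional


def discover_path_mtu(link_mtus: List[int],
                      icmp_delivered: bool = True) -> Optional[int]:
    """Simulate a sender probing a path with DF set.

    link_mtus lists the MTU of each hop in order, starting with the
    sender's own interface. Returns the packet size that finally gets
    through, or None if the sender never learns the limit.
    """
    size = link_mtus[0]  # start with the local interface MTU
    while True:
        # First hop that cannot forward a packet of this size, if any.
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size  # the packet fits on every hop
        if not icmp_delivered:
            # The packet is dropped and the error never arrives: a black hole.
            return None
        # The "too big" error carries the next-hop MTU; retry at that size.
        size = bottleneck


if __name__ == "__main__":
    path = [1500, 1500, 1438, 1400]  # hypothetical hop MTUs
    print(discover_path_mtu(path))                        # 1400
    print(discover_path_mtu(path, icmp_delivered=False))  # None
```

The second call shows why middle boxes that drop ICMP matter: the sender keeps sending packets that are too large and never finds out why they vanish.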

Re: IPsec and MTU / fragmentation

Stuart Henderson
In reply to this post by Paul de Weerd
On 2020-02-10, Paul de Weerd <[hidden email]> wrote:
>                                                        and I've told
> them to either stop filtering ICMPv6 Packet Too Large errors or
> restrict the MSS to a lower value on their end (as they said they were
> doing) to fix this for all their users.

AFAIK some security assessors (who don't understand TCP/IP) whine
about this..



Re: IPsec and MTU / fragmentation

Simen Stavdal
In reply to this post by Janne Johansson-3
This is more a discussion about scalability and practical implementation.
We both know that PMTU discovery will work partly at best: your entire path
back must support it, and the "offending" client must also allow inbound
control messages on its host firewall for this to work.
And even if the packets are received by the client, will it support them and
adjust its MSS? I have seen a lot of clients not adhering to standards.

Modifying thousands of clients (via dhcp options for instance) to use a
fixed MTU will affect other applications too (if you choose to go that
route), not just the ones that need to traverse a tight ipsec tunnel.
Would you adjust all your clients just because you had a single path using
SLIP in your network?

Point is, there is no perfect solution to this issue, there are just
different ways of solving bits and bobs on the way.
Adjust mss will work just fine for all tcp protocols, and no, not for UDP
because it does not use a three way handshake (no MSS to adjust).

In my opinion, max-mss works very well in most cases, especially when you
have full control of the tunnel you are using (as is the case of Lucas'
original question).
We use it extensively in many of our applications at my workplace, and as
of yet it has not caused any big issues, so it is a good practical way
to solve this issue.

As for UDP, there are options here too in pf.conf (like no-df). Personally
I have not tested this, but it would be fun to try. It says it
supports IPv4 (which would include TCP, UDP and ICMP).
It would be interesting to find out whether UDP enforces DF in most cases.

Cheers,
Simon.


Re: IPsec and MTU / fragmentation

Janne Johansson-3
On Mon, 10 Feb 2020 at 16:27, Simen Stavdal <[hidden email]> wrote:

> This is more a discussion about scalability and practical implementation.
> We both know that PMTU will work partly at best, your entire path back
> must support this, and also, the "offending" client must allow inbound
> control messages on their host firewall for this to work.
> And even if the packets are received by the client, will it support and
> adjust MSS? I have seen a lot of clients not adhering to standards.
>
> Modifying thousands of clients (via dhcp options for instance) to use a
> fixed MTU will affect other applications too (if you choose to go that
> route), not just the ones that need to traverse a tight ipsec tunnel.
> Would you adjust all your clients just because you had a single path using
> SLIP in your network?
>

I would want no one to ever have to know the complete path, slip or no
slip.


> Point is, there is no perfect solution to this issue, there are just
> different ways of solving bits and bobs on the way.
> Adjust mss will work just fine for all tcp protocols, and no, not for UDP
> because it does not use a three way handshake (no MSS to adjust).
> In my opinion, max-mss works very well in most cases, especially when you
> have full control of the tunnel you are using (as is the case of Lucas'
> original question).
> We use it extensively in many of our applications in my workplace, and as
> of yet has not represented any big issues, so it is a practically good way
> to solve this issue.
>

I think the more complete solution is to run some gif/gre inside ipsec and
set low-enough MTU on that one, so it can correctly fragment incoming
packets, and optionally rebuild the packets at the remote end, while also
giving you an idea of "state" on the link so you optionally can run things
like routing daemons or something that cares about and acts on tunnel
state. This would cause even lower MTU, but still allow all kinds of
traffic and not just the "popular" one.

I am somewhat trying to care for the ones that make a site-2-site ipsec
which may work for the initial setup, and later find out that more than one
non-tcp kind of traffic doesn't work, without understanding why ssh,http
work but list-of-things-like
mosh,wireguard,quic,yet-another-layer-of-ipsec,hosting-udp-game don't.

> As for UDP, there are options here too in pf.conf (like no-df), but
> personally I have not tested this, but it would be fun to try. It says it
> supports IPv4 (which would include TCP, UDP and ICMP).
> Would be interesting to find if UDP enforces DF in most cases.
>

no-df in PF more or less controls whether it will silently drop arriving
fragments that have DF set. Linux used to send (or still sends) such udp,
for much enjoyment. "no one else should fragment, but I just did and you as
the packet checker can't know who did"

--
May the most significant bit of your life be positive.

Re: IPsec and MTU / fragmentation

Peter Müller
In reply to this post by Lucas-2
Hello Lucas,

as far as I understood, setting MTU on encN interfaces is not supported
since it is not mentioned by enc(4) and setting it manually fails:

> machine# ifconfig enc0 mtu 1500
> ifconfig: SIOCSIFMTU: Inappropriate ioctl for device

If you do not want to use GRE tunnels or gif interfaces, I suppose clamping the
MSS via pf might be an acceptable, if not elegant, solution:

> match on enc0 scrub (max-mss 1394)

1394 bytes is intentional as the remote end has an interface MTU of 1488 bytes
configured (behind a DSL connection using VLANs).

That being said, I bumped into some reproducible but not deterministic crashes
which are most likely related to IPsec connections as the same system runs
stable using OpenVPN. Please refer to https://marc.info/?l=openbsd-bugs&m=158048415032524&w=2
for further information - unfortunately, there is no solution for this yet.

Thanks, and best regards,
Peter Müller
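
The relationship between the two figures in the message above (interface MTU 1488, max-mss 1394) can be worked out with simple arithmetic. This is a sketch with made-up function names, assuming IPv4 and TCP headers of 20 bytes each without options. It treats the encapsulation overhead as one fixed number, which is a simplification: the real per-packet ESP overhead depends on the cipher, padding, authentication and whether NAT-T is in use.

```python
# Illustrative arithmetic only; function names are hypothetical.
IPV4_HEADER = 20  # assumes an IPv4 header without options
TCP_HEADER = 20   # assumes a TCP header without options


def encapsulation_budget(link_mtu: int, max_mss: int) -> int:
    """Bytes left over for tunnel encapsulation when a full-size TCP
    segment of max_mss payload bytes is sent over a link of link_mtu."""
    return link_mtu - (max_mss + IPV4_HEADER + TCP_HEADER)


def max_mss_for(link_mtu: int, overhead: int) -> int:
    """Largest MSS that still leaves `overhead` bytes for encapsulation."""
    return link_mtu - overhead - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Figures from the message: MTU 1488, max-mss 1394.
    print(encapsulation_budget(1488, 1394))  # 54
    print(max_mss_for(1488, 54))             # 1394
```

So the chosen value leaves 54 bytes per packet for everything the tunnel adds; a different cipher suite or NAT-T would call for a different budget.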



Re: IPsec and MTU / fragmentation

Simen Stavdal
In reply to this post by Janne Johansson-3
On Mon, 10 Feb 2020 at 17:00, Janne Johansson <[hidden email]> wrote:

> On Mon, 10 Feb 2020 at 16:27, Simen Stavdal <[hidden email]> wrote:
>
>> This is more a discussion about scalability and practical implementation.
>> We both know that PMTU will work partly at best, your entire path back
>> must support this, and also, the "offending" client must allow inbound
>> control messages on their host firewall for this to work.
>> And even if the packets are received by the client, will it support and
>> adjust MSS? I have seen a lot of clients not adhering to standards.
>>
>> Modifying thousands of clients (via dhcp options for instance) to use a
>> fixed MTU will affect other applications too (if you choose to go that
>> route), not just the ones that need to traverse a tight ipsec tunnel.
>> Would you adjust all your clients just because you had a single path
>> using SLIP in your network?
>>
>
> I would want no one to ever have to know the complete path, slip or no
> slip.
>
>
>> Point is, there is no perfect solution to this issue, there are just
>> different ways of solving bits and bobs on the way.
>> Adjust mss will work just fine for all tcp protocols, and no, not for UDP
>> because it does not use a three way handshake (no MSS to adjust).
>> In my opinion, max-mss works very well in most cases, especially when you
>> have full control of the tunnel you are using (as is the case of Lucas'
>> original question).
>> We use it extensively in many of our applications in my workplace, and as
>> of yet has not represented any big issues, so it is a practically good way
>> to solve this issue.
>>
>
> I think the more complete solution is to run some gif/gre inside ipsec and
> set low-enough MTU on that one, so it can correctly fragment incoming
> packets, and optionally rebuild the packets at the remote end, while also
> giving you an idea of "state" on the link so you optionally can run things
> like routing daemons or something that cares about and acts on tunnel
> state. This would cause even lower MTU, but still allow all kinds of
> traffic and not just the "popular" one.
>

So, how will your client/server know about this lower mtu? And the df bit is
set more often than not, so fragmentation is not allowed in a lot of cases.
This is exactly the problem that started this thread...

>
> I am somewhat trying to care for the ones that make a site-2-site ipsec
> which may work for the initial setup, and later find out that more than one
> non-tcp kind of traffic doesn't work without understanding why ssh,http
> works but not list-of-things-like
> mosh,wireguard,quic,yet-another-layer-of-ipsec,hosting-udp-game doesn't.
>
So, I agree that it would be nice if all protocols worked, but
ultimately the real problem is the client/server refusing to use
fragmentation when the MTU is too small and clients insist on big packets.

>
>> As for UDP, there are options here too in pf.conf (like no-df), but
>> personally I have not tested this, but it would be fun to try. It says it
>> supports IPv4 (which would include TCP, UDP and ICMP).
>> Would be interesting to find if UDP enforces DF in most cases.
>>
>
> no-df in PF more or less controls whether it will silently drop arriving
> fragments that have DF set. Linux used to send (or still sends) such udp,
> for much enjoyment. "no one else should fragment, but I just did and you as
> the packet checker can't know who did"
>

> --
> May the most significant bit of your life be positive.
>

Re: IPsec and MTU / fragmentation

Janne Johansson-3
In reply to this post by Peter Müller
Den mån 10 feb. 2020 kl 18:18 skrev Peter Müller <[hidden email]>:

> Hello Lucas,
> as far as I understood, setting MTU on encN interfaces is not supported
> since it is not mentioned by enc(4) and setting it manually fails:
>
> > machine# ifconfig enc0 mtu 1500
> > ifconfig: SIOCSIFMTU: Inappropriate ioctl for device
>

enc(4) interfaces are not to IPsec what tun(4) is to OpenVPN.
An enc(4) interface is not a per-tunnel config device.

--
May the most significant bit of your life be positive.

Re: IPsec and MTU / fragmentation

Janne Johansson-3
In reply to this post by Simen Stavdal
Den mån 10 feb. 2020 kl 20:53 skrev Simen Stavdal <[hidden email]>:

>> I think the more complete solution is to run some gif/gre inside ipsec and
>> set low-enough MTU on that one, so it can correctly fragment incoming
>> packets, and optionally rebuild the packets at the remote end, while also
>> giving you an idea of "state" on the link so you optionally can run things
>> like routing daemons or something that cares about and acts on tunnel
>> state. This would cause even lower MTU, but still allow all kinds of
>> traffic and not just the "popular" one.
>>
>
> So, how will your client/server know about this lower mtu? And df bit is
> set more often than not, so fragmentation is not allowed in a lot of cases.
> This is exactly the problem that started this thread...
>
>>
>>
If the inner gif/gre tunnel has a lower mtu, then it being a layer-3 tunnel
will be able to fragment all incoming ip before sending it into the ipsec,
which will not fragment for you.
The clients will not have to change, nor any other protocol that sends ip
via the double-tunnel.
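
To make the "inner tunnel fragments for you" step concrete, here is a rough sketch of how an IPv4 packet with DF=0 would be split when it hits a lower-MTU interface. The MTU of 1400 and the 20-byte header (no IP options) are assumptions for illustration only, not values taken from this thread's setup.

```python
IP_HEADER = 20  # bytes; assumes an IPv4 header without options


def fragment_sizes(total_len, mtu, header=IP_HEADER):
    """Return (offset_in_8_byte_units, payload_len, more_fragments) per fragment."""
    payload = total_len - header
    if total_len <= mtu:
        return [(0, payload, False)]
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - header) // 8 * 8
    frags = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        frags.append((offset // 8, size, more))
        offset += size
    return frags


# A 1500-byte packet entering a hypothetical tunnel interface with MTU 1400.
for off, size, more in fragment_sizes(1500, 1400):
    print(off, size, more)
```

The hosts at either end keep sending 1500-byte packets; only the box that owns the inner tunnel needs to know the smaller MTU.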

--
May the most significant bit of your life be positive.

Re: IPsec and MTU / fragmentation

Simen Stavdal
<If the inner gif/gre tunnel has a lower mtu, then it being a layer-3
tunnel will be able to fragment all incoming ip before sending it into the
ipsec, which will not fragment for you.
The clients will not have to change, nor any other protocol that sends ip
via the double-tunnel.>

Say a client and a server set up a new conversation over TCP.
They both have an MTU of 1500 and DF=1.
How will you fragment this, even with an L3 tunnel?

/S

On Tue, 11 Feb 2020 at 08:22, Janne Johansson <[hidden email]> wrote:

> Den mån 10 feb. 2020 kl 20:53 skrev Simen Stavdal <[hidden email]>:
>
>>> I think the more complete solution is to run some gif/gre inside ipsec
>>> and set low-enough MTU on that one, so it can correctly fragment incoming
>>> packets, and optionally rebuild the packets at the remote end, while also
>>> giving you an idea of "state" on the link so you optionally can run things
>>> like routing daemons or something that cares about and acts on tunnel
>>> state. This would cause even lower MTU, but still allow all kinds of
>>> traffic and not just the "popular" one.
>>>
>>
>> So, how will your client/server know about this lower mtu? And df bit is
>> set more often than not, so fragmentation is not allowed in a lot of cases.
>> This is exactly the problem that started this thread...
>>
>>>
>>>
> If the inner gif/gre tunnel has a lower mtu, then it being a layer-3
> tunnel will be able to fragment all incoming ip before sending it into the
> ipsec, which will not fragment for you.
> The clients will not have to change, nor any other protocol that sends ip
> via the double-tunnel.
>
> --
> May the most significant bit of your life be positive.
>

Re: IPsec and MTU / fragmentation

Janne Johansson-3
Den tis 11 feb. 2020 kl 10:25 skrev Simen Stavdal <[hidden email]>:

> <If the inner gif/gre tunnel has a lower mtu, then it being a layer-3
> tunnel will be able to fragment all incoming ip before sending it into the
> ipsec, which will not fragment for you.
> The clients will not have to change, nor any other protocol that sends ip
> via the double-tunnel.>
>
> If a client and a server set up a new conversation over tcp.
> They both have an MTU of 1500 and DF=1
> How will you fragment this, even being a L3 tunnel?
>

You don't fragment DF=1 packets. If they don't fit, you send an ICMP
"Fragmentation Needed and Don't Fragment was Set" back, like any L3 box
would do regardless, and they adapt or fail.
That is what you should get for setting DF=1.
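
As a rough sketch of that decision (forward, fragment, or reject with ICMP type 3 code 4 carrying the next-hop MTU as in RFC 1191), something like the following. The function names and the list of per-hop MTUs are made up for illustration; this is a toy model, not how any particular stack implements it.

```python
def forward(packet_len, df, next_hop_mtu):
    """Decide what an L3 box does with a packet bound for a given next hop."""
    if packet_len <= next_hop_mtu:
        return ("forward", None)
    if df:
        # ICMP Destination Unreachable (type 3), code 4:
        # "Fragmentation Needed and Don't Fragment was Set".
        return ("icmp", {"type": 3, "code": 4, "next_hop_mtu": next_hop_mtu})
    return ("fragment", None)


def sender_adapts(packet_len, path_mtus):
    """Toy path MTU discovery: a DF=1 sender shrinks until every hop accepts."""
    size = packet_len
    while True:
        for mtu in path_mtus:
            action, icmp = forward(size, True, mtu)
            if action == "icmp":
                size = icmp["next_hop_mtu"]  # sender lowers its packet size
                break
        else:
            return size


# Hypothetical path: 1500-byte links with a 1400-byte tunnel in the middle.
print(sender_adapts(1500, [1500, 1400, 1500]))
```

The "or fail" case is when that ICMP message never reaches the sender, for example because something on the path filters it. Then the sender keeps retransmitting full-size packets and the connection stalls.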

--
May the most significant bit of your life be positive.

Re: IPsec and MTU / fragmentation

Stuart Henderson
In reply to this post by Simen Stavdal
On 2020-02-11, Simen Stavdal <[hidden email]> wrote:
><If the inner gif/gre tunnel has a lower mtu, then it being a layer-3
> tunnel will be able to fragment all incoming ip before sending it into the
> ipsec, which will not fragment for you.
> The clients will not have to change, nor any other protocol that sends ip
> via the double-tunnel.>
>
> If a client and a server set up a new conversation over tcp.
> They both have an MTU of 1500 and DF=1
> How will you fragment this, even being a L3 tunnel?

If you encapsulate the packets you can run it like this:

The "outer" packets get fragmented. The "inner" packets stay full-size.

<----1500 byte inner---->
<--encap1--> <--encap2-->

The other end reassembles the outer packets before decapsulating the
(full size) inner packet.

I've done this personally with full-size Ethernet frames through an
ipsec+etherip bridge; I think it also works for L3 encap.
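
A toy model of that flow, for anyone who wants to see it spelled out: encapsulate a full-size inner packet, fragment the outer packet, reassemble at the far end, then decapsulate. The 60-byte overhead and 1400-byte limit are invented numbers, and the sketch ignores per-fragment IP headers and 8-byte offset alignment.

```python
OVERHEAD = 60      # hypothetical encapsulation overhead in bytes
OUTER_MAX = 1400   # hypothetical largest chunk the path will carry


def encapsulate(inner, overhead=OVERHEAD):
    # Zero bytes stand in for the real tunnel header(s).
    return bytes(overhead) + inner


def fragment(outer, max_size=OUTER_MAX):
    # Each fragment remembers its byte offset so it can be put back in order.
    return [(off, outer[off:off + max_size])
            for off in range(0, len(outer), max_size)]


def reassemble(frags):
    return b"".join(chunk for _, chunk in sorted(frags))


def decapsulate(outer, overhead=OVERHEAD):
    return outer[overhead:]


inner = (bytes(range(256)) * 6)[:1500]   # a full-size 1500-byte inner packet
frags = fragment(encapsulate(inner))     # outer packet no longer fits, so split
# Deliver out of order to show that reassembly does not depend on arrival order.
recovered = decapsulate(reassemble(list(reversed(frags))))
print(len(frags), recovered == inner)
```

The inner packet is never touched, so its DF bit does not matter here; only the outer packet is fragmented.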