sasync phase 1 issue

sangdrax8
I am new to OpenBSD, but would like to take advantage of a redundant
setup with ipsec/carp/sasyncd.  I have run into a situation which seems
to be a bug, and was directed to post to tech with config files.

I believe my problem is that the phase 1 of an ipsec negotiation is not
being synced with sasyncd, causing a repeatable condition where tunnels
die for extended periods of time.  I have tried the following with all
three machines running 5.1-stable, 5.2-stable, and 5.2-stable with a
snapshot kernel from 2/17/2013.  My main problem exists across all three
setup types.  I am running 5.2 with the snapshot kernel now as it
provides the lifetime setting in ipsec phase 2 to make the testing
faster.


####### Setup Description ######

172.16.10.0/24 behind the carp devices on vlan 2
172.16.20.0/24 the other side of the tunnel no vlan
1.1.1.0/24 is used for the internet

vlan 3 is tagged on the fw boxes and untagged to the lab1 box, with a
switch between them

fw boxes use trunk ports as follows
em0 + em1 = trunk0
em2 + em3 = trunk1


####### Setup Drawing ######


                    172.16.10.0/24
                ................
                .              .   Vlan 2
                . .3           . .7
           *****.****      ****.*****
           *  fw1   *      *  fw2   *
           *        *      *        *
           *****.****      ****.*****
                . 1.1.1.2      .  1.1.1.3
                .              .
                ................   Vlan 3 to switch
                       . 1.1.1.1
                       .
                       .
                       .
                       .
                       .
                       . 1.1.1.5
                 ******.******
                 *  Lab1     *
                 *           *
                 ******.******
                       .
                    172.16.20.0/24



###### How to re-create the problem #####

Bring all machines up and allow ipsec to come up (ensuring fw1 is the
master).

Start a ping from the 172.16.10.0/24 net to the 172.16.20.0/24 net.

Run tcpdump on vlan3 on both fw1 and fw2 (only fw1 should show active esp
traffic).  Note the SPIs seen; this is SPI set 1.

Carp demote fw1 with 'ifconfig -g carp carpdemote 128'.

tcpdump on fw2 should now show the esp traffic (same SPIs as before, SPI
set 1), and a large increase in sequence numbers.

Soon after the transfer, fw2 will do a full phase 1 and phase 2
re-negotiation (this can be seen in the tcpdump).  The SPIs will change
(referring to this as SPI set 2), sequence numbers will reset, and no
pings are lost.  This is where I believe phase 1 is renegotiated between
fw2 and lab1, because it was not synced from fw1.

Recover fw1 as carp master with 'ifconfig -g carp -carpdemote 128'.

tcpdump on fw1 should now show the esp packets (SPIs now from set 2),
and a large increase in sequence numbers.

Sometimes, soon after the transfer, fw1 will attempt a phase 2 re-key and
be denied.  Even if it doesn't do so quickly, when the phase 2 SA begins
to time out it will attempt to re-key and be denied at that time.  I have
reduced the phase 2 lifetime to 5 minutes in my tests to allow this to
happen more quickly.

When phase 2 times out, the pings through the tunnel fail and the tunnel
is down.

You can fail back to fw2, and a new phase 2 negotiation will take place
to resume traffic; otherwise fw1 will not be able to re-build the tunnel
until the phase 1 times out (I believe 8 hours is the default).

As a note, if you fail a firewall by actually rebooting it, this problem
goes undetected, as this clears the SAs.
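
For reference, the failover commands from the steps above can be collected
into a small dry-run script.  This is only a sketch: the function names and
the DRY_RUN switch are made up for illustration and are not part of any
OpenBSD tool.  The ifconfig calls are the ones quoted in the steps; the
'ipsecctl -s sa' listing and the tcpdump filter on IP protocol 50 (ESP)
are assumed here as one way to compare SPIs, with vlan3 taken as the
outside interface as in the configs below.

```shell
#!/bin/sh
# Dry-run sketch of the failover test described above.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 on a
# test firewall to actually run them as root.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
        if [ "$DRY_RUN" = "1" ]; then
                echo "$*"
        else
                "$@"
        fi
}

# Demote every interface in the carp group so the peer becomes master.
demote_carp()  { run ifconfig -g carp carpdemote 128; }

# Undo the demotion so this box can take master back.
restore_carp() { run ifconfig -g carp -carpdemote 128; }

# Watch ESP (IP protocol 50) on the outside vlan to note SPIs and sequence
# numbers before and after each failover.
watch_esp()    { run tcpdump -n -i vlan3 ip proto 50; }

# List the SAs currently loaded, to compare SPI sets between fw1 and fw2.
show_sas()     { run ipsecctl -s sa; }
```

With the defaults, calling demote_carp on fw1 just prints the ifconfig
command; running show_sas on both firewalls after each step is one way to
check whether the two boxes hold the same SPI set.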

I know this is a long e-mail, but I have tried to provide all the
details and configurations that could be needed to re-create this.  I
have been able to consistently re-create this issue every time across
multiple versions.  If there is anything I have left off, please let me
know.


#######################################################
############## Configuration Files Below ##############
#######################################################


##### fw1 configs #####

==> sasyncd.conf <==
interface carp3
group carp
peer 172.16.10.7
sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3


==> hostname.carp2 <==
up
inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2

==> hostname.carp3 <==
up
inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3

==> hostname.em0 <==
up

==> hostname.em1 <==
up

==> hostname.em2 <==
up

==> hostname.em3 <==
up

==> hostname.enc0 <==
up

==> hostname.gif1 <==
create
tunnel 172.16.10.1 172.16.20.1
10.10.10.1 10.10.20.1 netmask 255.255.255.252
mtu 1426
up
!route add 172.16.20.0/24 10.10.20.1

==> hostname.pfsync0 <==
up syncdev vlan2 syncpeer 172.16.10.7

==> hostname.trunk0 <==
up
trunkproto failover trunkport em0 trunkport em1

==> hostname.trunk1 <==
up
trunkproto failover trunkport em2 trunkport em3

==> hostname.vlan2 <==
up
inet 172.16.10.3 255.255.255.0 NONE vlan 2 vlandev trunk0

==> hostname.vlan3 <==
up
inet 1.1.1.2 255.255.255.0 NONE vlan 3 vlandev trunk1

==> ipsec.conf <==
fw_gw = "1.1.1.1"
fw_gif = "172.16.10.1"
fw_net = "172.16.10.0/24"

lab_gw = "1.1.1.5"
lab_gif = "172.16.20.1"
lab_net = "172.16.20.0/24"

ike esp from $fw_gif to $lab_gif \
        local $fw_gw peer $lab_gw \
        main auth hmac-sha1 enc aes-256 group modp1024 \
        quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
        psk "password"




##### fw2 configs #####

==> sasyncd.conf <==
interface carp3
group carp
peer 172.16.10.3
sharedkey 0xf04c0d7fada85a2c0f3fec1db4e52e6d6cbd360936b163133df4917566308bd3

==> hostname.carp2 <==
up
inet 172.16.10.1 255.255.255.0 172.16.10.255 vhid 2 pass password carpdev vlan2 advskew 128

==> hostname.carp3 <==
up
inet 1.1.1.1 255.255.255.0 1.1.1.255 vhid 3 pass password carpdev vlan3 advskew 128

==> hostname.em0 <==
up

==> hostname.em1 <==
up

==> hostname.em2 <==
up

==> hostname.em3 <==
up

==> hostname.enc0 <==
up

==> hostname.gif1 <==
create
tunnel 172.16.10.1 172.16.20.1
10.10.10.1 10.10.20.1 netmask 255.255.255.252
mtu 1426
up
!route add 172.16.20.0/24 10.10.20.1

==> hostname.pfsync0 <==
up syncdev vlan2 syncpeer 172.16.10.3

==> hostname.trunk0 <==
up
trunkproto failover trunkport em0 trunkport em1

==> hostname.trunk1 <==
up
trunkproto failover trunkport em2 trunkport em3

==> hostname.vlan2 <==
up
inet 172.16.10.7 255.255.255.0 NONE vlan 2 vlandev trunk0

==> hostname.vlan3 <==
up
inet 1.1.1.3 255.255.255.0 NONE vlan 3 vlandev trunk1

==> ipsec.conf <==
fw_gw = "1.1.1.1"
fw_gif = "172.16.10.1"
fw_net = "172.16.10.0/24"

lab_gw = "1.1.1.5"
lab_gif = "172.16.20.1"
lab_net = "172.16.20.0/24"

ike esp from $fw_gif to $lab_gif \
        local $fw_gw peer $lab_gw \
        main auth hmac-sha1 enc aes-256 group modp1024 \
        quick auth hmac-sha1 enc aes-256 group modp1024 lifetime 5m \
        psk "password"

###### lab1 ######

==> hostname.em0 <==
up
inet 1.1.1.5 255.255.255.0

==> hostname.em2 <==
up
inet 172.16.20.1 255.255.255.0

==> hostname.enc0 <==
up

==> hostname.gif0 <==
create
tunnel 172.16.20.1 172.16.10.1
10.10.20.1 10.10.10.1 netmask 255.255.255.252
mtu 1426
up
!route add 172.16.10.0/24 10.10.10.1


Re: sasync phase 1 issue

sven falempin
On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[hidden email]> wrote:

> As a note, if you fail a firewall by actually rebooting it, this problem
> goes undetected as this clears the SA's.
>

So fw1 is not ready if you manually turn it off, but comes back if you
reboot it!

I guess a MASTER that fails needs maintenance ;-)

Maybe it is a misbehavior, but does it actually happen in real-use
scenarios?


--
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\

Re: sasync phase 1 issue

Todd T. Fries-2
Penned by sven falempin on 20130222 17:05.33, we have:
| On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[hidden email]> wrote:
|
| > As a note, if you fail a firewall by actually rebooting it, this problem
| > goes undetected as this clears the SA's.
| >
| So fw1 is not ready if you manually turn it off, but comes back if you
| reboot it!
|
| I guess a MASTER that fails needs maintenance ;-)
|
| Maybe it is a misbehavior, but does it actually happen in real-use
| scenarios?

Yes.

--
Todd Fries .. [hidden email]

 ____________________________________________
|                                            \  1.636.410.0632 (voice)
| Free Daemon Consulting, LLC                \  1.405.227.9094 (voice)
| http://FreeDaemonConsulting.com            \  1.866.792.3418 (FAX)
| PO Box 16169, Oklahoma City, OK 73113      \  sip:[hidden email]
| "..in support of free software solutions." \  sip:[hidden email]
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
                                                 
              37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
                        http://todd.fries.net/pgp.txt


Re: sasync phase 1 issue

sangdrax8
On Sat, Feb 23, 2013 at 2:14 AM, Todd T. Fries <[hidden email]> wrote:

> Penned by sven falempin on 20130222 17:05.33, we have:
> | On Fri, Feb 22, 2013 at 2:29 PM, sangdrax8 <[hidden email]> wrote:
> |
> | > As a note, if you fail a firewall by actually rebooting it, this problem
> | > goes undetected as this clears the SA's.
> | >
> | > So fw1 is not ready if you manually turn it off but come back if you
> | reboot it !
> |
> | i guess a MASTER that fail need maintenance ;-)
> |
> | MAybe it is a missbehavior, but does it actually happen in real use
> | scenarii ?
>
> Yes.
>
> --
> Todd Fries .. [hidden email]
>
>  ____________________________________________
> |                                            \  1.636.410.0632 (voice)
> | Free Daemon Consulting, LLC                \  1.405.227.9094 (voice)
> | http://FreeDaemonConsulting.com            \  1.866.792.3418 (FAX)
> | PO Box 16169, Oklahoma City, OK 73113      \  sip:[hidden email]
> | "..in support of free software solutions." \  sip:[hidden email]
>  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
>               37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
>                         http://todd.fries.net/pgp.txt
>


Yes, this is a real-world scenario.  The only thing required for this
to happen is the backup firewall taking over for any reason while
the primary is still powered on (i.e. it didn't lose its phase 1).

1) you use the carp demote because... <any reason>

2) a switch reboots and the backup firewall takes over for a few
minutes until the primary is once again available

3) a cable is loose and while replacing it the backup firewall takes over

There is even a case where nothing goes wrong and this still happens.
If both boxes are booting up for the first time, one of them will come
up first.  If this happens to be fw2, it will be master until fw1
finishes.  It will negotiate phase 1 and 2, and bring up the tunnel.
Once fw1 finishes it will become the master, and will redo the phase 1.

So for the next 8 hours (the lifetime of phase 1) your backup will be in
this state and you do not have redundant devices.  If fw1 dies for any
reason, fw2 will still hold a stale phase 1 that it believes is valid,
and your tunnel is down.
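
To see which box is currently master before and after a failover, something
like the following may help.  It is only a sketch: carp3 is the external CARP
interface from the configs in this thread, the carp_state helper name is made
up for illustration, and the isakmpd FIFO command in the comments is quoted
from my reading of isakmpd(8), so please check it against your release.

```shell
#!/bin/sh
# Sketch only: pull the CARP state out of ifconfig output.
# Reads ifconfig output on stdin and prints the state word
# (MASTER, BACKUP or INIT) from the first "carp:" line.
carp_state() {
    awk '$1 == "carp:" { print $2; exit }'
}

# Typical use on fw1 or fw2:
#   ifconfig carp3 | carp_state
#
# Phase 2 SAs known to the kernel (these are what sasyncd replicates):
#   ipsecctl -s sa
#
# Phase 1 state lives inside isakmpd.  As I understand isakmpd(8),
# writing "S" to the FIFO dumps all known SAs to a result file:
#   echo S > /var/run/isakmpd.fifo && cat /var/run/isakmpd.result
```

Comparing the dump on both firewalls after a failover should show whether
each box holds its own phase 1 with the peer.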


Re: sasync phase 1 issue

sangdrax8
On Sat, Feb 23, 2013 at 11:14 AM, sven falempin <[hidden email]> wrote:

> number 2 is convincing
>
>
> --
> ---------------------------------------------------------------------------------------------------------------------
> () ascii ribbon campaign - against html e-mail
> /\


Any chance someone has the time/knowledge to squash this bug? I
would like to deploy some syncing firewall/vpn devices but as they
are right now I can't put this in my production environment.


Re: sasync phase 1 issue

sangdrax8
On Mon, Feb 25, 2013 at 3:06 PM, Stuart Henderson <[hidden email]> wrote:

> On 2013/02/25 14:41, sangdrax8 wrote:
>> Any chance someone has the time/knowledge to squash this bug? I
>> would like to deploy some syncing firewall/vpn devices but as they
>> are right now I can't put this in my production environment.
>
> Do you actually need sync? If you're in a situation where you can
> use dead peer detection then you can use ifstated to start up ipsec
> when a box becomes carp master and to kill/flush when a box becomes
> carp backup which has been working quite well for me.
>
> Not saying that a fix wouldn't be nice, but sasync has been known
> to have problems for some time..
>

The reason I was looking at OpenBSD for this project was the prospect
of using sasync for seamless redundancy.  I believe I understand
what you have suggested, and that should avoid the bug by creating
new associations every time.  It would also mean that each failover
loses packets while the new master builds the tunnel.
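
For reference, the start-on-master / flush-on-backup approach could be
sketched roughly as below.  This is my own illustration, not Stuart's actual
setup: the carp_ipsec function name and the DRY_RUN switch are invented here,
and the function is meant to be called from whatever ifstated runs on a state
change.

```shell
#!/bin/sh
# Sketch only: bring IPsec up when this box becomes CARP master and
# tear everything down when it becomes backup.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

carp_ipsec() {
    case "$1" in
    master)
        run isakmpd -K                     # start the IKE daemon, no keynote policy
        run ipsecctl -f /etc/ipsec.conf    # load the ike rules; may need a short
                                           # wait until isakmpd is ready
        ;;
    backup)
        run pkill isakmpd                  # stop the IKE daemon
        run ipsecctl -F                    # flush all SAs and flows
        ;;
    *)
        echo "usage: carp_ipsec master|backup" >&2
        return 1
        ;;
    esac
}
```

Because the backup holds no SAs at all, the new master always negotiates from
scratch, which is where the packet loss during failover comes from.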

I won't give up yet on my seamless failover, but I guess I will have to
look for ways around the bug.  Perhaps if I use isakmpd's fifo I can
clear only the phase 1 when switching from carp master to backup,
while still allowing sasync to keep phase 2 associations in sync.
Then no packets would need to be lost, but each time there was a
failure it would rebuild the phase 1.  I will look into this further as
my time permits, although adding complexity to avoid a bug is not
usually the ideal solution to a problem.

It is disappointing if the main feature I was looking to use is accepted
as non-functional.  If anyone has interest in fixing sasync to provide
true redundancy (with no loss) I would be very interested in hearing
from them.  In its current state I would assume someone could
easily set it up and believe they are redundant, when in reality they
have a very real chance of taking themselves down for a very extended
outage.

Is there a difference between filing a bug on the bugs mailing list and
posting my query here on the tech list?


Re: sasync phase 1 issue

sangdrax8
On Tue, Feb 26, 2013 at 9:11 AM, sven falempin <[hidden email]> wrote:

> imVHo
> for me the workaround is reboot when you become slave, after sending alert
> thus give you slave working to slave break time to do whatever you want.
>
> I guess people have no time to fix this right now,
> are you ready to put money on the table .... ?
>
>
> --
> ---------------------------------------------------------------------------------------------------------------------
> () ascii ribbon campaign - against html e-mail
> /\


So I have a band-aid that seems to provide the no-loss failover that
I was looking for.

For anyone wanting the band-aid, you need to configure isakmpd through
isakmpd.conf rather than ipsec.conf.  Then you can use ifstated, as
mentioned earlier, to watch for the carp interface switching to BACKUP.
When that is detected, you can remove all phase 1 associations on the
box without breaking the phase 2.  I just made a shell script that finds
all hosts in isakmpd.conf and echoes the teardown of phase 1 into the
fifo.  When the box takes back over as MASTER, the phase 2s that sasync
is keeping in sync allow seamless traffic while new phase 1s are
brought up.
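
A minimal sketch of such a script follows.  It is not the exact script
described above.  It assumes peers are listed as "address= name" under
the [Phase 1] section of isakmpd.conf, and that the FIFO accepts
"t main <peer>" to tear down a phase 1 SA, which is how I read isakmpd(8).
The function names and the CONF/FIFO variables are made up for illustration.

```shell
#!/bin/sh
# Sketch only: tear down every phase 1 SA listed in isakmpd.conf
# while leaving the phase 2 SAs (synced by sasyncd) untouched.
CONF="${CONF:-/etc/isakmpd/isakmpd.conf}"
FIFO="${FIFO:-/var/run/isakmpd.fifo}"

# Print the peer addresses found in the [Phase 1] section of file $1.
phase1_peers() {
    awk '
        /^\[/ { in_p1 = ($0 ~ /^\[Phase 1\]/); next }
        in_p1 && /^[^#;].*=/ {
            split($0, kv, "=")
            gsub(/[ \t]/, "", kv[1])
            # "Default" is a catch-all entry, not a real peer address
            if (kv[1] != "" && kv[1] != "Default") print kv[1]
        }
    ' "$1"
}

# Write one phase 1 teardown command per peer into $2 (the FIFO).
teardown_phase1() {
    phase1_peers "$1" | while read -r peer; do
        echo "t main $peer"
    done > "$2"
}

# Called from ifstated when the carp interface goes to BACKUP:
#   teardown_phase1 "$CONF" "$FIFO"
```

Passing the config and FIFO paths as arguments makes it easy to try the
parsing against a copy of isakmpd.conf and a scratch file before pointing
it at the real FIFO.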

I will need further testing, but so far this appears to get around my
initial failure case.

If a developer knows sasync well enough to look at fixing it, that would
obviously be a better solution than working around the bug.  I can
talk to my employer and see if we would like to sponsor some work
in this area.