More on sasyncd, pfsync and isakmpd

nathanael-3
Normally, when isakmpd replaces an expiring SA with a new one, it
marks the older SA as replaced, and when it later removes a replaced
SA it leaves the associated flows in place. If the replacement SA is
instead installed by sasyncd, isakmpd never gets to mark the older SA
as replaced, so when that SA expires isakmpd removes the associated
flows as well. This can cause trouble in an A-B-A failover scenario
as follows:

a) isakmpd creates SA and flow on A (propagated to B by sasyncd)
b) fail over to B
c) new SA created on B prior to original SA expiry (propagated to A by sasyncd)
d) original SA eventually expires and isakmpd removes the flow on A
e) fail over back to A - good SA, but no flow.

I have worked around this by disabling isakmpd flow removal
(pf_key_v2_disable_sa in pf_key_v2.c). I haven't included this in the
patch below since it is only a workaround.
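
To make the idea concrete, here is a rough sketch of what the
workaround amounts to. The struct, helper and variable names below
are invented for illustration and are not isakmpd's; the real change
lives in the flow-removal path of pf_key_v2.c mentioned above.

/*
 * Sketch only, with hypothetical names: skip flow removal when an SA
 * that was never marked "replaced" expires.  With sasyncd running,
 * the replacement SA may have been installed by the peer, so the
 * flows are still in use.
 */
struct sketch_sa {
	int	replaced;	/* set only when isakmpd itself replaced the SA */
};

static int	disable_flow_removal = 1;	/* the workaround switch */

static void
sketch_sa_expired(struct sketch_sa *sa)
{
	/* remove_kernel_sa(sa);  -- deleting the SA itself is unchanged */

	if (sa->replaced)
		return;		/* normal case: flows stay with the new SA */
	if (disable_flow_removal)
		return;		/* workaround: keep the flows anyway */

	/* remove_kernel_flows(sa);  -- the step the workaround disables */
}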

A couple of related things I have noticed:

The isakmpd responder always creates SAs without replay detection (a
replay window size of zero). I would prefer the responder to use the
same default replay window size as the initiator. A simple-minded
patch is included below, but a better solution is required.

The SPI in a TDB is stored in network order, but pfsync assumes it is
stored in host order and converts it to network order before sending.
As a result, pfsync TDB updates between little-endian and big-endian
machines will fail. In addition, pfsync compares the SPI against
SPI_RESERVED_MAX, which can produce the wrong result on little-endian
machines. Another side effect is that on little-endian machines
tcpdump displays the same SPI differently when dumping enc0 compared
to dumping pfsync0; the SPIs displayed when dumping enc0 agree with
those displayed by ipsecadm and ipsecctl. A patch for this is
included below.
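
As a standalone illustration of the comparison problem (plain
userland C, not kernel code; the SPI_RESERVED_MAX value is hard-coded
here just for the example):

#include <sys/types.h>
#include <stdio.h>
#include <arpa/inet.h>

#define SPI_RESERVED_MAX	255	/* reserved SPI range, for this example only */

int
main(void)
{
	u_int32_t spi = 16;			/* a reserved SPI, host order */
	u_int32_t spi_net = htonl(spi);		/* as stored in the TDB */

	/*
	 * On a little-endian machine spi_net is 0x10000000, so the raw
	 * comparison wrongly accepts a reserved SPI; after ntohl() the
	 * check is right on any byte order.
	 */
	printf("raw:   %s\n",
	    spi_net <= SPI_RESERVED_MAX ? "reserved" : "accepted");
	printf("ntohl: %s\n",
	    ntohl(spi_net) <= SPI_RESERVED_MAX ? "reserved" : "accepted");
	return (0);
}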

When using sasyncd, it is necessary to configure isakmpd to act as
responder only (passive connections) and the phase 1 SA lifetime must
be shorter than the phase 2 SA lifetime to avoid stale phase 1 SAs
after fail-over.
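
For example, something along these lines in isakmpd.conf (the
connection name and the lifetime values are placeholders, not a
recommendation; the point is the passive connection and a phase 1
lifetime shorter than the phase 2 lifetime):

# Illustrative isakmpd.conf fragment only.
[General]
# Phase 1 lifetime shorter than phase 2, so stale phase 1 SAs expire first.
Default-phase-1-lifetime=	600,60:86400
Default-phase-2-lifetime=	1200,60:86400

[Phase 2]
# Respond only; never initiate this connection.
Passive-connections=		IPsec-peer-A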

With all of this taken into account, IPsec failover has worked
flawlessly under the scenarios I have tested.

The diff below includes the patches from my two previous emails.

Nathanael

Index: ipsec.c
===================================================================
RCS file: /cvs/src/sbin/isakmpd/ipsec.c,v
retrieving revision 1.122
diff -u -r1.122 ipsec.c
--- ipsec.c 2005/09/23 14:44:03 1.122
+++ ipsec.c 2006/05/03 18:23:54
@@ -2076,9 +2076,10 @@
 {
  struct ipsec_proto *iproto = proto->data;

- if (proto->sa->phase == 2 && section)
- iproto->replay_window = conf_get_num(section, "ReplayWindow",
-    DEFAULT_REPLAY_WINDOW);
+ if (proto->sa->phase == 2)
+ iproto->replay_window = section ? conf_get_num(section,
+    "ReplayWindow", DEFAULT_REPLAY_WINDOW) :
+    DEFAULT_REPLAY_WINDOW;
 }

 /*
Index: if_pfsync.c
===================================================================
RCS file: /cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.62
diff -u -r1.62 if_pfsync.c
--- if_pfsync.c 2006/03/25 22:41:47 1.62
+++ if_pfsync.c 2006/05/03 18:25:00
@@ -1540,37 +1540,15 @@
  int s;

  /* check for invalid values */
- pt->spi = htonl(pt->spi);
- if (pt->spi <= SPI_RESERVED_MAX ||
+ if (ntohl(pt->spi) <= SPI_RESERVED_MAX ||
     (pt->dst.sa.sa_family != AF_INET &&
      pt->dst.sa.sa_family != AF_INET6))
  goto bad;

- if (pt->dst.sa.sa_family == AF_INET)
- pt->dst.sin.sin_addr.s_addr =
-    htonl(pt->dst.sin.sin_addr.s_addr);
-
  s = spltdb();
  tdb = gettdb(pt->spi, &pt->dst, pt->sproto);
  if (tdb) {
- /*
- * When a failover happens, the master's rpl is probably above
- * what we see here (we may be up to a second late), so
- * increase it a bit to manage most such situations.
- *
- * For now, just add an offset that is likely to be larger
- * than the number of packets we can see in one second. The RFC
- * just says the next packet must have a higher seq value.
- *
- * XXX What is a good algorithm for this? We could use
- * a rate-determined increase, but to know it, we would have
- * to extend struct tdb.
- * XXX pt->rpl can wrap over MAXINT, but if so the real tdb
- * will soon be replaced anyway. For now, just don't handle
- * this edge case.
- */
-#define RPL_INCR 16384
- pt->rpl = ntohl(pt->rpl) + RPL_INCR;
+ pt->rpl = ntohl(pt->rpl);
  pt->cur_bytes = betoh64(pt->cur_bytes);

  /* Neither replay nor byte counter should ever decrease. */
@@ -1596,7 +1574,7 @@

 /* One of our local tdbs have been updated, need to sync rpl with others */
 int
-pfsync_update_tdb(struct tdb *tdb)
+pfsync_update_tdb(struct tdb *tdb, int output)
 {
  struct ifnet *ifp = &pfsyncif.sc_if;
  struct pfsync_softc *sc = ifp->if_softc;
@@ -1649,18 +1627,11 @@
     tdb->tdb_sproto);

  for (i = 0; !pt && i < h->count; i++) {
- /* XXX Ugly, u is network ordered. */
- if (u->dst.sa.sa_family == AF_INET)
- u->dst.sin.sin_addr.s_addr =
-    ntohl(u->dst.sin.sin_addr.s_addr);
- if (tdb_hash(ntohl(u->spi), &u->dst,
+ if (tdb_hash(u->spi, &u->dst,
     u->sproto) == hash) {
  pt = u;
  pt->updates++;
  }
- if (u->dst.sa.sa_family == AF_INET)
- u->dst.sin.sin_addr.s_addr =
-    htonl(u->dst.sin.sin_addr.s_addr);
  u++;
  }
  }
@@ -1674,15 +1645,30 @@
  h->count++;
  bzero(pt, sizeof(*pt));

- pt->spi = htonl(tdb->tdb_spi);
+ pt->spi = tdb->tdb_spi;
  memcpy(&pt->dst, &tdb->tdb_dst, sizeof pt->dst);
- if (pt->dst.sa.sa_family == AF_INET)
- pt->dst.sin.sin_addr.s_addr =
-    htonl(pt->dst.sin.sin_addr.s_addr);
  pt->sproto = tdb->tdb_sproto;
  }

- pt->rpl = htonl(tdb->tdb_rpl);
+ /*
+ * When a failover happens, the master's rpl is probably above
+ * what we see here (we may be up to a second late), so
+ * increase it a bit for outbound tdbs to manage most such
+ * situations.
+ *
+ * For now, just add an offset that is likely to be larger
+ * than the number of packets we can see in one second. The RFC
+ * just says the next packet must have a higher seq value.
+ *
+ * XXX What is a good algorithm for this? We could use
+ * a rate-determined increase, but to know it, we would have
+ * to extend struct tdb.
+ * XXX pt->rpl can wrap over MAXINT, but if so the real tdb
+ * will soon be replaced anyway. For now, just don't handle
+ * this edge case.
+ */
+#define RPL_INCR 16384
+ pt->rpl = htonl(tdb->tdb_rpl + (output ? RPL_INCR : 0));
  pt->cur_bytes = htobe64(tdb->tdb_cur_bytes);

  if (h->count == sc->sc_maxcount ||
Index: if_pfsync.h
===================================================================
RCS file: /cvs/src/sys/net/if_pfsync.h,v
retrieving revision 1.28
diff -u -r1.28 if_pfsync.h
--- if_pfsync.h 2005/11/04 08:24:14 1.28
+++ if_pfsync.h 2006/05/03 18:25:02
@@ -330,7 +330,7 @@
  pfsync_pack_state(PFSYNC_ACT_DEL, (st), \
     PFSYNC_FLAG_COMPRESS); \
 } while (0)
-int pfsync_update_tdb(struct tdb *);
+int pfsync_update_tdb(struct tdb *, int);
 #endif

 #endif /* _NET_IF_PFSYNC_H_ */
Index: pfkeyv2.c
===================================================================
RCS file: /cvs/src/sys/net/pfkeyv2.c,v
retrieving revision 1.109
diff -u -r1.109 pfkeyv2.c
--- pfkeyv2.c 2005/12/06 14:25:21 1.109
+++ pfkeyv2.c 2006/05/03 18:26:02
@@ -1733,11 +1733,11 @@
  pfkeyv2_socket->flags |=
     PFKEYV2_SOCKETFLAGS_PROMISC;
  npromisc++;
+ } else {
+ pfkeyv2_socket->flags &=
+    ~PFKEYV2_SOCKETFLAGS_PROMISC;
+ npromisc--;
  }
- } else {
- pfkeyv2_socket->flags &=
-    ~PFKEYV2_SOCKETFLAGS_PROMISC;
- npromisc--;
  }
  }
 
Index: ip_esp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_esp.c,v
retrieving revision 1.97
diff -u -r1.97 ip_esp.c
--- ip_esp.c 2006/03/25 22:41:48 1.97
+++ ip_esp.c 2006/05/03 18:26:30
@@ -588,7 +588,7 @@
     tdb->tdb_wnd, &(tdb->tdb_bitmap), 1)) {
  case 0: /* All's well */
 #if NPFSYNC > 0
- pfsync_update_tdb(tdb);
+ pfsync_update_tdb(tdb, 0);
 #endif
  break;

@@ -883,7 +883,7 @@
  bcopy((caddr_t) &replay, mtod(mo, caddr_t) + sizeof(u_int32_t),
     sizeof(u_int32_t));
 #if NPFSYNC > 0
- pfsync_update_tdb(tdb);
+ pfsync_update_tdb(tdb, 1);
 #endif
  }

Index: ip_ah.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ah.c,v
retrieving revision 1.84
diff -u -r1.84 ip_ah.c
--- ip_ah.c 2006/03/25 22:41:48 1.84
+++ ip_ah.c 2006/05/03 18:26:49
@@ -814,7 +814,7 @@
     tdb->tdb_wnd, &(tdb->tdb_bitmap), 1)) {
  case 0: /* All's well. */
 #if NPFSYNC > 0
- pfsync_update_tdb(tdb);
+ pfsync_update_tdb(tdb, 0);
 #endif
  break;

@@ -1104,7 +1104,7 @@
  if (!(tdb->tdb_flags & TDBF_NOREPLAY)) {
  ah->ah_rpl = htonl(tdb->tdb_rpl++);
 #if NPFSYNC > 0
- pfsync_update_tdb(tdb);
+ pfsync_update_tdb(tdb, 1);
 #endif
  }

Re: More on sasyncd, pfsync and isakmpd

nathanael-3
I have done some further work on the reliability of IPsec fail-over
with sasyncd. Rather than post long diffs and explanations here I have
created a web page with the details of what I have found:

http://members.iinet.net.au/~nathanael/OpenBSD/sasyncd.html

Some of the material has already been posted to this list (and a
couple of patches have already been committed), but I have chosen to
keep the web page as a complete record. The sections that contain new
material that hasn't previously been posted are:

1.2. isakmpd flow deletion
1.3. isakmpd phase 2 retry
1.7. pfsync TDB update collapsing
1.8. sasyncd PF_ROUTE monitor
1.9. sasyncd multimaster

Section 1.7 is straightforward and should be easy to review.
Sections 1.2 and 1.8 are probably OK. Sections 1.3 and 1.9 have the
potential to introduce side effects I hadn't considered and will need
the closest attention. In my testing I found it necessary to address
each of these issues in order to achieve reliable fail-over.

Nathanael