IPv6/NDP/IPsec breakage in -current

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
Something is very broken at the intersection of IPv6, NDP, and IPsec
in -current.

Two hosts in the same LAN; all static addresses; yes I use iked -6.
Here are the respective, trivial iked.conf files:

--------------------
ikev2 \
    from 2001:6f8:124a::2 to 2001:6f8:124a::6 \
    childsa enc aes-128-gcm psk "mekmitasdigoat"
--------------------
ikev2 active \
    from 2001:6f8:124a::6 to 2001:6f8:124a::2 \
    childsa enc aes-128-gcm psk "mekmitasdigoat"
--------------------

The ikeds successfully negotiate, and flows and SAs are set up.

Then both hosts forget each others neighbor address.  When they
want to pass traffic to each other (a ping for testing, or IKE
informational messages after a while),
- a neighbor solicitation is sent to the multicast address,
- the reply goes through the IPsec tunnel and is visible on the
  other side's enc0 interface,
- but the reply is never entered into the NDP cache.

Neighbor solicitations and replies fly back and forth for a while,
to no avail.  Eventually the ikeds notice that the other one is
gone, give up, and remove the flows and SAs.  When they try again,
they successfully renegotiate, and the cycle repeats.

netstat -s -picmp6 shows increasing counters for "bad neighbor
solicitation messages" and "bad neighbor advertisement messages".
Enabling nd6_debug shows that the reason is "packet from non-neighbor".

Oct  5 17:08:01 bardioc /bsd: nd6_na_input: ND packet from non-neighbor
Oct  5 17:08:03 bardioc last message repeated 2 times
Oct  5 17:08:06 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct  5 17:08:06 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::6
Oct  5 17:08:06 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct  5 17:08:06 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
On 2016-10-06, Christian Weisgerber <[hidden email]> wrote:

> Something is very broken at the intersection of IPv6, NDP, and IPsec
> in -current.
>
> Two hosts in the same LAN; all static addresses; yes I use iked -6.
> Here are the respective, trivial iked.conf files:

Actually, it is easier to examine this if you use manual flows and
SAs.

(When I started looking at this, I originally thought changes in
iked were to blame.  It has become clear that iked doesn't have
anything to do with the problem, and the IKE negotiation and the
neighbor discovery traffic triggered by it only muddy the waters.)

I have compared the neighbor discovery exchanges of -current with
those of 6.0-stable.  Let's call the hosts AAA and BBB.  There are
two patterns:

(1)
global-AAA > ff02::xxxx: icmp6: neighbor sol: who has global-BBB
global-BBB > global-AAA: icmp6: neighbor adv: tgt is global-BBB

(2)
lladdr-AAA > ff02::xxxx: icmp6: neighbor sol: who has global-BBB
lladdr-BBB > lladdr-AAA: icmp6: neighbor adv: tgt is global-BBB

In (1), the global addresses are used as the source addresses.
In (2), the link-local addresses are used as the source addresses.

And here's the overview which neighbor discovery exchange is used
by 6.0-stable and -current, respectively, with and without IPsec
flows:

                |   no flows   |    flows
                |              | established
    ------------+--------------+-------------
     6.0-stable | (1) global   | (2) local
    ------------+--------------+-------------
      -current  | (2) local    | (1) global
    ------------+--------------+-------------

Somehow the behavior got flipped in -current.

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
In reply to this post by Christian Weisgerber
On 2016-10-06, Christian Weisgerber <[hidden email]> wrote:

> Something is very broken at the intersection of IPv6, NDP, and IPsec
> in -current.

Found by bisection.  The culprit is this commit:

------------------------------------------------------------------------
CVSROOT:        /cvs
Module name:    src
Changes by:     [hidden email]  2016/09/13 13:56:55

Modified files:
        sys/kern       : uipc_mbuf.c
        sys/netinet    : ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c
        sys/sys        : mbuf.h
        share/man/man9 : mbuf.9

Log message:
avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@
------------------------------------------------------------------------

Kudos to vgross@ for calling it a few days ago:

> If so, I think we are dealing with some kind of mbuf mangling.

A backout is somewhat involved due to further changes that have been
layered on top.  Backout patch below for completeness, but I don't
suggest that this be actually committed.

Index: sys/kern/uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.231
diff -u -p -r1.231 uipc_mbuf.c
--- sys/kern/uipc_mbuf.c 15 Sep 2016 02:00:16 -0000 1.231
+++ sys/kern/uipc_mbuf.c 9 Oct 2016 00:20:34 -0000
@@ -126,6 +126,7 @@ int max_hdr; /* largest link+protocol
 struct mutex m_extref_mtx = MUTEX_INITIALIZER(IPL_NET);
 
 void m_extfree(struct mbuf *);
+struct mbuf *m_copym0(struct mbuf *, int, int, int, int);
 void nmbclust_update(void);
 void m_zero(struct mbuf *);
 
@@ -567,7 +568,23 @@ m_prepend(struct mbuf *m, int len, int h
  * The wait parameter is a choice of M_WAIT/M_DONTWAIT from caller.
  */
 struct mbuf *
-m_copym(struct mbuf *m0, int off, int len, int wait)
+m_copym(struct mbuf *m, int off, int len, int wait)
+{
+ return m_copym0(m, off, len, wait, 0); /* shallow copy on M_EXT */
+}
+
+/*
+ * m_copym2() is like m_copym(), except it COPIES cluster mbufs, instead
+ * of merely bumping the reference count.
+ */
+struct mbuf *
+m_copym2(struct mbuf *m, int off, int len, int wait)
+{
+ return m_copym0(m, off, len, wait, 1); /* deep copy */
+}
+
+struct mbuf *
+m_copym0(struct mbuf *m0, int off, int len, int wait, int deep)
 {
  struct mbuf *m, *n, **np;
  struct mbuf *top;
@@ -600,9 +617,23 @@ m_copym(struct mbuf *m0, int off, int le
  }
  n->m_len = min(len, m->m_len - off);
  if (m->m_flags & M_EXT) {
- n->m_data = m->m_data + off;
- n->m_ext = m->m_ext;
- MCLADDREFERENCE(m, n);
+ if (!deep) {
+ n->m_data = m->m_data + off;
+ n->m_ext = m->m_ext;
+ MCLADDREFERENCE(m, n);
+ } else {
+ /*
+ * we are unsure about the way m was allocated.
+ * copy into multiple MCLBYTES cluster mbufs.
+ */
+ MCLGET(n, wait);
+ n->m_len = 0;
+ n->m_len = M_TRAILINGSPACE(n);
+ n->m_len = min(n->m_len, len);
+ n->m_len = min(n->m_len, m->m_len - off);
+ memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + off,
+    n->m_len);
+ }
  } else
  memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + off,
     n->m_len);
@@ -933,6 +964,67 @@ m_getptr(struct mbuf *m, int loc, int *o
 }
 
 /*
+ * Inject a new mbuf chain of length siz in mbuf chain m0 at
+ * position len0. Returns a pointer to the first injected mbuf, or
+ * NULL on failure (m0 is left undisturbed). Note that if there is
+ * enough space for an object of size siz in the appropriate position,
+ * no memory will be allocated. Also, there will be no data movement in
+ * the first len0 bytes (pointers to that will remain valid).
+ *
+ * XXX It is assumed that siz is less than the size of an mbuf at the moment.
+ */
+struct mbuf *
+m_inject(struct mbuf *m0, int len0, int siz, int wait)
+{
+ struct mbuf *m, *n, *n2 = NULL, *n3;
+ unsigned len = len0, remain;
+
+ if ((siz >= MHLEN) || (len0 <= 0))
+ return (NULL);
+ for (m = m0; m && len > m->m_len; m = m->m_next)
+ len -= m->m_len;
+ if (m == NULL)
+ return (NULL);
+ remain = m->m_len - len;
+ if (remain == 0) {
+ if ((m->m_next) && (M_LEADINGSPACE(m->m_next) >= siz)) {
+ m->m_next->m_len += siz;
+ if (m0->m_flags & M_PKTHDR)
+ m0->m_pkthdr.len += siz;
+ m->m_next->m_data -= siz;
+ return m->m_next;
+ }
+ } else {
+ n2 = m_copym2(m, len, remain, wait);
+ if (n2 == NULL)
+ return (NULL);
+ }
+
+ MGET(n, wait, MT_DATA);
+ if (n == NULL) {
+ if (n2)
+ m_freem(n2);
+ return (NULL);
+ }
+
+ n->m_len = siz;
+ if (m0->m_flags & M_PKTHDR)
+ m0->m_pkthdr.len += siz;
+ m->m_len -= remain; /* Trim */
+ if (n2) {
+ for (n3 = n; n3->m_next != NULL; n3 = n3->m_next)
+ ;
+ n3->m_next = n2;
+ } else
+ n3 = n;
+ for (; n3->m_next != NULL; n3 = n3->m_next)
+ ;
+ n3->m_next = m->m_next;
+ m->m_next = n;
+ return n;
+}
+
+/*
  * Partition an mbuf chain in two pieces, returning the tail --
  * all but the first len0 bytes.  In case of failure, it returns NULL and
  * attempts to restore the chain to its original state.
@@ -997,121 +1089,6 @@ extpacket:
  m->m_next = NULL;
  return (n);
 }
-
-/*
- * Make space for a new header of length hlen at skip bytes
- * into the packet.  When doing this we allocate new mbufs only
- * when absolutely necessary.  The mbuf where the new header
- * is to go is returned together with an offset into the mbuf.
- * If NULL is returned then the mbuf chain may have been modified;
- * the caller is assumed to always free the chain.
- */
-struct mbuf *
-m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
-{
- struct mbuf *m;
- unsigned remain;
-
- KASSERT(m0 != NULL);
- KASSERT(hlen < MHLEN);
-
- for (m = m0; m && skip > m->m_len; m = m->m_next)
- skip -= m->m_len;
- if (m == NULL)
- return (NULL);
- /*
- * At this point skip is the offset into the mbuf m
- * where the new header should be placed.  Figure out
- * if there's space to insert the new header.  If so,
- * and copying the remainder makese sense then do so.
- * Otherwise insert a new mbuf in the chain, splitting
- * the contents of m as needed.
- */
- remain = m->m_len - skip; /* data to move */
- if (skip < remain && hlen <= M_LEADINGSPACE(m)) {
- if (skip)
- memmove(m->m_data-hlen, m->m_data, skip);
- m->m_data -= hlen;
- m->m_len += hlen;
- (*off) = skip;
- } else if (hlen > M_TRAILINGSPACE(m)) {
- struct mbuf *n0, *n, **np;
- int todo, len, done, alloc;
-
- n0 = NULL;
- np = &n0;
- alloc = 0;
- done = 0;
- todo = remain;
- while (todo > 0) {
- MGET(n, M_DONTWAIT, m->m_type);
- len = MHLEN;
- if (n && todo > MHLEN) {
- MCLGET(n, M_DONTWAIT);
- len = MCLBYTES;
- if ((n->m_flags & M_EXT) == 0) {
- m_free(n);
- n = NULL;
- }
- }
- if (n == NULL) {
- m_freem(n0);
- return NULL;
- }
- *np = n;
- np = &n->m_next;
- alloc++;
- len = min(todo, len);
- memcpy(n->m_data, mtod(m, char *) + skip + done, len);
- n->m_len = len;
- done += len;
- todo -= len;
- }
-
- if (hlen <= M_TRAILINGSPACE(m) + remain) {
- m->m_len = skip + hlen;
- *off = skip;
- if (n0 != NULL) {
- *np = m->m_next;
- m->m_next = n0;
- }
- }
- else {
- n = m_get(M_DONTWAIT, m->m_type);
- if (n == NULL) {
- m_freem(n0);
- return NULL;
- }
- alloc++;
-
- if ((n->m_next = n0) == NULL)
- np = &n->m_next;
- n0 = n;
-
- *np = m->m_next;
- m->m_next = n0;
-
- n->m_len = hlen;
- m->m_len = skip;
-
- m = n; /* header is at front ... */
- *off = 0; /* ... of new mbuf */
- }
- } else {
- /*
- * Copy the remainder to the back of the mbuf
- * so there's space to write the new header.
- */
- if (remain > 0)
- memmove(mtod(m, caddr_t) + skip + hlen,
-      mtod(m, caddr_t) + skip, remain);
- m->m_len += hlen;
- *off = skip;
- }
- m0->m_pkthdr.len += hlen; /* adjust packet length */
- return m;
-}
-
 
 /*
  * Routine to copy from device local memory into mbufs.
Index: sys/netinet/ip_ah.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ah.c,v
retrieving revision 1.123
diff -u -p -r1.123 ip_ah.c
--- sys/netinet/ip_ah.c 19 Sep 2016 18:09:22 -0000 1.123
+++ sys/netinet/ip_ah.c 9 Oct 2016 00:03:45 -0000
@@ -932,7 +932,7 @@ ah_output(struct mbuf *m, struct tdb *td
  struct mbuf *mi;
  struct cryptop *crp;
  u_int16_t iplen;
- int len, rplen, roff;
+ int len, rplen;
  u_int8_t prot;
  struct ah *ah;
 #if NBPFILTER > 0
@@ -1057,7 +1057,7 @@ ah_output(struct mbuf *m, struct tdb *td
  }
 
  /* Inject AH header. */
- mi = m_makespace(m, skip, rplen + ahx->authsize, &roff);
+ mi = m_inject(m, skip, rplen + ahx->authsize, M_DONTWAIT);
  if (mi == NULL) {
  DPRINTF(("ah_output(): failed to inject AH header for SA "
     "%s/%08x\n", ipsp_address(&tdb->tdb_dst, buf, sizeof(buf)),
@@ -1069,10 +1069,10 @@ ah_output(struct mbuf *m, struct tdb *td
  }
 
  /*
- * The AH header is guaranteed by m_makespace() to be in
- * contiguous memory, at 'roff' of the returned mbuf.
+ * The AH header is guaranteed by m_inject() to be in
+ * contiguous memory, at the beginning of the returned mbuf.
  */
- ah = (struct ah *)(mtod(mi, caddr_t) + roff);
+ ah = mtod(mi, struct ah *);
 
  /* Initialize the AH header. */
  m_copydata(m, protoff, sizeof(u_int8_t), (caddr_t) &ah->ah_nh);
Index: sys/netinet/ip_esp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_esp.c,v
retrieving revision 1.141
diff -u -p -r1.141 ip_esp.c
--- sys/netinet/ip_esp.c 19 Sep 2016 18:09:22 -0000 1.141
+++ sys/netinet/ip_esp.c 9 Oct 2016 00:03:45 -0000
@@ -775,7 +775,7 @@ esp_output(struct mbuf *m, struct tdb *t
 {
  struct enc_xform *espx = (struct enc_xform *) tdb->tdb_encalgxform;
  struct auth_hash *esph = (struct auth_hash *) tdb->tdb_authalgxform;
- int ilen, hlen, rlen, padding, blks, alen, roff;
+ int ilen, hlen, rlen, padding, blks, alen;
  u_int32_t replay;
  struct mbuf *mi, *mo = (struct mbuf *) NULL;
  struct tdb_crypto *tc;
@@ -907,7 +907,7 @@ esp_output(struct mbuf *m, struct tdb *t
  }
 
  /* Inject ESP header. */
- mo = m_makespace(m, skip, hlen, &roff);
+ mo = m_inject(m, skip, hlen, M_DONTWAIT);
  if (mo == NULL) {
  DPRINTF(("esp_output(): failed to inject ESP header for "
     "SA %s/%08x\n", ipsp_address(&tdb->tdb_dst, buf,
@@ -918,11 +918,10 @@ esp_output(struct mbuf *m, struct tdb *t
  }
 
  /* Initialize ESP header. */
- bcopy((caddr_t) &tdb->tdb_spi, mtod(mo, caddr_t) + roff,
-    sizeof(u_int32_t));
+ bcopy((caddr_t) &tdb->tdb_spi, mtod(mo, caddr_t), sizeof(u_int32_t));
  tdb->tdb_rpl++;
  replay = htonl((u_int32_t)tdb->tdb_rpl);
- memcpy(mtod(mo, caddr_t) + roff + sizeof(u_int32_t), (caddr_t) &replay,
+ memcpy(mtod(mo, caddr_t) + sizeof(u_int32_t), (caddr_t) &replay,
     sizeof(u_int32_t));
 
 #if NPFSYNC > 0
@@ -933,15 +932,15 @@ esp_output(struct mbuf *m, struct tdb *t
  * Add padding -- better to do it ourselves than use the crypto engine,
  * although if/when we support compression, we'd have to do that.
  */
- mo = m_makespace(m, m->m_pkthdr.len, padding + alen, &roff);
+ mo = m_inject(m, m->m_pkthdr.len, padding + alen, M_DONTWAIT);
  if (mo == NULL) {
- DPRINTF(("esp_output(): m_makespace() failed for SA %s/%08x\n",
+ DPRINTF(("esp_output(): m_inject failed for SA %s/%08x\n",
     ipsp_address(&tdb->tdb_dst, buf, sizeof(buf)),
     ntohl(tdb->tdb_spi)));
  m_freem(m);
  return ENOBUFS;
  }
- pad = mtod(mo, caddr_t) + roff;
+ pad = mtod(mo, u_char *);
 
  /* Apply self-describing padding */
  for (ilen = 0; ilen < padding - 2; ilen++)
Index: sys/netinet/ip_ipcomp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ipcomp.c,v
retrieving revision 1.48
diff -u -p -r1.48 ip_ipcomp.c
--- sys/netinet/ip_ipcomp.c 24 Sep 2016 14:51:37 -0000 1.48
+++ sys/netinet/ip_ipcomp.c 9 Oct 2016 00:03:45 -0000
@@ -528,7 +528,7 @@ ipcomp_output_cb(struct cryptop *crp)
  struct tdb_crypto *tc;
  struct tdb *tdb;
  struct mbuf *m, *mo;
- int error, s, skip, rlen, roff;
+ int error, s, skip, rlen;
  u_int16_t cpi;
  struct ip *ip;
 #ifdef INET6
@@ -593,7 +593,7 @@ ipcomp_output_cb(struct cryptop *crp)
  }
 
  /* Inject IPCOMP header */
- mo = m_makespace(m, skip, IPCOMP_HLENGTH, &roff);
+ mo = m_inject(m, skip, IPCOMP_HLENGTH, M_DONTWAIT);
  if (mo == NULL) {
  DPRINTF(("ipcomp_output_cb(): failed to inject IPCOMP header "
     "for IPCA %s/%08x\n", ipsp_address(&tdb->tdb_dst, buf,
@@ -604,7 +604,7 @@ ipcomp_output_cb(struct cryptop *crp)
  }
 
  /* Initialize the IPCOMP header */
- ipcomp = (struct ipcomp *)(mtod(mo, caddr_t) + roff);
+ ipcomp = mtod(mo, struct ipcomp *);
  memset(ipcomp, 0, sizeof(struct ipcomp));
  cpi = (u_int16_t) ntohl(tdb->tdb_spi);
  ipcomp->ipcomp_cpi = htons(cpi);
Index: sys/netinet/ipsec_output.c
===================================================================
RCS file: /cvs/src/sys/netinet/ipsec_output.c,v
retrieving revision 1.63
diff -u -p -r1.63 ipsec_output.c
--- sys/netinet/ipsec_output.c 13 Sep 2016 19:56:55 -0000 1.63
+++ sys/netinet/ipsec_output.c 9 Oct 2016 00:03:45 -0000
@@ -362,12 +362,13 @@ int
 ipsp_process_done(struct mbuf *m, struct tdb *tdb)
 {
  struct ip *ip;
+
 #ifdef INET6
  struct ip6_hdr *ip6;
 #endif /* INET6 */
+
  struct tdb_ident *tdbi;
  struct m_tag *mtag;
- int roff;
 
  tdb->tdb_last_used = time_second;
 
@@ -397,12 +398,12 @@ ipsp_process_done(struct mbuf *m, struct
  return ENXIO;
  }
 
- mi = m_makespace(m, iphlen, sizeof(struct udphdr), &roff);
+ mi = m_inject(m, iphlen, sizeof(struct udphdr), M_DONTWAIT);
  if (mi == NULL) {
  m_freem(m);
  return ENOMEM;
  }
- uh = (struct udphdr *)(mtod(mi, caddr_t) + roff);
+ uh = mtod(mi, struct udphdr *);
  uh->uh_sport = uh->uh_dport = htons(udpencap_port);
  if (tdb->tdb_udpencap_port)
  uh->uh_dport = tdb->tdb_udpencap_port;
Index: sys/sys/mbuf.h
===================================================================
RCS file: /cvs/src/sys/sys/mbuf.h,v
retrieving revision 1.220
diff -u -p -r1.220 mbuf.h
--- sys/sys/mbuf.h 17 Sep 2016 00:38:43 -0000 1.220
+++ sys/sys/mbuf.h 9 Oct 2016 00:20:34 -0000
@@ -419,6 +419,7 @@ extern int max_protohdr; /* largest pro
 extern int max_hdr; /* largest link+protocol header */
 
 void mbinit(void);
+struct mbuf *m_copym2(struct mbuf *, int, int, int);
 struct mbuf *m_copym(struct mbuf *, int, int, int);
 struct mbuf *m_free(struct mbuf *);
 struct mbuf *m_get(int, int);
@@ -431,7 +432,7 @@ struct mbuf *m_prepend(struct mbuf *, in
 struct mbuf *m_pulldown(struct mbuf *, int, int, int *);
 struct mbuf *m_pullup(struct mbuf *, int);
 struct mbuf *m_split(struct mbuf *, int, int);
-struct mbuf *m_makespace(struct mbuf *, int, int, int *);
+struct  mbuf *m_inject(struct mbuf *, int, int, int);
 struct  mbuf *m_getptr(struct mbuf *, int, int *);
 int m_leadingspace(struct mbuf *);
 int m_trailingspace(struct mbuf *);
Index: share/man/man9/mbuf.9
===================================================================
RCS file: /cvs/src/share/man/man9/mbuf.9,v
retrieving revision 1.104
diff -u -p -r1.104 mbuf.9
--- share/man/man9/mbuf.9 15 Sep 2016 00:00:40 -0000 1.104
+++ share/man/man9/mbuf.9 9 Oct 2016 00:03:45 -0000
@@ -42,7 +42,7 @@
 .Nm m_pulldown ,
 .Nm m_pullup ,
 .Nm m_split ,
-.Nm m_makespace ,
+.Nm m_inject ,
 .Nm m_getptr ,
 .Nm m_adj ,
 .Nm m_copyback ,
@@ -92,7 +92,7 @@
 .Ft struct mbuf *
 .Fn m_split "struct mbuf *m0" "int len0" "int wait"
 .Ft struct mbuf *
-.Fn m_makespace "struct mbuf *m0" "int skip" "int hlen" "int *off"
+.Fn m_inject "struct mbuf *m0" "int len0" "int siz" "int wait"
 .Ft struct mbuf *
 .Fn m_getptr "struct mbuf *m" "int loc" "int *off"
 .Ft void
@@ -559,18 +559,18 @@ Split an mbuf chain in two pieces, retur
 the tail (which is made of the previous mbuf chain except the first
 .Fa len0
 bytes).
-.It Fn m_makespace "struct mbuf *m0" "int skip" "int hlen" "int *off"
-Make space for a continuous memory region of length
-.Fa hlen
-at
-.Fa skip
-bytes into the mbuf chain.
-On success, the mbuf of the continuous memory is returned
-together with an offset
-.Fa off
-into the mbuf.
-On failure, NULL is returned and the mbuf chain may have been modified.
-The caller is assumed to always free the chain.
+.It Fn m_inject "struct mbuf *m0" "int len0" "int siz" "int wait"
+Inject a new mbuf chain of length
+.Fa siz
+into the mbuf chain pointed to by
+.Fa m0
+at position
+.Fa len0 .
+If there is enough space for an object of size
+.Fa siz
+in the appropriate location, no memory will be allocated.
+On failure, the function returns NULL (the mbuf is left untouched) and
+on success, a pointer to the first injected mbuf is returned.
 .It Fn m_getptr "struct mbuf *m" "int loc" "int *off"
 Returns a pointer to the mbuf containing the data located at
 .Fa loc
--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
On 2016-10-09, Christian Weisgerber <[hidden email]> wrote:

> Found by bisection.  The culprit is this commit:
>
> ------------------------------------------------------------------------
> CVSROOT:        /cvs
> Module name:    src
> Changes by:     [hidden email]  2016/09/13 13:56:55
>
> Modified files:
>         sys/kern       : uipc_mbuf.c
>         sys/netinet    : ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c
>         sys/sys        : mbuf.h
>         share/man/man9 : mbuf.9
>
> Log message:
> avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
> with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@
> ------------------------------------------------------------------------

I don't see anything wrong in there.  Maybe the problem is elsewhere
and that change just triggers it.

Meanwhile, here's a less invasive "backout" that neuters m_makespace()
so it produces the same mbuf chains as m_inject() did.  This makes
the bug disappear.

Index: uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.228
diff -u -p -r1.228 uipc_mbuf.c
--- uipc_mbuf.c 13 Sep 2016 19:56:55 -0000 1.228
+++ uipc_mbuf.c 10 Oct 2016 20:54:40 -0000
@@ -1062,13 +1062,16 @@ m_makespace(struct mbuf *m0, int skip, i
  * the contents of m as needed.
  */
  remain = m->m_len - skip; /* data to move */
+#if 0
  if (skip < remain && hlen <= M_LEADINGSPACE(m)) {
  if (skip)
  memmove(m->m_data-hlen, m->m_data, skip);
  m->m_data -= hlen;
  m->m_len += hlen;
  (*off) = skip;
- } else if (hlen > M_TRAILINGSPACE(m)) {
+ } else if (hlen > M_TRAILINGSPACE(m))
+#endif
+ {
  struct mbuf *n0, *n, **np;
  int todo, len, done, alloc;
 
@@ -1102,6 +1105,7 @@ m_makespace(struct mbuf *m0, int skip, i
  todo -= len;
  }
 
+#if 0
  if (hlen <= M_TRAILINGSPACE(m) + remain) {
  m->m_len = skip + hlen;
  *off = skip;
@@ -1109,8 +1113,9 @@ m_makespace(struct mbuf *m0, int skip, i
  *np = m->m_next;
  m->m_next = n0;
  }
- }
- else {
+ } else
+#endif
+ {
  n = m_get(M_DONTWAIT, m->m_type);
  if (n == NULL) {
  m_freem(n0);
@@ -1131,7 +1136,9 @@ m_makespace(struct mbuf *m0, int skip, i
  m = n; /* header is at front ... */
  *off = 0; /* ... of new mbuf */
  }
- } else {
+ }
+#if 0
+ else {
  /*
  * Copy the remainder to the back of the mbuf
  * so there's space to write the new header.
@@ -1142,6 +1149,7 @@ m_makespace(struct mbuf *m0, int skip, i
  m->m_len += hlen;
  *off = skip;
  }
+#endif
  m0->m_pkthdr.len += hlen; /* adjust packet length */
  return m;
 }
--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Mike Belopuhov-5
On Mon, Oct 10, 2016 at 21:13 +0000, Christian Weisgerber wrote:

> On 2016-10-09, Christian Weisgerber <[hidden email]> wrote:
>
> > Found by bisection.  The culprit is this commit:
> >
> > ------------------------------------------------------------------------
> > CVSROOT:        /cvs
> > Module name:    src
> > Changes by:     [hidden email]  2016/09/13 13:56:55
> >
> > Modified files:
> >         sys/kern       : uipc_mbuf.c
> >         sys/netinet    : ip_ah.c ip_esp.c ip_ipcomp.c ipsec_output.c
> >         sys/sys        : mbuf.h
> >         share/man/man9 : mbuf.9
> >
> > Log message:
> > avoid extensive mbuf allocation for IPsec by replacing m_inject(4)
> > with m_makespace(4) from freebsd; ok mpi@, bluhm@, mikeb@, dlg@
> > ------------------------------------------------------------------------
>
> I don't see anything wrong in there.  Maybe the problem is elsewhere
> and that change just triggers it.
>
> Meanwhile, here's a less invasive "backout" that neuters m_makespace()
> so it produces the same mbuf chains as m_inject() did.  This makes
> the bug disappear.
>

I can't find any immediate deficiencies in the m_makespace.
It's also not clear what's wrong with those broken NS/ND
packets that you receive.  Do you get any ESP errors?
We need to know what kind of chains are being affected.

Could you please try the following diff.  Unfortunately,
it might produce too much output.  If you could narrow it
down to affected packets this would help a lot.

diff --git sys/kern/uipc_mbuf.c sys/kern/uipc_mbuf.c
index d6b248f..cf9c650 100644
--- sys/kern/uipc_mbuf.c
+++ sys/kern/uipc_mbuf.c
@@ -996,10 +996,34 @@ extpacket:
  n->m_next = m->m_next;
  m->m_next = NULL;
  return (n);
 }
 
+static void
+m_hexdump(const char *where, struct mbuf *m)
+{
+ char *desc;
+ int len;
+
+ while (m != NULL) {
+ len = MLEN;
+ desc = "MBUF";
+ if (m->m_flags & M_EXT) {
+ len = m->m_ext.ext_size;
+ desc = "CLUSTER";
+ } else if (m->m_flags & M_PKTHDR) {
+ len = MHLEN;
+ desc = "PKTHDR";
+ }
+ printf("%s: %s (%p): len %d, total %d, leading(-) %d, "
+    "trailing(+) %d\n", where, desc, m, m->m_len, len,
+    m_leadingspace(m), m_trailingspace(m));
+ m = m->m_next;
+ }
+ printf("=======================\n");
+}
+
 /*
  * Make space for a new header of length hlen at skip bytes
  * into the packet.  When doing this we allocate new mbufs only
  * when absolutely necessary.  The mbuf where the new header
  * is to go is returned together with an offset into the mbuf.
@@ -1032,10 +1056,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
  if (skip)
  memmove(m->m_data-hlen, m->m_data, skip);
  m->m_data -= hlen;
  m->m_len += hlen;
  (*off) = skip;
+ m_hexdump("1", m0);
  } else if (hlen > M_TRAILINGSPACE(m)) {
  struct mbuf *n0, *n, **np;
  int todo, len, done, alloc;
 
  n0 = NULL;
@@ -1073,10 +1098,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
  *off = skip;
  if (n0 != NULL) {
  *np = m->m_next;
  m->m_next = n0;
  }
+ m_hexdump("2", m0);
  }
  else {
  n = m_get(M_DONTWAIT, m->m_type);
  if (n == NULL) {
  m_freem(n0);
@@ -1105,10 +1131,11 @@ m_makespace(struct mbuf *m0, int skip, int hlen, int *off)
  if (remain > 0)
  memmove(mtod(m, caddr_t) + skip + hlen,
       mtod(m, caddr_t) + skip, remain);
  m->m_len += hlen;
  *off = skip;
+ m_hexdump("3", m0);
  }
  m0->m_pkthdr.len += hlen; /* adjust packet length */
  return m;
 }
 

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
Mike Belopuhov:

> It's also not clear what's wrong with those broken NS/ND
> packets that you receive.

Oct 12 17:30:10 bardioc /bsd: nd6_na_input: ND packet from non-neighbor
Oct 12 17:30:12 bardioc last message repeated 2 times
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2

What seems to be wrong is that they are going through the tunnel.

> Do you get any ESP errors?

No.

> Could you please try the following diff.  Unfortunately,
> it might produce too much output.  If you could narrow it
> down to affected packets this would help a lot.

I can narrow it down to a single packet.  Here's the result from
ping6 -c1 2001:6f8:124a::4 :

2: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
2: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
2: MBUF (0xffffff00dfa9da00): len 32, total 224, leading(-) 72, trailing(+) 120
=======================
3: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
3: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
3: MBUF (0xffffff00dfa9da00): len 56, total 224, leading(-) 72, trailing(+) 96
=======================

That corresponds to the two m_makespace() calls in esp_output().

I already dug around earlier.  The input packet layout looks like
this:

          mbuf               mbuf
    +------+------+      +--------+
    | IPv6 | IPv6 | ---- | ICMPv6 |
    +------+------+      +--------+

After the first m_makespace():

    +------+-----+      +------+      +--------+
    | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 |
    +------+-----+      +------+      +--------+

After the second m_makespace():

    +------+-----+      +------+      +--------+-----+
    | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
    +------+-----+      +------+      +--------+-----+

With m_inject(), it would instead be something like this:

    +------+    +-----+      +------+      +--------
    | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
    +------+    +-----+      +------+      +--------


The corresponding traffic this ping6 triggers on the outgoing em0
interface:

17:30:10.192307 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:10.192737 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 2 len 120
17:30:11.189521 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:11.189738 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 3 len 120
17:30:12.189831 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
17:30:12.190033 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 4 len 120
17:30:15.187608 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 5 len 120
17:30:16.188826 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 6 len 120
17:30:17.194123 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 7 len 120

And on enc0:

17:30:10.141545 (authentic,confidential): SPI 0xdeadbeef: 2001:6f8:124a::2 > 2001:6f8:124a::4: icmp6: echo request (encap)
17:30:10.192872 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:11.189809 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:12.190101 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
17:30:15.187706 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
17:30:16.188924 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
17:30:17.194226 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Mike Belopuhov-5
On Wed, Oct 12, 2016 at 18:00 +0200, Christian Weisgerber wrote:

> Mike Belopuhov:
>
> > It's also not clear what's wrong with those broken NS/ND
> > packets that you receive.
>
> Oct 12 17:30:10 bardioc /bsd: nd6_na_input: ND packet from non-neighbor
> Oct 12 17:30:12 bardioc last message repeated 2 times
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:15 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:16 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: NS packet from non-neighbor
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: src=2001:6f8:124a::4
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: dst=2001:6f8:124a::2
> Oct 12 17:30:17 bardioc /bsd: nd6_ns_input: tgt=2001:6f8:124a::2
>
> What seems to be wrong is that they are going through the tunnel.
>

But what does it have to do with m_makespace then?  It's called by
the esp_output, therefore the decision to apply the transformation
has been already made at this point.  Which SPD lookup succeeds
that shouldn't?

> > Do you get any ESP errors?
>
> No.
>
> > Could you please try the following diff.  Unfortunately,
> > it might produce too much output.  If you could narrow it
> > down to affected packets this would help a lot.
>
> I can narrow it down to a single packet.  Here's the result from
> ping6 -c1 2001:6f8:124a::4 :
>
> 2: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> 2: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> 2: MBUF (0xffffff00dfa9da00): len 32, total 224, leading(-) 72, trailing(+) 120
> =======================
> 3: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> 3: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> 3: MBUF (0xffffff00dfa9da00): len 56, total 224, leading(-) 72, trailing(+) 96
> =======================
>
> That corresponds to the two m_makespace() calls in esp_output().
>
> I already dug around earlier.  The input packet layout looks like
> this:
>
>           mbuf               mbuf
>     +------+------+      +--------+
>     | IPv6 | IPv6 | ---- | ICMPv6 |
>     +------+------+      +--------+
>
> After the first m_makespace():
>
>     +------+-----+      +------+      +--------+
>     | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 |
>     +------+-----+      +------+      +--------+
>
> After the second m_makespace():
>
>     +------+-----+      +------+      +--------+-----+
>     | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
>     +------+-----+      +------+      +--------+-----+
>
> With m_inject(), it would instead be something like this:
>
>     +------+    +-----+      +------+      +--------
>     | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
>     +------+    +-----+      +------+      +--------
>
>

Yes, I don't see anything wrong with this.  For some reason you
get the same addresses for your mbuf chain components.  Are they
statically allocated somehow?

> The corresponding traffic this ping6 triggers on the outgoing em0
> interface:
>
> 17:30:10.192307 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
> 17:30:10.192737 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 2 len 120
> 17:30:11.189521 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
> 17:30:11.189738 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 3 len 120
> 17:30:12.189831 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
> 17:30:12.190033 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 4 len 120
> 17:30:15.187608 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 5 len 120
> 17:30:16.188826 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 6 len 120
> 17:30:17.194123 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 7 len 120
>
> And on enc0:
>
> 17:30:10.141545 (authentic,confidential): SPI 0xdeadbeef: 2001:6f8:124a::2 > 2001:6f8:124a::4: icmp6: echo request (encap)
> 17:30:10.192872 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:11.189809 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:12.190101 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)
> 17:30:15.187706 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
> 17:30:16.188924 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
> 17:30:17.194226 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor sol: who has 2001:6f8:124a::2 (encap)
>
> --
> Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
Mike Belopuhov:

> > > It's also not clear what's wrong with those broken NS/ND
> > > packets that you receive.
> >
> > What seems to be wrong is that they are going through the tunnel.
>
> But what does it have to do with m_makespace then?  It's called by
> the esp_output, therefore the decision to apply the transformation
> has been already made at this point.  Which SPD lookup succeeds
> that shouldn't?

My ping6 packet (IPv6-ICMPv6) goes into the network stack.  It gets
IPv6-encapsulated for the tunnel.  Then it gets ESP-encapsulated,
which is where m_makespace() is called.  Then, somewhere after
esp_output(), the stack notices that it doesn't know where to send
the packet and generates the first neighbor solicitation.  This NS
packet is not encapsulated, so m_makespace() is not involved there.

The different mbuf layout generated by m_makespace() somehow causes
the neighbor solicitation code to choose a different source address.

Combined from the tcpdumps on em0 and enc0 on the host that tries to
send the ping6:

# the initial ping6 packet goes through IPsec processing
17:30:10.141545 (authentic,confidential): SPI 0xdeadbeef: 2001:6f8:124a::2 > 2001:6f8:124a::4: icmp6: echo request (encap)
# the neighbor solicitation
17:30:10.192307 2001:6f8:124a::2 > ff02::1:ff00:4: icmp6: neighbor sol: who has 2001:6f8:124a::4
# in reply, the neighbor advertisement comes through the tunnel
17:30:10.192737 esp 2001:6f8:124a::4 > 2001:6f8:124a::2 spi 0xbeefdead seq 2 len 120
17:30:10.192872 (authentic,confidential): SPI 0xbeefdead: 2001:6f8:124a::4 > 2001:6f8:124a::2: icmp6: neighbor adv: tgt is 2001:6f8:124a::4 (encap)

> > I can narrow it down to a single packet.  Here's the result from
> > ping6 -c1 2001:6f8:124a::4 :
> >
> > 2: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> > 2: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> > 2: MBUF (0xffffff00dfa9da00): len 32, total 224, leading(-) 72, trailing(+) 120
> > =======================
> > 3: PKTHDR (0xffffff00dfa9d200): len 64, total 152, leading(-) 72, trailing(+) 16
> > 3: MBUF (0xffffff00dfa9db00): len 40, total 224, leading(-) 0, trailing(+) 184
> > 3: MBUF (0xffffff00dfa9da00): len 56, total 224, leading(-) 72, trailing(+) 96
> > =======================
>
> Yes, I don't see anything wrong with this.  For some reason you
> get the same addresses for your mbuf chain components.  Are they
> statically allocated somehow?

What do you mean?  That's the same mbuf chain above.  The first
m_makespace() call makes room for the ESP header, the second for
the ESP trailer.

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
In reply to this post by Christian Weisgerber
On 2016-10-12, Christian Weisgerber <[hidden email]> wrote:

> After the second m_makespace():
>
>     +------+-----+      +------+      +--------+-----+
>     | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
>     +------+-----+      +------+      +--------+-----+
>
> With m_inject(), it would instead be something like this:
>
>     +------+    +-----+      +------+      +--------
>     | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
>     +------+    +-----+      +------+      +--------

Found it.  It's this snippet of nd6_ns_output() that handles those
mbuf chains differently:

    454                 if (ln && ln->ln_hold) {
    455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
    456                         /* XXX pullup? */
    457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
    458                                 saddr6 = &hip6->ip6_src;
    459                         else
    460                                 saddr6 = NULL;
    461                 } else
    462                         saddr6 = NULL;

Did this only ever work by accident?

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Mike Belopuhov-5
On Thu, Oct 13, 2016 at 11:06 +0000, Christian Weisgerber wrote:

> On 2016-10-12, Christian Weisgerber <[hidden email]> wrote:
>
> > After the second m_makespace():
> >
> >     +------+-----+      +------+      +--------+-----+
> >     | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
> >     +------+-----+      +------+      +--------+-----+
> >
> > With m_inject(), it would instead be something like this:
> >
> >     +------+    +-----+      +------+      +--------
> >     | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
> >     +------+    +-----+      +------+      +--------
>
> Found it.  It's this snippet of nd6_ns_output() that handles those
> mbuf chains differently:
>
>     454                 if (ln && ln->ln_hold) {
>     455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
>     456                         /* XXX pullup? */
>     457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
>     458                                 saddr6 = &hip6->ip6_src;
>     459                         else
>     460                                 saddr6 = NULL;
>     461                 } else
>     462                         saddr6 = NULL;
>
> Did this only ever work by accident?
>

Does reversing this condition work? (sizeof > m_len)
I believe the comment about pullup is pointless.

FreeBSD has moved this code into nd6_llinfo_get_holdsrc and
fixed this condition in this diff:
https://svnweb.freebsd.org/base?view=revision&revision=288652

> --
> Christian "naddy" Weisgerber                          [hidden email]
>

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Mike Belopuhov-5
On Thu, Oct 13, 2016 at 13:48 +0200, Mike Belopuhov wrote:

> On Thu, Oct 13, 2016 at 11:06 +0000, Christian Weisgerber wrote:
> > On 2016-10-12, Christian Weisgerber <[hidden email]> wrote:
> >
> > > After the second m_makespace():
> > >
> > >     +------+-----+      +------+      +--------+-----+
> > >     | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
> > >     +------+-----+      +------+      +--------+-----+
> > >
> > > With m_inject(), it would instead be something like this:
> > >
> > >     +------+    +-----+      +------+      +--------
> > >     | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
> > >     +------+    +-----+      +------+      +--------
> >
> > Found it.  It's this snippet of nd6_ns_output() that handles those
> > mbuf chains differently:
> >
> >     454                 if (ln && ln->ln_hold) {
> >     455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> >     456                         /* XXX pullup? */
> >     457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
> >     458                                 saddr6 = &hip6->ip6_src;
> >     459                         else
> >     460                                 saddr6 = NULL;
> >     461                 } else
> >     462                         saddr6 = NULL;
> >
> > Did this only ever work by accident?
> >
>
> Does reversing this condition work? (sizeof > m_len)
> I believe the comment about pullup is pointless.
>
> FreeBSD has moved this code into nd6_llinfo_get_holdsrc and
> fixed this condition in this diff:
> https://svnweb.freebsd.org/base?view=revision&revision=288652
>

I wonder if KAME people are actually right and we don't need to
check this at all.   Can we get a packet w/o an IPv6 header there
or an mbuf chain with an empty first mbuf?
https://github.com/kame/kame/blob/master/kame/sys/netinet6/nd6_nbr.c#L520

> > --
> > Christian "naddy" Weisgerber                          [hidden email]
> >

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
In reply to this post by Mike Belopuhov-5
Mike Belopuhov:

> >     454                 if (ln && ln->ln_hold) {
> >     455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> >     456                         /* XXX pullup? */
> >     457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
> >     458                                 saddr6 = &hip6->ip6_src;
> >     459                         else
> >     460                                 saddr6 = NULL;
> >     461                 } else
> >     462                         saddr6 = NULL;
>
> Does reversing this condition work? (sizeof > m_len)

That doesn't make sense, is effectively an if(0), and turns the
whole snippet into saddr6 = NULL.  But yes, for testing purposes
that fixes the problem.

> FreeBSD has moved this code into nd6_llinfo_get_holdsrc and
> fixed this condition in this diff:
> https://svnweb.freebsd.org/base?view=revision&revision=288652

But the way I understand the FreeBSD code it is equivalent to

        if (sizeof(*hip6) > ln->ln_hold->m_len)
                saddr6 = NULL;
        else
                saddr6 = &hip6->ip6_src;

and thus corresponds to our "not working" case.

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Markus Friedl-4
In reply to this post by Christian Weisgerber

> Am 13.10.2016 um 13:06 schrieb Christian Weisgerber <[hidden email]>:
>
>> After the second m_makespace():
>>
>>    +------+-----+      +------+      +--------+-----+
>>    | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
>>    +------+-----+      +------+      +--------+-----+
>>
>> With m_inject(), it would instead be something like this:
>>
>>    +------+    +-----+      +------+      +--------
>>    | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
>>    +------+    +-----+      +------+      +--------
>
> Found it.  It's this snippet of nd6_ns_output() that handles those
> mbuf chains differently:
>
>    454                 if (ln && ln->ln_hold) {
>    455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
>    456                         /* XXX pullup? */
>    457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
>    458                                 saddr6 = &hip6->ip6_src;
>    459                         else
>    460                                 saddr6 = NULL;
>    461                 } else
>    462                         saddr6 = NULL;
>
> Did this only ever work by accident?

ok, to get it right, the following is the difference:

with m_inject() the first mbuf always contains the 40 byte ipv6 header
while with m_makespace() it also contains the ESP header.

so with m_inject() the ln_hold->m_len is 40 and since this is
exactly the size of hip6, the code falls back to saddr6 = NULL.

IMHO the code should use <= and not <:
       if (sizeof(*hip6) <=  ln->ln_hold->m_len)
but then your example will also fail with the old m_inject() code.

If this intended address selection is indeed correct then we
need to figure out if a bypass flow for NDP is necessary, or
if NDP should always bypass IPsec (but what about bringing NDP over IPsec?)

With IPv4 this problem does not exist, because ARP packet are not IP
packets, so they are not matched by the IPsec flow.

-m
Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Alexander Bluhm
In reply to this post by Christian Weisgerber
On Thu, Oct 06, 2016 at 11:12:18PM +0200, Christian Weisgerber wrote:
> Something is very broken at the intersection of IPv6, NDP, and IPsec
> in -current.

I also see issues with IPv6 and NDP, but no IPsec involved.  There
are several other threads on bugs@ about broken IPv6.

It seems that sending neighbor solicitation retries for expired ND
entries does not work.  The diff below helps in my case, although
it is only a workaround and not MP safe.  It would be interesting
to know wether it also affects your scenario.

The RTF_CACHED code was introduced with this commit:
----------------------------
revision 1.190
date: 2016/08/22 16:01:52;  author: mpi;  state: Exp;  lines: +24 -6;  commitid: Jx7agqiuXqs8RRGd;
Make the ``rt_gwroute'' pointer of RTF_GATEWAY entries immutable.

This means that no protection is needed to guarantee that the next hop
route wont be modified by CPU1 while CPU0 is dereferencing it in a L2
resolution functions.

While here also fix an ``ifa'' leak resulting in RTF_GATEWAY being always
invalid.

dlg@ likes it, inputs and ok bluhm@
----------------------------

bluhm

Index: netinet6/nd6.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.193
diff -u -p -r1.193 nd6.c
--- netinet6/nd6.c 3 Oct 2016 12:33:21 -0000 1.193
+++ netinet6/nd6.c 13 Oct 2016 21:47:25 -0000
@@ -827,7 +827,7 @@ nd6_free(struct rtentry *rt, int gc)
  * caches, and disable the route entry not to be used in already
  * cached routes.
  */
- if (!ISSET(rt->rt_flags, RTF_STATIC|RTF_CACHED))
+ if (!ISSET(rt->rt_flags, RTF_STATIC))
  rtdeletemsg(rt, ifp, ifp->if_rdomain);
  splx(s);
 

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Christian Weisgerber
Alexander Bluhm:

> I also see issues with IPv6 and NDP, but no IPsec involved.  There
> are several other threads on bugs@ about broken IPv6.
>
> It seems that sending neighbor solicitation retries for expired ND
> entries does not work.  The diff below helps in my case, although
> it is only a workaround and not MP safe.  It would be interesting
> to know wether it also affects your scenario.

It does not.

> --- netinet6/nd6.c 3 Oct 2016 12:33:21 -0000 1.193
> +++ netinet6/nd6.c 13 Oct 2016 21:47:25 -0000
> @@ -827,7 +827,7 @@ nd6_free(struct rtentry *rt, int gc)
>   * caches, and disable the route entry not to be used in already
>   * cached routes.
>   */
> - if (!ISSET(rt->rt_flags, RTF_STATIC|RTF_CACHED))
> + if (!ISSET(rt->rt_flags, RTF_STATIC))
>   rtdeletemsg(rt, ifp, ifp->if_rdomain);
>   splx(s);
>  

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Mike Belopuhov-5
In reply to this post by Markus Friedl-4
On Thu, Oct 13, 2016 at 21:43 +0200, Markus Friedl wrote:

>
> > Am 13.10.2016 um 13:06 schrieb Christian Weisgerber <[hidden email]>:
> >
> >> After the second m_makespace():
> >>
> >>    +------+-----+      +------+      +--------+-----+
> >>    | IPv6 | ESP | ---- | IPv6 | ---- | ICMPv6 | ESP |
> >>    +------+-----+      +------+      +--------+-----+
> >>
> >> With m_inject(), it would instead be something like this:
> >>
> >>    +------+    +-----+      +------+      +--------
> >>    | IPv6 |----| ESP | ---- | IPv6 | ---- | ICMPv6  ...
> >>    +------+    +-----+      +------+      +--------
> >
> > Found it.  It's this snippet of nd6_ns_output() that handles those
> > mbuf chains differently:
> >
> >    454                 if (ln && ln->ln_hold) {
> >    455                         hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> >    456                         /* XXX pullup? */
> >    457                         if (sizeof(*hip6) < ln->ln_hold->m_len)
> >    458                                 saddr6 = &hip6->ip6_src;
> >    459                         else
> >    460                                 saddr6 = NULL;
> >    461                 } else
> >    462                         saddr6 = NULL;
> >
> > Did this only ever work by accident?
>
> ok, to get it right, the following is the difference:
>
> with m_inject() the first mbuf always contains the 40 byte ipv6 header
> while with m_makespace() it also contains the ESP header.
>
> so with m_inject() the ln_hold->m_len is 40 and since this is
> exactly the size of hip6, the code falls back to saddr6 = NULL.
>
> IMHO the code should use <= and not <:
>        if (sizeof(*hip6) <=  ln->ln_hold->m_len)
> but then your example will also fail with the old m_inject() code.
>

I agree with the above.

> If this intended address selection is indeed correct then we
> need to figure out if a bypass flow for NDP is necessary, or
> if NDP should always bypass IPsec (but what about bringing NDP
> over IPsec?)
>

There are apparently some discussions in infomational RFCs regarding
this issue.  For instance https://tools.ietf.org/html/rfc3756 states:

   More specifically, the currently used key agreement protocol, IKE,
   suffers from a chicken-and-egg problem [8]: one needs an IP address
   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
   required to configure an IP address.

Which goes one step further: how to protect all ND in general, but is
still applicable in our situation.  There were attempts to protect ND
in alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971).
FreeBSD has picked up on it and has had a SoC project which seems to
be integrated right now:

   https://wiki.freebsd.org/SOC2009AnaKukec
   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4

Would it be possible for us to disable the check and always set saddr6
to NULL for now?

> With IPv4 this problem does not exist, because ARP packet are not IP
> packets, so they are not matched by the IPsec flow.
>
> -m

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Markus Friedl-4

> Am 25.10.2016 um 17:13 schrieb Mike Belopuhov <[hidden email]>:
>
>
> There are apparently some discussions in infomational RFCs regarding
> this issue.  For instance https://tools.ietf.org/html/rfc3756 <https://tools.ietf.org/html/rfc3756> states:
>
>   More specifically, the currently used key agreement protocol, IKE,
>   suffers from a chicken-and-egg problem [8]: one needs an IP address
>   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
>   required to configure an IP address.
>
> Which goes one step further: how to protect all ND in general, but is
> still applicable in our situation.  There were attempts to protect ND
> in alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971 <https://tools.ietf.org/html/rfc3971>).
> FreeBSD has picked up on it and has had a SoC project which seems to
> be integrated right now:
>
>   https://wiki.freebsd.org/SOC2009AnaKukec <https://wiki.freebsd.org/SOC2009AnaKukec>
>   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4 <https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4>
>
> Would it be possible for us to disable the check and always set saddr6
> to NULL for now?

Fine w/me.

Or we could check if the packet has been IPsec encapsulated
and set saddr6 to NULL in this case.
Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Martin Pieuchot
On 25/10/16(Tue) 22:13, Markus Friedl wrote:

>
> > Am 25.10.2016 um 17:13 schrieb Mike Belopuhov <[hidden email]>:
> >
> >
> > There are apparently some discussions in infomational RFCs regarding
> > this issue.  For instance https://tools.ietf.org/html/rfc3756 <https://tools.ietf.org/html/rfc3756> states:
> >
> >   More specifically, the currently used key agreement protocol, IKE,
> >   suffers from a chicken-and-egg problem [8]: one needs an IP address
> >   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
> >   required to configure an IP address.
> >
> > Which goes one step further: how to protect all ND in general, but is
> > still applicable in our situation.  There were attempts to protect ND
> > in alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971 <https://tools.ietf.org/html/rfc3971>).
> > FreeBSD has picked up on it and has had a SoC project which seems to
> > be integrated right now:
> >
> >   https://wiki.freebsd.org/SOC2009AnaKukec <https://wiki.freebsd.org/SOC2009AnaKukec>
> >   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4 <https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4>
> >
> > Would it be possible for us to disable the check and always set saddr6
> > to NULL for now?
>
> Fine w/me.
>
> Or we could check if the packet has been IPsec encapsulated
> and set saddr6 to NULL in this case.

Is this fixed?  Anything we're still waiting for?

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Martin Pieuchot
On 02/11/16(Wed) 10:19, Martin Pieuchot wrote:

> On 25/10/16(Tue) 22:13, Markus Friedl wrote:
> >
> > > Am 25.10.2016 um 17:13 schrieb Mike Belopuhov <[hidden email]>:
> > >
> > >
> > > There are apparently some discussions in infomational RFCs regarding
> > > this issue.  For instance https://tools.ietf.org/html/rfc3756 <https://tools.ietf.org/html/rfc3756> states:
> > >
> > >   More specifically, the currently used key agreement protocol, IKE,
> > >   suffers from a chicken-and-egg problem [8]: one needs an IP address
> > >   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
> > >   required to configure an IP address.
> > >
> > > Which goes one step further: how to protect all ND in general, but is
> > > still applicable in our situation.  There were attempts to protect ND
> > > in alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971 <https://tools.ietf.org/html/rfc3971>).
> > > FreeBSD has picked up on it and has had a SoC project which seems to
> > > be integrated right now:
> > >
> > >   https://wiki.freebsd.org/SOC2009AnaKukec <https://wiki.freebsd.org/SOC2009AnaKukec>
> > >   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4 <https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4>
> > >
> > > Would it be possible for us to disable the check and always set saddr6
> > > to NULL for now?
> >
> > Fine w/me.
> >
> > Or we could check if the packet has been IPsec encapsulated
> > and set saddr6 to NULL in this case.
>
> Is this fixed?  Anything we're still waiting for?

So something like that?  FWIW I'm happy with fewer in6ifa_ifpwithaddr().

Index: netinet6/nd6_nbr.c
===================================================================
RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
retrieving revision 1.110
diff -u -p -r1.110 nd6_nbr.c
--- netinet6/nd6_nbr.c 23 Aug 2016 11:03:10 -0000 1.110
+++ netinet6/nd6_nbr.c 4 Nov 2016 09:02:47 -0000
@@ -433,54 +433,23 @@ nd6_ns_output(struct ifnet *ifp, struct
  }
  ip6->ip6_dst = dst_sa.sin6_addr;
  if (!dad) {
- /*
- * RFC2461 7.2.2:
- * "If the source address of the packet prompting the
- * solicitation is the same as one of the addresses assigned
- * to the outgoing interface, that address SHOULD be placed
- * in the IP Source Address of the outgoing solicitation.
- * Otherwise, any one of the addresses assigned to the
- * interface should be used."
- *
- * We use the source address for the prompting packet
- * (saddr6), if:
- * - saddr6 is given from the caller (by giving "ln"), and
- * - saddr6 belongs to the outgoing interface.
- * Otherwise, we perform the source address selection as usual.
- */
- struct ip6_hdr *hip6; /* hold ip6 */
- struct in6_addr *saddr6;
+ /* Perform source address selection. */
+ struct rtentry *rt;
 
- if (ln && ln->ln_hold) {
- hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
- /* XXX pullup? */
- if (sizeof(*hip6) < ln->ln_hold->m_len)
- saddr6 = &hip6->ip6_src;
- else
- saddr6 = NULL;
- } else
- saddr6 = NULL;
- if (saddr6 && in6ifa_ifpwithaddr(ifp, saddr6))
- src_sa.sin6_addr = *saddr6;
- else {
- struct rtentry *rt;
+ rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
+    m->m_pkthdr.ph_rtableid);
+ if (!rtisvalid(rt)) {
+ char addr[INET6_ADDRSTRLEN];
 
- rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
-    m->m_pkthdr.ph_rtableid);
- if (!rtisvalid(rt)) {
- char addr[INET6_ADDRSTRLEN];
-
- nd6log((LOG_DEBUG,
-    "%s: source can't be determined: dst=%s\n",
-    __func__, inet_ntop(AF_INET6,
-    &dst_sa.sin6_addr, addr, sizeof(addr))));
- rtfree(rt);
- goto bad;
- }
- src_sa.sin6_addr =
-    ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
+ nd6log((LOG_DEBUG,
+    "%s: source can't be determined: dst=%s\n",
+    __func__, inet_ntop(AF_INET6,
+    &dst_sa.sin6_addr, addr, sizeof(addr))));
  rtfree(rt);
+ goto bad;
  }
+ src_sa.sin6_addr = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
+ rtfree(rt);
  } else {
  /*
  * Source address for DAD packet must always be IPv6

Reply | Threaded
Open this post in threaded view
|

Re: IPv6/NDP/IPsec breakage in -current

Martin Pieuchot
On 04/11/16(Fri) 10:09, Martin Pieuchot wrote:

> On 02/11/16(Wed) 10:19, Martin Pieuchot wrote:
> > On 25/10/16(Tue) 22:13, Markus Friedl wrote:
> > >
> > > > Am 25.10.2016 um 17:13 schrieb Mike Belopuhov <[hidden email]>:
> > > >
> > > >
> > > > There are apparently some discussions in infomational RFCs regarding
> > > > this issue.  For instance https://tools.ietf.org/html/rfc3756 <https://tools.ietf.org/html/rfc3756> states:
> > > >
> > > >   More specifically, the currently used key agreement protocol, IKE,
> > > >   suffers from a chicken-and-egg problem [8]: one needs an IP address
> > > >   to run IKE, IKE is needed to establish IPsec SAs, and IPsec SAs are
> > > >   required to configure an IP address.
> > > >
> > > > Which goes one step further: how to protect all ND in general, but is
> > > > still applicable in our situation.  There were attempts to protect ND
> > > > in alternative way, e.g. SEND (https://tools.ietf.org/html/rfc3971 <https://tools.ietf.org/html/rfc3971>).
> > > > FreeBSD has picked up on it and has had a SoC project which seems to
> > > > be integrated right now:
> > > >
> > > >   https://wiki.freebsd.org/SOC2009AnaKukec <https://wiki.freebsd.org/SOC2009AnaKukec>
> > > >   https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4 <https://www.freebsd.org/cgi/man.cgi?query=send&sektion=4>
> > > >
> > > > Would it be possible for us to disable the check and always set saddr6
> > > > to NULL for now?
> > >
> > > Fine w/me.
> > >
> > > Or we could check if the packet has been IPsec encapsulated
> > > and set saddr6 to NULL in this case.
> >
> > Is this fixed?  Anything we're still waiting for?
>
> So something like that?  FWIW I'm happy with fewer in6ifa_ifpwithaddr().

naddy@ confirmed this diff fixes his tunnel mode setup, ok?

> Index: netinet6/nd6_nbr.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet6/nd6_nbr.c,v
> retrieving revision 1.110
> diff -u -p -r1.110 nd6_nbr.c
> --- netinet6/nd6_nbr.c 23 Aug 2016 11:03:10 -0000 1.110
> +++ netinet6/nd6_nbr.c 4 Nov 2016 09:02:47 -0000
> @@ -433,54 +433,23 @@ nd6_ns_output(struct ifnet *ifp, struct
>   }
>   ip6->ip6_dst = dst_sa.sin6_addr;
>   if (!dad) {
> - /*
> - * RFC2461 7.2.2:
> - * "If the source address of the packet prompting the
> - * solicitation is the same as one of the addresses assigned
> - * to the outgoing interface, that address SHOULD be placed
> - * in the IP Source Address of the outgoing solicitation.
> - * Otherwise, any one of the addresses assigned to the
> - * interface should be used."
> - *
> - * We use the source address for the prompting packet
> - * (saddr6), if:
> - * - saddr6 is given from the caller (by giving "ln"), and
> - * - saddr6 belongs to the outgoing interface.
> - * Otherwise, we perform the source address selection as usual.
> - */
> - struct ip6_hdr *hip6; /* hold ip6 */
> - struct in6_addr *saddr6;
> + /* Perform source address selection. */
> + struct rtentry *rt;
>  
> - if (ln && ln->ln_hold) {
> - hip6 = mtod(ln->ln_hold, struct ip6_hdr *);
> - /* XXX pullup? */
> - if (sizeof(*hip6) < ln->ln_hold->m_len)
> - saddr6 = &hip6->ip6_src;
> - else
> - saddr6 = NULL;
> - } else
> - saddr6 = NULL;
> - if (saddr6 && in6ifa_ifpwithaddr(ifp, saddr6))
> - src_sa.sin6_addr = *saddr6;
> - else {
> - struct rtentry *rt;
> + rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> +    m->m_pkthdr.ph_rtableid);
> + if (!rtisvalid(rt)) {
> + char addr[INET6_ADDRSTRLEN];
>  
> - rt = rtalloc(sin6tosa(&dst_sa), RT_RESOLVE,
> -    m->m_pkthdr.ph_rtableid);
> - if (!rtisvalid(rt)) {
> - char addr[INET6_ADDRSTRLEN];
> -
> - nd6log((LOG_DEBUG,
> -    "%s: source can't be determined: dst=%s\n",
> -    __func__, inet_ntop(AF_INET6,
> -    &dst_sa.sin6_addr, addr, sizeof(addr))));
> - rtfree(rt);
> - goto bad;
> - }
> - src_sa.sin6_addr =
> -    ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> + nd6log((LOG_DEBUG,
> +    "%s: source can't be determined: dst=%s\n",
> +    __func__, inet_ntop(AF_INET6,
> +    &dst_sa.sin6_addr, addr, sizeof(addr))));
>   rtfree(rt);
> + goto bad;
>   }
> + src_sa.sin6_addr = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
> + rtfree(rt);
>   } else {
>   /*
>   * Source address for DAD packet must always be IPv6
>

12