GRE datagram socket support

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

GRE datagram socket support

David Gwynne-5
i've been toying with this idea of implementing GRE as a datagram
protocol that userland can use just like UDP. the idea is to make it
easy to support the implementation of NHRP in userland for mgre(4),
and also for ERSPAN* support without going down the path linux took**.

so this is the result of having a go at implementing the idea. the diff
includes several independent parts, but they all work together to make
GRE as comfortable to use as UDP. the two main parts are the actual
protocol implementation in src/sys/netinet/ip_gre.c, and the tweaks to
getaddrinfo to allow the resolution of gre services. the /etc/services
chunk gets used by the getaddrinfo bits.

so, the first chunk lets you do this (as root in userland):

        int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_GRE);

that gives you a file descriptor you can then use with bind(),
connect(), sendto(), recvfrom(), etc. you write a message to the
kernel and it prepends the GRE and IP headers and pushes it out.
it is set up so the GRE protocol is handed to the kernel via the
sin_port or sin6_port member of struct sockaddr_in an sockaddr_in6
respectively. there's no source and destination protocol fields, just
one that both ends agree on, so if you connect then bind, your
sockaddrs have to agree on the proto. unfortunately there's no such
thing as a wildcard or reserved protocol in GRE, so 0 can't be used
as a wildcard like it can in udp and tcp.

the sockets support the configuration of optional GRE headers, as
defined in RFC 2890, using setsockopt. importantly you can enable
the key and sequence number headers, which again, the kernel offloads
for you.

the second chunk tweaks getaddrinfo so it lets you specify things other
than IPPROTO_UDP and IPPROTO_TCP. protocols other than those are now
looked up in /etc/protocols to get their name, which in turn is used to
look up entries in /etc/services. while i was there and reading rfcs, i
noted different behaviour for wildcarded socktypes and protocols, which
i've tried to implement. eric@ seems generally ok with this stuff, and
suggested the tweak to pledge to allow access to /etc/protocols using
the dns pledge. tcp and udp are still special though, and are still
omgoptimised.

all this together lets the program at
https://mild.embarrassm.net/~dlg/diff/egred.c work. it is a userland
reimplementation of a simplified egre(4) using tap(4) and a gre socket.
the io path is literally reading from one fd and writing it to the othe,
everything else is boilerplate.

i suspect the kernel stuff is a bit rough as i havent had to test every
path, but it supports common functionality.

thoughts? i am pretty pleased with this has turned out, and would be
keen to put it in the tree and work on it some more.

* https://tools.ietf.org/html/draft-foschiano-erspan-03
** http://vger.kernel.org/lpc_net2018_talks/erspan-linux-presentation.pdf

Index: etc/services
===================================================================
RCS file: /cvs/src/etc/services,v
retrieving revision 1.96
diff -u -p -r1.96 services
--- etc/services 27 Jan 2019 20:35:06 -0000 1.96
+++ etc/services 29 Oct 2019 07:57:44 -0000
@@ -332,6 +332,21 @@ spamd-cfg 8026/tcp # spamd(8) configur
 dhcpd-sync 8067/udp # dhcpd(8) synchronisation
 hunt 26740/udp # hunt(6)
 #
+# GRE Protocol Types
+#
+keepalive 0/gre # 0x0000: IP tunnel keepalive
+ipv4 2048/gre # 0x0800: IPv4
+nhrp 8193/gre # 0x2001: Next Hop Resolution Protocol
+erspan3 8939/gre # 0x22eb: ERSPAN III
+transether 25944/gre ethernet # 0x6558: Trans Ether Bridging
+ipv6 34525/gre # 0x86dd: IPv6
+wccp 34878/gre # 0x883e: Web Content Cache Protocol
+mpls 34887/gre # 0x8847: MPLS
+#mpls 34888/gre # 0x8848: MPLS Multicast
+erspan 35006/gre erspan2 # 0x88be: ERSPAN I/II
+nsh 35151/gre # 0x894f: Network Service Header
+control 47082/gre # 0xb7ea: RFC 8157
+#
 # Appletalk
 #
 rtmp 1/ddp # Routing Table Maintenance Protocol
Index: lib/libc/asr/getaddrinfo_async.c
===================================================================
RCS file: /cvs/src/lib/libc/asr/getaddrinfo_async.c,v
retrieving revision 1.56
diff -u -p -r1.56 getaddrinfo_async.c
--- lib/libc/asr/getaddrinfo_async.c 3 Nov 2018 09:13:24 -0000 1.56
+++ lib/libc/asr/getaddrinfo_async.c 29 Oct 2019 07:57:54 -0000
@@ -34,36 +34,15 @@
 
 #include "asr_private.h"
 
-struct match {
- int family;
- int socktype;
- int protocol;
-};
-
 static int getaddrinfo_async_run(struct asr_query *, struct asr_result *);
 static int get_port(const char *, const char *, int);
+static int get_service(const char *, int, int);
 static int iter_family(struct asr_query *, int);
 static int addrinfo_add(struct asr_query *, const struct sockaddr *, const char *);
 static int addrinfo_from_file(struct asr_query *, int,  FILE *);
 static int addrinfo_from_pkt(struct asr_query *, char *, size_t);
 static int addrconfig_setup(struct asr_query *);
 
-static const struct match matches[] = {
- { PF_INET, SOCK_DGRAM, IPPROTO_UDP },
- { PF_INET, SOCK_STREAM, IPPROTO_TCP },
- { PF_INET, SOCK_RAW, 0 },
- { PF_INET6, SOCK_DGRAM, IPPROTO_UDP },
- { PF_INET6, SOCK_STREAM, IPPROTO_TCP },
- { PF_INET6, SOCK_RAW, 0 },
- { -1, 0, 0, },
-};
-
-#define MATCH_FAMILY(a, b) ((a) == matches[(b)].family || (a) == PF_UNSPEC)
-#define MATCH_PROTO(a, b) ((a) == matches[(b)].protocol || (a) == 0 || matches[(b)].protocol == 0)
-/* Do not match SOCK_RAW unless explicitly specified */
-#define MATCH_SOCKTYPE(a, b) ((a) == matches[(b)].socktype || ((a) == 0 && \
- matches[(b)].socktype != SOCK_RAW))
-
 enum {
  DOM_INIT,
  DOM_DOMAIN,
@@ -199,24 +178,27 @@ getaddrinfo_async_run(struct asr_query *
  }
  }
 
- /* Make sure there is at least a valid combination */
- for (i = 0; matches[i].family != -1; i++)
- if (MATCH_FAMILY(ai->ai_family, i) &&
-    MATCH_SOCKTYPE(ai->ai_socktype, i) &&
-    MATCH_PROTO(ai->ai_protocol, i))
- break;
- if (matches[i].family == -1) {
- ar->ar_gai_errno = EAI_BADHINTS;
- async_set_state(as, ASR_STATE_HALT);
- break;
- }
-
- if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_UDP)
+ switch (ai->ai_protocol) {
+ case 0:
  as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
     as->as.ai.hints.ai_flags & AI_NUMERICSERV);
- if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_TCP)
  as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
     as->as.ai.hints.ai_flags & AI_NUMERICSERV);
+ break;
+ case IPPROTO_TCP:
+ as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
+    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
+ break;
+ case IPPROTO_UDP:
+ as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
+    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
+ break;
+ default:
+ as->as.ai.port_udp = get_service(as->as.ai.servname,
+    ai->ai_protocol,
+    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
+ break;
+ }
  if (as->as.ai.port_tcp == -2 || as->as.ai.port_udp == -2 ||
     (as->as.ai.port_tcp == -1 && as->as.ai.port_udp == -1) ||
     (ai->ai_protocol && (as->as.ai.port_udp == -1 ||
@@ -491,6 +473,24 @@ get_port(const char *servname, const cha
  return (port);
 }
 
+static int
+get_service(const char *servname, int protocol, int numonly)
+{
+ struct protoent pe;
+ struct protoent_data ped;
+ int rv;
+
+ memset(&ped, 0, sizeof(ped));
+ rv = getprotobynumber_r(protocol, &pe, &ped);
+ if (rv == -1)
+ return (-1);
+
+ rv = get_port(servname, pe.p_name, numonly);
+ endprotoent_r(&ped);
+
+ return (rv);
+}
+
 /*
  * Iterate over the address families that are to be queried. Use the
  * list on the async context, unless a specific family was given in hints.
@@ -519,65 +519,107 @@ iter_family(struct asr_query *as, int fi
  * entry per protocol/socktype match.
  */
 static int
-addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
+addrinfo_add_ai(struct asr_query *as, const struct sockaddr *sa,
+    const char *cname, int socktype, int proto, int port)
 {
  struct addrinfo *ai;
- int i, port, proto;
-
- for (i = 0; matches[i].family != -1; i++) {
- if (matches[i].family != sa->sa_family ||
-    !MATCH_SOCKTYPE(as->as.ai.hints.ai_socktype, i) ||
-    !MATCH_PROTO(as->as.ai.hints.ai_protocol, i))
- continue;
-
- proto = as->as.ai.hints.ai_protocol;
- if (!proto)
- proto = matches[i].protocol;
-
- if (proto == IPPROTO_TCP)
- port = as->as.ai.port_tcp;
- else if (proto == IPPROTO_UDP)
- port = as->as.ai.port_udp;
- else
- port = 0;
+ int i;
 
- /* servname specified, but not defined for this protocol */
- if (port == -1)
- continue;
+ if (port == -1)
+ return (0);
 
- ai = calloc(1, sizeof(*ai) + sa->sa_len);
- if (ai == NULL)
+ ai = calloc(1, sizeof(*ai) + sa->sa_len);
+ if (ai == NULL)
+ return (EAI_MEMORY);
+ ai->ai_family = sa->sa_family;
+ ai->ai_socktype = socktype;
+ ai->ai_protocol = proto;
+ ai->ai_flags = as->as.ai.hints.ai_flags;
+ ai->ai_addrlen = sa->sa_len;
+ ai->ai_addr = (void *)(ai + 1);
+ if (cname &&
+    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
+ if ((ai->ai_canonname = strdup(cname)) == NULL) {
+ free(ai);
  return (EAI_MEMORY);
- ai->ai_family = sa->sa_family;
- ai->ai_socktype = matches[i].socktype;
- ai->ai_protocol = proto;
- ai->ai_flags = as->as.ai.hints.ai_flags;
- ai->ai_addrlen = sa->sa_len;
- ai->ai_addr = (void *)(ai + 1);
- if (cname &&
-    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
- if ((ai->ai_canonname = strdup(cname)) == NULL) {
- free(ai);
- return (EAI_MEMORY);
- }
  }
- memmove(ai->ai_addr, sa, sa->sa_len);
- if (sa->sa_family == PF_INET)
- ((struct sockaddr_in *)ai->ai_addr)->sin_port =
-    htons(port);
- else if (sa->sa_family == PF_INET6)
- ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
-    htons(port);
-
- if (as->as.ai.aifirst == NULL)
- as->as.ai.aifirst = ai;
- if (as->as.ai.ailast)
- as->as.ai.ailast->ai_next = ai;
- as->as.ai.ailast = ai;
- as->as_count += 1;
  }
+ memmove(ai->ai_addr, sa, sa->sa_len);
+ if (sa->sa_family == PF_INET)
+ ((struct sockaddr_in *)ai->ai_addr)->sin_port =
+    htons(port);
+ else if (sa->sa_family == PF_INET6)
+ ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
+    htons(port);
+
+ if (as->as.ai.aifirst == NULL)
+ as->as.ai.aifirst = ai;
+ if (as->as.ai.ailast)
+ as->as.ai.ailast->ai_next = ai;
+ as->as.ai.ailast = ai;
+ as->as_count += 1;
 
  return (0);
+}
+
+static int
+addrinfo_add_proto(struct asr_query *as, const struct sockaddr *sa,
+    const char *cname, int proto, int port)
+{
+ int rv;
+
+ switch (as->as.ai.hints.ai_socktype) {
+ case 0:
+ rv = addrinfo_add_ai(as, sa, cname, SOCK_STREAM, proto, port);
+ if (rv != 0)
+ break;
+
+ rv = addrinfo_add_ai(as, sa, cname, SOCK_DGRAM, proto, port);
+ if (rv != 0)
+ break;
+
+ break;
+
+ default:
+ rv = addrinfo_add_ai(as, sa, cname,
+    as->as.ai.hints.ai_socktype, proto, port);
+ break;
+ }
+
+ return (rv);
+}
+
+static int
+addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
+{
+ int rv;
+
+ switch (as->as.ai.hints.ai_protocol) {
+ case 0:
+ rv = addrinfo_add_proto(as, sa, cname,
+    IPPROTO_TCP, as->as.ai.port_tcp);
+ if (rv != 0)
+ break;
+
+ rv = addrinfo_add_proto(as, sa, cname,
+    IPPROTO_UDP, as->as.ai.port_udp);
+ if (rv != 0)
+ break;
+
+ break;
+
+ case IPPROTO_TCP:
+ rv = addrinfo_add_proto(as, sa, cname,
+    IPPROTO_TCP, as->as.ai.port_tcp);
+ break;
+
+ default: /* includes IPPROTO_UDP */
+ rv = addrinfo_add_proto(as, sa, cname,
+    as->as.ai.hints.ai_protocol, as->as.ai.port_udp);
+ break;
+ }
+
+ return (rv);
 }
 
 static int
Index: sys/conf/files
===================================================================
RCS file: /cvs/src/sys/conf/files,v
retrieving revision 1.675
diff -u -p -r1.675 files
--- sys/conf/files 5 Oct 2019 05:33:14 -0000 1.675
+++ sys/conf/files 29 Oct 2019 07:57:58 -0000
@@ -862,7 +862,7 @@ file netinet/tcp_subr.c
 file netinet/tcp_timer.c
 file netinet/tcp_usrreq.c
 file netinet/udp_usrreq.c
-file netinet/ip_gre.c
+file netinet/ip_gre.c gre
 file netinet/ip_ipsp.c ipsec | tcp_signature
 file netinet/ip_spd.c ipsec | tcp_signature
 file netinet/ip_ipip.c
Index: sys/kern/kern_pledge.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_pledge.c,v
retrieving revision 1.255
diff -u -p -r1.255 kern_pledge.c
--- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
+++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
@@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
  }
  }
 
- /* DNS needs /etc/{resolv.conf,hosts,services}. */
+ /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
  if ((ni->ni_pledge == PLEDGE_RPATH) &&
     (p->p_p->ps_pledge & PLEDGE_DNS)) {
  if (strcmp(path, "/etc/resolv.conf") == 0) {
@@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
  return (0);
  }
  if (strcmp(path, "/etc/services") == 0) {
+ ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
+ return (0);
+ }
+ if (strcmp(path, "/etc/protocols") == 0) {
  ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
  return (0);
  }
Index: sys/net/if_gre.c
===================================================================
RCS file: /cvs/src/sys/net/if_gre.c,v
retrieving revision 1.152
diff -u -p -r1.152 if_gre.c
--- sys/net/if_gre.c 29 Jul 2019 16:28:25 -0000 1.152
+++ sys/net/if_gre.c 29 Oct 2019 07:57:58 -0000
@@ -69,6 +69,8 @@
 #include <netinet/ip_var.h>
 #include <netinet/ip_ecn.h>
 
+#include <netinet/gre_proto.h>
+
 #ifdef INET6
 #include <netinet/ip6.h>
 #include <netinet6/ip6_var.h>
@@ -103,28 +105,6 @@
 /*
  * packet formats
  */
-struct gre_header {
- uint16_t gre_flags;
-#define GRE_CP 0x8000  /* Checksum Present */
-#define GRE_KP 0x2000  /* Key Present */
-#define GRE_SP 0x1000  /* Sequence Present */
-
-#define GRE_VERS_MASK 0x0007
-#define GRE_VERS_0 0x0000
-#define GRE_VERS_1 0x0001
-
- uint16_t gre_proto;
-} __packed __aligned(4);
-
-struct gre_h_cksum {
- uint16_t gre_cksum;
- uint16_t gre_reserved1;
-} __packed __aligned(4);
-
-struct gre_h_key {
- uint32_t gre_key;
-} __packed __aligned(4);
-
 #define GRE_EOIP 0x6400
 
 struct gre_h_key_eoip {
@@ -132,13 +112,7 @@ struct gre_h_key_eoip {
  uint16_t eoip_tunnel_id; /* little endian */
 } __packed __aligned(4);
 
-#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
-#define NVGRE_VSID_RES_MAX 0x000fff
-#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
-
-struct gre_h_seq {
- uint32_t gre_seq;
-} __packed __aligned(4);
+#define GRE_WCCP 0x883e
 
 struct gre_h_wccp {
  uint8_t wccp_flags;
@@ -147,7 +121,10 @@ struct gre_h_wccp {
  uint8_t pri_bucket;
 } __packed __aligned(4);
 
-#define GRE_WCCP 0x883e
+
+#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
+#define NVGRE_VSID_RES_MAX 0x000fff
+#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
 
 #define GRE_HDRLEN (sizeof(struct ip) + sizeof(struct gre_header))
 
@@ -289,8 +266,8 @@ static int gre_up(struct gre_softc *);
 static int gre_down(struct gre_softc *);
 static void gre_link_state(struct ifnet *, unsigned int);
 
-static int gre_input_key(struct mbuf **, int *, int, int, uint8_t,
-    struct gre_tunnel *);
+static struct mbuf *
+ gre_if_input(struct mbuf *, int, uint8_t, struct gre_tunnel *);
 
 static struct mbuf *
  gre_ipv4_patch(const struct gre_tunnel *, struct mbuf *,
@@ -893,10 +870,9 @@ eoip_clone_destroy(struct ifnet *ifp)
  return (0);
 }
 
-int
-gre_input(struct mbuf **mp, int *offp, int type, int af)
+struct mbuf *
+gre_if4_input(struct mbuf *m, int hlen)
 {
- struct mbuf *m = *mp;
  struct gre_tunnel key;
  struct ip *ip;
 
@@ -908,17 +884,13 @@ gre_input(struct mbuf **mp, int *offp, i
  key.t_src4 = ip->ip_dst;
  key.t_dst4 = ip->ip_src;
 
- if (gre_input_key(mp, offp, type, af, ip->ip_tos, &key) == -1)
- return (rip_input(mp, offp, type, af));
-
- return (IPPROTO_DONE);
+ return (gre_if_input(m, hlen, ip->ip_tos, &key));
 }
 
 #ifdef INET6
-int
-gre_input6(struct mbuf **mp, int *offp, int type, int af)
+struct mbuf *
+gre_if6_input(struct mbuf *m, int hlen)
 {
- struct mbuf *m = *mp;
  struct gre_tunnel key;
  struct ip6_hdr *ip6;
  uint32_t flow;
@@ -933,10 +905,7 @@ gre_input6(struct mbuf **mp, int *offp,
 
  flow = bemtoh32(&ip6->ip6_flow);
 
- if (gre_input_key(mp, offp, type, af, flow >> 20, &key) == -1)
- return (rip6_input(mp, offp, type, af));
-
- return (IPPROTO_DONE);
+ return (gre_if_input(m, hlen, flow >> 20, &key));
 }
 #endif /* INET6 */
 
@@ -996,12 +965,10 @@ gre_input_1(struct gre_tunnel *key, stru
  return (m);
 }
 
-static int
-gre_input_key(struct mbuf **mp, int *offp, int type, int af, uint8_t otos,
-    struct gre_tunnel *key)
+static struct mbuf *
+gre_if_input(struct mbuf *m, int iphlen, uint8_t otos, struct gre_tunnel *key)
 {
- struct mbuf *m = *mp;
- int iphlen = *offp, hlen, rxprio;
+ int hlen, rxprio;
  struct ifnet *ifp;
  const struct gre_tunnel *tunnel;
  caddr_t buf;
@@ -1025,7 +992,7 @@ gre_input_key(struct mbuf **mp, int *off
 
  m = m_pullup(m, hlen);
  if (m == NULL)
- return (IPPROTO_DONE);
+ return (NULL);
 
  buf = mtod(m, caddr_t);
  gh = (struct gre_header *)(buf + iphlen);
@@ -1038,7 +1005,7 @@ gre_input_key(struct mbuf **mp, int *off
  case htons(GRE_VERS_1):
  m = gre_input_1(key, m, gh, otos, iphlen);
  if (m == NULL)
- return (IPPROTO_DONE);
+ return (NULL);
  /* FALLTHROUGH */
  default:
  goto decline;
@@ -1055,7 +1022,7 @@ gre_input_key(struct mbuf **mp, int *off
 
  m = m_pullup(m, hlen);
  if (m == NULL)
- return (IPPROTO_DONE);
+ return (NULL);
 
  buf = mtod(m, caddr_t);
  gh = (struct gre_header *)(buf + iphlen);
@@ -1071,7 +1038,7 @@ gre_input_key(struct mbuf **mp, int *off
     nvgre_input(key, m, hlen, otos) == -1)
  goto decline;
 
- return (IPPROTO_DONE);
+ return (NULL);
  }
 
  ifp = gre_find(key);
@@ -1148,7 +1115,7 @@ gre_input_key(struct mbuf **mp, int *off
 
  m_adj(m, hlen);
  gre_keepalive_recv(ifp, m);
- return (IPPROTO_DONE);
+ return (NULL);
 
  default:
  goto decline;
@@ -1162,7 +1129,7 @@ gre_input_key(struct mbuf **mp, int *off
 
  m = (*patch)(tunnel, m, &itos, otos);
  if (m == NULL)
- return (IPPROTO_DONE);
+ return (NULL);
 
  if (tunnel->t_key_mask == GRE_KEY_ENTROPY) {
  m->m_pkthdr.ph_flowid = M_FLOWID_VALID |
@@ -1203,10 +1170,9 @@ gre_input_key(struct mbuf **mp, int *off
 #endif
 
  (*input)(ifp, m);
- return (IPPROTO_DONE);
+ return (NULL);
 decline:
- *mp = m;
- return (-1);
+ return (m);
 }
 
 static struct mbuf *
Index: sys/netinet/gre_proto.h
===================================================================
RCS file: sys/netinet/gre_proto.h
diff -N sys/netinet/gre_proto.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ sys/netinet/gre_proto.h 29 Oct 2019 07:57:58 -0000
@@ -0,0 +1,48 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2019 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _NETINET_GRE_H_
+#define _NETINET_GRE_H_
+
+struct gre_header {
+ uint16_t gre_flags;
+#define GRE_CP 0x8000 /* Checksum Present */
+#define GRE_KP 0x2000 /* Key Present */
+#define GRE_SP 0x1000 /* Sequence Present */
+
+#define GRE_VERS_MASK 0x0007
+#define GRE_VERS_0 0x0000
+#define GRE_VERS_1 0x0001
+
+ uint16_t gre_proto;
+};
+
+struct gre_h_cksum {
+ uint16_t gre_cksum;
+ uint16_t gre_reserved1;
+};
+
+struct gre_h_key {
+ uint32_t gre_key;
+};
+
+struct gre_h_seq {
+ uint32_t gre_seq;
+};
+
+#endif /* _NETINET_GRE_H_ */
Index: sys/netinet/gre_var.h
===================================================================
RCS file: sys/netinet/gre_var.h
diff -N sys/netinet/gre_var.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ sys/netinet/gre_var.h 29 Oct 2019 07:57:58 -0000
@@ -0,0 +1,64 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2019 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _NETINET_GRE_VAR_H_
+#define _NETINET_GRE_VAR_H_
+
+/*
+ * setsockopt(s, IPPROTO_GRE, ...
+ */
+
+#define GRE_CKSUM 1 /* bool; enable GRE checksum headers */
+#define GRE_KEY 2 /* uint32_t; enable and set GRE key */
+ /* NULL; disable GRE key header */
+#define GRE_SEQ 3 /* uint32_t; enable and set GRE seq */
+ /* NULL; disable GRE seq header */
+
+#define GRE_RECVSEQ 4 /* bool; enable reception of seq numbers */
+#define GRE_SENDSEQ GRE_RECVSEQ
+
+#ifdef _KERNEL
+int gre_raw_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
+    struct mbuf *, struct proc *);
+int gre_sysctl(int *, u_int, void *, size_t *, void *, size_t);
+
+void gre_init(void);
+
+int gre_attach(struct socket *, int);
+int gre_detach(struct socket *);
+
+int gre_ip4_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
+    struct mbuf *, struct proc *);
+int gre_ip4_ctloutput(int, struct socket *, int, int, struct mbuf *);
+
+int gre_ip4_input(struct mbuf **, int *, int, int);
+
+struct mbuf *
+ gre_if4_input(struct mbuf *, int); /* interface glue */
+
+#ifdef INET6
+int gre_ip6_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
+    struct mbuf *, struct proc *);
+int gre_ip6_ctloutput(int, struct socket *, int, int, struct mbuf *);
+int gre_ip6_input(struct mbuf **, int *, int, int);
+
+struct mbuf *
+ gre_if6_input(struct mbuf *, int); /* interface glue */
+#endif /* INET6 */
+#endif /* _KERNEL */
+#endif /* _NETINET_GRE_VAR_H_ */
Index: sys/netinet/in_proto.c
===================================================================
RCS file: /cvs/src/sys/netinet/in_proto.c,v
retrieving revision 1.93
diff -u -p -r1.93 in_proto.c
--- sys/netinet/in_proto.c 15 Jul 2019 12:40:42 -0000 1.93
+++ sys/netinet/in_proto.c 29 Oct 2019 07:57:58 -0000
@@ -147,8 +147,7 @@
 
 #include "gre.h"
 #if NGRE > 0
-#include <netinet/ip_gre.h>
-#include <net/if_gre.h>
+#include <netinet/gre_var.h>
 #endif
 
 #include "carp.h"
@@ -346,11 +345,24 @@ const struct protosw inetsw[] = {
   .pr_domain = &inetdomain,
   .pr_protocol = IPPROTO_GRE,
   .pr_flags = PR_ATOMIC|PR_ADDR,
-  .pr_input = gre_input,
+  .pr_input = gre_ip4_input,
   .pr_ctloutput = rip_ctloutput,
-  .pr_usrreq = gre_usrreq,
+  .pr_usrreq = gre_raw_usrreq,
   .pr_attach = rip_attach,
   .pr_detach = rip_detach,
+  .pr_sysctl = gre_sysctl
+},
+{
+  .pr_type = SOCK_DGRAM,
+  .pr_domain = &inetdomain,
+  .pr_protocol = IPPROTO_GRE,
+  .pr_flags = PR_ATOMIC|PR_ADDR,
+  .pr_input = gre_ip4_input,
+  .pr_ctloutput = gre_ip4_ctloutput,
+  .pr_usrreq = gre_ip4_usrreq,
+  .pr_attach = gre_attach,
+  .pr_detach = gre_detach,
+  .pr_init = gre_init,
   .pr_sysctl = gre_sysctl
 },
 #endif /* NGRE > 0 */
Index: sys/netinet/ip_gre.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_gre.c,v
retrieving revision 1.71
diff -u -p -r1.71 ip_gre.c
--- sys/netinet/ip_gre.c 7 Feb 2018 22:30:59 -0000 1.71
+++ sys/netinet/ip_gre.c 29 Oct 2019 07:57:58 -0000
@@ -1,7 +1,23 @@
-/*      $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
+/* $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
 /* $NetBSD: ip_gre.c,v 1.9 1999/10/25 19:18:11 drochner Exp $ */
 
 /*
+ * Copyright (c) 2019 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+/*
  * Copyright (c) 1998 The NetBSD Foundation, Inc.
  * All rights reserved.
  *
@@ -30,16 +46,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-/*
- * decapsulate tunneled packets and send them on
- * output half is in net/if_gre.[ch]
- * This currently handles IPPROTO_GRE, IPPROTO_MOBILE
- */
-
-
-#include "gre.h"
-#if NGRE > 0
-
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/mbuf.h>
@@ -47,24 +53,40 @@
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/sysctl.h>
+#include <sys/proc.h>
+#include <sys/atomic.h>
+#include <sys/pool.h>
+#include <sys/tree.h>
+
+#include <sys/domain.h>
 
 #include <net/if.h>
+#include <net/if_var.h>
 #include <net/route.h>
 
 #include <netinet/in.h>
+#include <netinet/in_var.h>
 #include <netinet/ip.h>
 #include <netinet/ip_var.h>
 #include <netinet/in_pcb.h>
 
+#include <netinet/gre_var.h>
+#include <netinet/gre_proto.h>
+
 #ifdef PIPEX
 #include <net/pipex.h>
 #endif
 
+
+/*
+ * socket({AF_INET,AF_INET6}, SOCK_RAW, IPPROTO_GRE);
+ */
+
 int
-gre_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
+gre_raw_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
     struct mbuf *control, struct proc *p)
 {
-#ifdef  PIPEX
+#ifdef  PIPEX
  struct inpcb *inp = sotoinpcb(so);
 
  if (inp != NULL && inp->inp_pipex && req == PRU_SEND) {
@@ -92,4 +114,1817 @@ gre_usrreq(struct socket *so, int req, s
 #endif
  return rip_usrreq(so, req, m, nam, control, p);
 }
-#endif /* if NGRE > 0 */
+
+/*
+ * support socket({AF_INET,AF_INET6}, SOCK_DGRAM, IPPROTO_GRE);
+ *
+ * GRE datagram sockets provide support for GRE version 0 packets
+ * in the kernel.
+ */
+
+#define GRE_VALID_MASK (GRE_VERS_MASK | GRE_CP | GRE_KP | GRE_SP)
+
+struct gre_pcb_key;
+
+struct gre_pcb {
+ struct inpcb gpcb_inpcb;
+ unsigned int gpcb_pflags; /* pcb flags */
+#define GREPCB_RECVSEQ (1 << 0)
+ uint16_t gpcb_flags; /* network byteorder */
+ uint32_t gpcb_key; /* network byteorder */
+ uint32_t gpcb_seq; /* host byteorder */
+
+ struct gre_pcb_key *gpcb_pcb_key;
+ int gpcb_reuse;
+ uint8_t gpcb_ttl;
+};
+TAILQ_HEAD(gre_pcb_list, inpcb);
+
+static inline struct gre_pcb *
+inp_gpcb(struct inpcb *inp)
+{
+ return ((struct gre_pcb *)inp);
+}
+
+#define gpcb_inp(_gpcb) (&(_gpcb)->gpcb_inpcb)
+
+static inline struct socket *
+gpcb_so(struct gre_pcb *gpcb)
+{
+ return (gpcb_inp(gpcb)->inp_socket);
+}
+
+struct gre_pcb_key {
+ union inpaddru gk_laddr;
+#define gk_laddr4 gk_laddr.iau_a4u.inaddr
+#define gk_laddr6 gk_laddr.iau_addr6
+ union inpaddru gk_faddr;
+#define gk_faddr4 gk_faddr.iau_a4u.inaddr
+#define gk_faddr6 gk_faddr.iau_addr6
+
+ uint16_t gk_flags; /* network byteorder */
+ uint16_t gk_proto; /* network byteorder */
+ uint32_t gk_key; /* network byteorder */
+
+ unsigned int gk_rtableid;
+ sa_family_t gk_family;
+
+ RBT_ENTRY(gre_pcb_key) gk_entry;
+ struct gre_pcb_list gk_pcbs;
+ unsigned int gk_state;
+#define GRE_S_DISCONNECTED 0
+#define GRE_S_WILDCARD 1
+#define GRE_S_BOUND 2
+#define GRE_S_CONNECTED 3
+};
+
+RBT_HEAD(gre_tree_wildcards, gre_pcb_key);
+RBT_HEAD(gre_tree_bound, gre_pcb_key);
+RBT_HEAD(gre_tree_connected, gre_pcb_key);
+
+static int gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *,
+    const struct gre_pcb_key *);
+static int gre_pcb_key_cmp_bound(const struct gre_pcb_key *,
+    const struct gre_pcb_key *);
+
+RBT_PROTOTYPE(gre_tree_wildcards, gre_pcb_key, gk_entry,
+    gre_pcb_key_cmp_wildcard);
+RBT_PROTOTYPE(gre_tree_bound, gre_pcb_key, gk_entry,
+    gre_pcb_key_cmp_bound);
+RBT_PROTOTYPE(gre_tree_connected, gre_pcb_key, gk_entry,
+    gre_pcb_key_cmp_connected);
+
+struct gre_ops {
+ int (*op_nametosa)(struct inpcb *, struct mbuf *,
+    struct sockaddr **);
+ int (*op_is_wildcard)(struct sockaddr *);
+ int (*op_is_multicast)(struct sockaddr *);
+ int (*op_is_broadcast)(unsigned int, struct sockaddr *);
+ int (*op_is_local)(unsigned int, struct sockaddr *);
+
+ uint16_t (*op_proto)(struct sockaddr *);
+ void (*op_addr)(union inpaddru *, struct sockaddr *);
+
+ int (*op_selsrc)(struct inpcb *, struct sockaddr *,
+    union inpaddru *, void *);
+ int (*op_control)(struct socket *, u_long, caddr_t, struct ifnet *);
+ void (*op_getsockname)(struct inpcb *, struct mbuf *);
+ void (*op_getpeername)(struct inpcb *, struct mbuf *);
+ int (*op_ctloutput)(int, struct socket *, int, int, struct mbuf *);
+
+ void (*op_sbappend)(struct gre_pcb *, struct mbuf *, int, uint32_t);
+
+ /* PRU_SEND is split into two parts, with gre_output in the middle: */
+
+ /* 1: the "pre" phase, so ipv6 can do it's stupid pktops stuff */
+ int (*op_send)(const struct gre_ops *, struct gre_pcb *,
+    struct mbuf *, struct mbuf *, struct mbuf *);
+ /* 2: actually doing the ip encap and output */
+ int (*op_output)(struct gre_pcb *, const struct gre_pcb_key *,
+    struct mbuf *, void *);
+
+ int * const defttl;
+};
+struct gre_tree_wildcards gre_wildcards = RBT_INITIALIZER();
+struct gre_tree_bound gre_bound = RBT_INITIALIZER();
+struct gre_tree_connected gre_connected = RBT_INITIALIZER();
+
+struct pool gre_pcb_key_pool;
+struct pool gre_pcb_pool;
+
+static struct mbuf *
+ gre_ip_input(const struct gre_ops *, struct mbuf *, int,
+    uint8_t, struct gre_pcb_key *);
+
+static int gre_usrreq(const struct gre_ops *, struct socket *, int,
+    struct mbuf *, struct mbuf *, struct mbuf *,
+    struct proc *);
+static int gre_ctloutput(const struct gre_ops *, int, struct socket *,
+    int, int, struct mbuf *);
+
+static int gre_send(const struct gre_ops *, struct gre_pcb *,
+    struct mbuf *, struct mbuf *, struct mbuf *);
+static int gre_output(const struct gre_ops *, struct gre_pcb *,
+    struct mbuf *, struct mbuf *, struct mbuf *, void *);
+static int gre_disconnect(struct gre_pcb *);
+
+#define GRE_OPT_EINVAL ((void *)-1)
+static void *gre_opt(struct mbuf *, int, int, socklen_t);
+
+unsigned int gre_sendspace = 9216; /* XXX sysctl? */
+unsigned int gre_recvspace = (40 * 1024);
+
+static struct gre_pcb_key *
+gre_pcb_key_get(const struct gre_pcb *gpcb)
+{
+ struct gre_pcb_key *gk;
+
+ gk = pool_get(&gre_pcb_key_pool, PR_NOWAIT|PR_ZERO);
+ if (gk == NULL)
+ return (NULL);
+
+ gk->gk_flags = gpcb->gpcb_flags;
+ gk->gk_key = gpcb->gpcb_key;
+
+ TAILQ_INIT(&gk->gk_pcbs);
+
+ return (gk);
+}
+
+static void
+gre_pcb_key_put(struct gre_pcb_key *gk)
+{
+ pool_put(&gre_pcb_key_pool, gk);
+}
+
+static struct gre_pcb_key *
+gre_pcb_key_insert(unsigned int state, struct gre_pcb_key *gk)
+{
+ struct gre_pcb_key *ogk;
+
+ gk->gk_state = state;
+
+ switch (state) {
+ case GRE_S_WILDCARD:
+ ogk = RBT_INSERT(gre_tree_wildcards, &gre_wildcards, gk);
+ break;
+ case GRE_S_BOUND:
+ ogk = RBT_INSERT(gre_tree_bound, &gre_bound, gk);
+ break;
+ case GRE_S_CONNECTED:
+ ogk = RBT_INSERT(gre_tree_connected, &gre_connected, gk);
+ break;
+ default:
+ panic("%s unexpected state %u", __func__, state);
+ }
+
+ return (ogk);
+}
+
+static inline int
+gre_pcb_empty(struct gre_pcb_list *l)
+{
+ return (TAILQ_EMPTY(l));
+}
+
+static inline void
+gre_pcb_insert(struct gre_pcb_list *l, struct gre_pcb *gpcb)
+{
+ TAILQ_INSERT_TAIL(l, gpcb_inp(gpcb), inp_queue);
+}
+
+static inline void
+gre_pcb_remove(struct gre_pcb_list *l, struct gre_pcb *gpcb)
+{
+ TAILQ_REMOVE(l, gpcb_inp(gpcb), inp_queue);
+}
+
+static inline struct gre_pcb *
+gre_pcb_first(struct gre_pcb_list *l)
+{
+ return (inp_gpcb(TAILQ_FIRST(l)));
+}
+
+static inline struct gre_pcb *
+gre_pcb_next(struct gre_pcb *gpcb)
+{
+ return (inp_gpcb(TAILQ_NEXT(gpcb_inp(gpcb), inp_queue)));
+}
+
+/*
+ * INET4
+ */
+
+static int gre_ip4_nametosa(struct inpcb *, struct mbuf *,
+    struct sockaddr **);
+static int gre_ip4_is_wildcard(struct sockaddr *);
+static int gre_ip4_is_multicast(struct sockaddr *);
+static int gre_ip4_is_broadcast(unsigned int, struct sockaddr *);
+static int gre_ip4_is_local(unsigned int, struct sockaddr *);
+static uint16_t gre_ip4_proto(struct sockaddr *);
+static void gre_ip4_addr(union inpaddru *, struct sockaddr *);
+static int gre_ip4_selsrc(struct inpcb *, struct sockaddr *,
+    union inpaddru *, void *);
+static void gre_ip4_sbappend(struct gre_pcb *, struct mbuf *, int,
+    uint32_t);
+static int gre_ip4_send(const struct gre_ops *, struct gre_pcb *,
+    struct mbuf *, struct mbuf *, struct mbuf *);
+static int gre_ip4_output(struct gre_pcb *, const struct gre_pcb_key *,
+    struct mbuf *, void *);
+
+static const struct gre_ops gre_ip4_ops = {
+ .op_nametosa = gre_ip4_nametosa,
+ .op_is_wildcard = gre_ip4_is_wildcard,
+ .op_is_multicast = gre_ip4_is_multicast,
+ .op_is_broadcast = gre_ip4_is_broadcast,
+ .op_is_local = gre_ip4_is_local,
+
+ .op_proto = gre_ip4_proto,
+ .op_addr = gre_ip4_addr,
+
+ .op_selsrc = gre_ip4_selsrc,
+ .op_control = in_control,
+ .op_getsockname = in_setsockaddr,
+ .op_getpeername = in_setpeeraddr,
+ .op_ctloutput = ip_ctloutput,
+
+ .op_sbappend = gre_ip4_sbappend,
+ .op_send = gre_ip4_send,
+ .op_output = gre_ip4_output,
+
+ .defttl = &ip_defttl,
+};
+
+
+static int
+gre_ip4_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
+{
+ return (in_nam2sin(addr, (struct sockaddr_in **)sa));
+}
+
+static int
+gre_ip4_is_wildcard(struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+ return (sin->sin_addr.s_addr == INADDR_ANY);
+}
+
+static int
+gre_ip4_is_multicast(struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+ return (IN_MULTICAST(sin->sin_addr.s_addr));
+}
+
+static int
+gre_ip4_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+
+ return ((sin->sin_addr.s_addr == INADDR_BROADCAST) ||
+    in_broadcast(sin->sin_addr, rtableid));
+}
+
+static int
+gre_ip4_is_local(unsigned int rtableid, struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+ /* cope with ifa_ifwithaddr using memcmp */
+ struct sockaddr_in needle = {
+ .sin_len = sin->sin_len,
+ .sin_family = sin->sin_family,
+ .sin_addr = sin->sin_addr,
+ };
+ struct ifaddr *ifa;
+
+ ifa = ifa_ifwithaddr(sintosa(&needle), rtableid);
+ if (ifa == NULL)
+ return (EADDRNOTAVAIL);
+
+ return (0);
+}
+
+static uint16_t
+gre_ip4_proto(struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+ return (sin->sin_port);
+}
+
+static void
+gre_ip4_addr(union inpaddru *addr, struct sockaddr *sa)
+{
+ struct sockaddr_in *sin = satosin(sa);
+ addr->iau_a4u.inaddr = sin->sin_addr;
+}
+
+static int
+gre_ip4_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
+    void *opts)
+{
+ struct in_addr *insrc;
+ int error;
+
+ insrc = gre_opt(opts, IPPROTO_IP, IP_SENDSRCADDR, sizeof(*insrc));
+ if (insrc == GRE_OPT_EINVAL)
+ return (EINVAL);
+
+ if (insrc == NULL) {
+ error = in_pcbselsrc(&insrc, satosin(sa), inp);
+ if (error != 0)
+ return (error);
+ } else {
+ struct sockaddr_in sin = {
+ .sin_len = sizeof(sin),
+ .sin_family = AF_INET,
+ .sin_addr = *insrc,
+ };
+
+ /* XXX sigh. */
+ error = in_pcbaddrisavail(inp, &sin, 0, NULL);
+ if (error != 0)
+ return (error);
+ }
+
+ laddr->iau_a4u.inaddr = *insrc;
+
+ return (0);
+}
+
+static int
+gre_ip4_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
+    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
+{
+ return (gre_output(ops, gpcb, m, addr, control, control));
+}
+
+static int
+gre_ip4_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
+    struct mbuf *m, void *opts)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct ip *ip;
+ int error;
+ uint32_t ipsecflowinfo = 0;
+
+ m = m_prepend(m, sizeof(*ip), M_DONTWAIT);
+ if (m == NULL) {
+ error = ENOBUFS;
+ goto dropped;
+ }
+ if (m->m_pkthdr.len > IP_MAXPACKET) {
+ error = EMSGSIZE;
+ goto drop;
+ }
+
+#ifdef IPSEC
+ if (ISSET(inp->inp_flags, INP_IPSECFLOWINFO)) {
+ uint32_t *p = gre_opt(opts, IPPROTO_IP, IP_IPSECFLOWINFO,
+    sizeof(*p));
+ if (p == GRE_OPT_EINVAL) {
+ error = EINVAL;
+ goto drop;
+ }
+
+ if (p != NULL)
+ ipsecflowinfo = *p;
+ }
+#endif /* IPSEC */
+
+ ip = mtod(m, struct ip *);
+ ip->ip_v = IPVERSION;
+ ip->ip_hl = sizeof(*ip) >> 2;
+ ip->ip_off = 0; /* XXX nodf? */
+ ip->ip_tos = 0; /* XXX */;
+ ip->ip_len = htons(m->m_pkthdr.len);
+ ip->ip_ttl = gpcb->gpcb_ttl;
+ ip->ip_p = IPPROTO_GRE;
+ ip->ip_src = gk->gk_laddr4;
+ ip->ip_dst = gk->gk_faddr4;
+
+ error = ip_output(m, inp->inp_options, &inp->inp_route,
+    gpcb_so(gpcb)->so_options & SO_BROADCAST, inp->inp_moptions, inp,
+    ipsecflowinfo);
+
+ return (error);
+
+drop:
+ m_freem(m);
+dropped:
+ return (error);
+}
+
+static void *
+gre_opt(struct mbuf *control, int level, int type, socklen_t len)
+{
+ u_int clen;
+ struct cmsghdr *cm;
+ caddr_t cmsgs;
+ size_t mlen;
+
+ if (control == NULL)
+ return (NULL);
+
+ if (control->m_next != NULL)
+ return (GRE_OPT_EINVAL);
+
+ mlen = CMSG_LEN(len);
+
+ clen = control->m_len;
+ cmsgs = mtod(control, caddr_t);
+ do {
+ if (clen < CMSG_LEN(0))
+ return (GRE_OPT_EINVAL);
+
+ cm = (struct cmsghdr *)cmsgs;
+ if (cm->cmsg_len < CMSG_LEN(0) ||
+    CMSG_ALIGN(cm->cmsg_len) > clen)
+ return (GRE_OPT_EINVAL);
+
+ if (cm->cmsg_level == level &&
+    cm->cmsg_type == type &&
+    cm->cmsg_len == mlen)
+ return (CMSG_DATA(cm));
+
+ clen -= CMSG_ALIGN(cm->cmsg_len);
+ cmsgs += CMSG_ALIGN(cm->cmsg_len);
+ } while (clen);
+
+ return (NULL);
+}
+
+static int
+gre_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
+    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
+{
+ int error;
+
+ error = (*ops->op_send)(ops, gpcb, m, addr, control);
+
+ m_freem(control);
+
+ return (error);
+}
+
+static int
+gre_output(const struct gre_ops *ops, struct gre_pcb *gpcb,
+    struct mbuf *m, struct mbuf *addr, struct mbuf *control, void *opts)
+{
+ const struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
+ struct gre_pcb_key key;
+ struct gre_header *gh;
+ int error;
+
+ if (addr != NULL) {
+ struct gre_pcb_key *ogk;
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct sockaddr *sa;
+ int state = GRE_S_DISCONNECTED;
+
+ if (gk != NULL)
+ state = gk->gk_state;
+
+ if (state == GRE_S_CONNECTED) {
+ error = EISCONN;
+ goto drop;
+ }
+
+ error = (*ops->op_nametosa)(inp, addr, &sa);
+ if (error != 0)
+ goto drop;
+
+ key.gk_family = sa->sa_family;
+ key.gk_proto = (*ops->op_proto)(sa);
+
+ if (state != GRE_S_DISCONNECTED) {
+ KASSERT(key.gk_family == gk->gk_family);
+ if (key.gk_proto != gk->gk_proto) {
+ error = EADDRNOTAVAIL;
+ goto drop;
+ }
+ }
+ if (state == GRE_S_BOUND) {
+ key.gk_laddr = gk->gk_laddr;
+ } else {
+ error = (*ops->op_selsrc)(inp, sa, &key.gk_laddr,
+    opts);
+ if (error != 0)
+ goto drop;
+ }
+
+ (*ops->op_addr)(&key.gk_faddr, sa);
+
+ key.gk_rtableid = inp->inp_rtableid;
+ key.gk_flags = gpcb->gpcb_flags;
+ key.gk_key = gpcb->gpcb_key;
+
+ ogk = RBT_FIND(gre_tree_connected, &gre_connected, &key);
+ if (ogk != NULL) {
+ struct gre_pcb *ogpcb;
+ int reuse = gpcb->gpcb_reuse;
+
+ ogpcb = gre_pcb_first(&ogk->gk_pcbs);
+ if (ogpcb != NULL) {
+ struct socket *oso = gpcb_so(ogpcb);
+ if (!ISSET(reuse, oso->so_options)) {
+ error = EADDRINUSE;
+ goto drop;
+ }
+ }
+ }
+
+ gk = &key;
+ } else {
+ if (gk == NULL || gk->gk_state != GRE_S_CONNECTED) {
+ error = ENOTCONN;
+ goto drop;
+ }
+ }
+
+ if (ISSET(gk->gk_flags, htons(GRE_SP))) {
+ struct gre_h_seq *gsh;
+ uint32_t *seqno;
+
+ seqno = gre_opt(control, IPPROTO_GRE, GRE_SENDSEQ,
+    sizeof(*seqno));
+ if (seqno == GRE_OPT_EINVAL) {
+ error = EINVAL;
+ goto drop;
+ }
+
+ m = m_prepend(m, sizeof(*gsh), M_DONTWAIT);
+ if (m == NULL)
+ return (ENOBUFS);
+
+ gsh = mtod(m, struct gre_h_seq *);
+ htobem32(&gsh->gre_seq, seqno != NULL ?
+    (gpcb->gpcb_seq = *seqno) : /* keep track of new start */
+    atomic_inc_int_nv(&gpcb->gpcb_seq));
+ }
+
+ if (ISSET(gk->gk_flags, htons(GRE_KP))) {
+ struct gre_h_key *gkh;
+
+ m = m_prepend(m, sizeof(*gkh), M_DONTWAIT);
+ if (m == NULL)
+ return (ENOBUFS);
+
+ gkh = mtod(m, struct gre_h_key *);
+ gkh->gre_key = gk->gk_key;
+ }
+
+ if (ISSET(gk->gk_flags, htons(GRE_CP))) {
+ struct gre_h_cksum *gch;
+
+ m = m_prepend(m, sizeof(*gch), M_DONTWAIT);
+ if (m == NULL)
+ return (ENOBUFS);
+
+ gch = mtod(m, struct gre_h_cksum *);
+ gch->gre_cksum = 0; /* XXX need to checksum */
+ gch->gre_reserved1 = 0;
+ }
+
+ m = m_prepend(m, sizeof(*gh), M_DONTWAIT);
+ if (m == NULL)
+ return (ENOBUFS);
+
+ gh = mtod(m, struct gre_header *);
+ gh->gre_flags = gk->gk_flags;
+ gh->gre_proto = gk->gk_proto;
+
+ KASSERT(ISSET(m->m_flags, M_PKTHDR));
+
+ m->m_pkthdr.ph_rtableid = gpcb_inp(gpcb)->inp_rtableid;
+
+ return ((*ops->op_output)(gpcb, gk, m, opts));
+
+drop:
+ m_freem(m);
+ return (error);
+}
+
+static void
+gre_sbappend(struct gre_pcb *gpcb, struct sockaddr *sa, struct mbuf *m,
+    struct mbuf *opts, int hlen, uint32_t seqno)
+{
+ struct socket *so = gpcb_so(gpcb);
+
+ if (ISSET(gpcb->gpcb_pflags, GREPCB_RECVSEQ)) {
+ struct mbuf *opt = sbcreatecontrol(&seqno, sizeof(seqno),
+    GRE_RECVSEQ, IPPROTO_GRE);
+ if (opt != NULL) {
+ opt->m_next = opts;
+ opts = opt;
+ }
+ }
+
+ m_adj(m, hlen);
+ if (sbappendaddr(so, &so->so_rcv, sa, m, opts) == 0) {
+ m_freem(m);
+ m_freem(opts);
+ return;
+ }
+
+ sorwakeup(so);
+}
+
+int
+gre_ip4_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
+    struct mbuf *control, struct proc *p)
+{
+ return (gre_usrreq(&gre_ip4_ops, so, req, m, addr, control, p));
+}
+
+int
+gre_ip4_ctloutput(int op, struct socket *so, int level, int optname,
+    struct mbuf *m)
+{
+ return (gre_ctloutput(&gre_ip4_ops, op, so, level, optname, m));
+}
+
+static void
+gre_ip4_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct socket *so = gpcb_so(gpcb);
+ struct mbuf *opts = NULL;
+ struct sockaddr_in sin = {
+ .sin_len = sizeof(sin),
+ .sin_family = AF_INET,
+ .sin_port = inp->inp_lport,
+ .sin_addr = mtod(m, struct ip *)->ip_src,
+ };
+
+ if (ISSET(inp->inp_flags, INP_CONTROLOPTS) ||
+    ISSET(so->so_options, SO_TIMESTAMP))
+ ip_savecontrol(inp, &opts, mtod(m, struct ip *), m);
+
+ if (ISSET(inp->inp_flags, INP_RECVDSTPORT)) {
+ struct mbuf *opt;
+
+ opt = sbcreatecontrol(&inp->inp_lport, sizeof(inp->inp_lport),
+    IP_RECVDSTPORT, IPPROTO_IP);
+ if (opt != NULL) {
+ opt->m_next = opts;
+ opts = opt;
+ }
+ }
+
+ gre_sbappend(gpcb, sintosa(&sin), m, opts, hlen, seqno);
+}
+
+int
+gre_ip4_input(struct mbuf **mp, int *offp, int proto, int af)
+{
+ struct mbuf *m = *mp;
+ struct gre_pcb_key gk;
+ struct ip *ip;
+
+ m = gre_if4_input(m, *offp);
+ if (m == NULL)
+ return (IPPROTO_DONE);
+
+ ip = mtod(m, struct ip *);
+
+ gk.gk_family = AF_INET;
+ gk.gk_laddr4 = ip->ip_dst;
+ gk.gk_faddr4 = ip->ip_src;
+
+ m = gre_ip_input(&gre_ip4_ops, m, *offp, ip->ip_ttl, &gk);
+ if (m == NULL)
+ return (IPPROTO_DONE);
+
+ *mp = m;
+ return (rip_input(mp, offp, proto, af));
+}
+
+#if INET6
+#include <netinet6/ip6_var.h>
+#include <netinet6/in6_var.h>
+
+static int gre_ip6_nametosa(struct inpcb *, struct mbuf *,
+    struct sockaddr **);
+static int gre_ip6_is_wildcard(struct sockaddr *);
+static int gre_ip6_is_multicast(struct sockaddr *);
+static int gre_ip6_is_broadcast(unsigned int, struct sockaddr *);
+static int gre_ip6_is_local(unsigned int, struct sockaddr *);
+static uint16_t gre_ip6_proto(struct sockaddr *);
+static void gre_ip6_addr(union inpaddru *, struct sockaddr *);
+static int gre_ip6_selsrc(struct inpcb *, struct sockaddr *,
+    union inpaddru *, void *);
+static void gre_ip6_sbappend(struct gre_pcb *, struct mbuf *, int,
+    uint32_t);
+static int gre_ip6_send(const struct gre_ops *, struct gre_pcb *,
+    struct mbuf *, struct mbuf *, struct mbuf *);
+static int gre_ip6_output(struct gre_pcb *, const struct gre_pcb_key *,
+    struct mbuf *, void *);
+
+static const struct gre_ops gre_ip6_ops = {
+ .op_nametosa = gre_ip6_nametosa,
+ .op_is_wildcard = gre_ip6_is_wildcard,
+ .op_is_multicast = gre_ip6_is_multicast,
+ .op_is_broadcast = gre_ip6_is_broadcast,
+ .op_is_local = gre_ip6_is_local,
+
+ .op_proto = gre_ip6_proto,
+ .op_addr = gre_ip6_addr,
+
+ .op_selsrc = gre_ip6_selsrc,
+ .op_control = in6_control,
+ .op_getsockname = in6_setsockaddr,
+ .op_getpeername = in6_setpeeraddr,
+ .op_ctloutput = ip6_ctloutput,
+ .op_sbappend = gre_ip6_sbappend,
+ .op_send = gre_ip6_send,
+ .op_output = gre_ip6_output,
+
+ .defttl = &ip6_defhlim,
+};
+
+static int
+gre_ip6_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
+{
+ struct sockaddr_in6 *sin6;
+ int error;
+
+ error = in6_nam2sin6(addr, &sin6);
+ if (error != 0)
+ return (error);
+
+ /* reject IPv4 mapped addresses, we have no support for them */
+ if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
+ return (EADDRNOTAVAIL);
+
+ if (in6_embedscope(&sin6->sin6_addr, sin6, inp) != 0)
+ return (EINVAL);
+
+ *sa = sin6tosa(sin6);
+ return (0);
+}
+
+static int
+gre_ip6_is_wildcard(struct sockaddr *sa)
+{
+ struct sockaddr_in6 *sin6 = satosin6(sa);
+ return (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr));
+}
+
+static int
+gre_ip6_is_multicast(struct sockaddr *sa)
+{
+ struct sockaddr_in6 *sin6 = satosin6(sa);
+ return (IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr));
+}
+
+static int
+gre_ip6_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
+{
+ return (0);
+}
+
+static int
+gre_ip6_is_local(unsigned int rtableid, struct sockaddr *sa)
+{
+ struct sockaddr_in6 *sin6 = satosin6(sa);
+ struct sockaddr_in6 needle = {
+ .sin6_len = sin6->sin6_len,
+ .sin6_family = sin6->sin6_family,
+ .sin6_addr = sin6->sin6_addr,
+ };
+ struct ifaddr *ifa;
+
+ ifa = ifa_ifwithaddr(sin6tosa(&needle), rtableid);
+ if (ifa == NULL)
+ return (EADDRNOTAVAIL);
+
+ /*
+ * bind to an anycast address might accidentally
+ * cause sending a packet with an anycast source
+ * address, so we forbid it.
+ *
+ * We should allow to bind to a deprecated address,
+ * since the application dare to use it.
+ * But, can we assume that they are careful enough
+ * to check if the address is deprecated or not?
+ * Maybe, as a safeguard, we should have a setsockopt
+ * flag to control the bind(2) behavior against
+ * deprecated addresses (default: forbid bind(2)).
+ */
+ if (ISSET(ifatoia6(ifa)->ia6_flags, IN6_IFF_ANYCAST|
+    IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED|IN6_IFF_DETACHED))
+ return (EADDRNOTAVAIL);
+
+ return (0);
+}
+
+static uint16_t
+gre_ip6_proto(struct sockaddr *sa)
+{
+ struct sockaddr_in6 *sin6 = satosin6(sa);
+ return (sin6->sin6_port);
+}
+
+static void
+gre_ip6_addr(union inpaddru *addr, struct sockaddr *sa)
+{
+ struct sockaddr_in6 *sin6 = satosin6(sa);
+ addr->iau_addr6 = sin6->sin6_addr;
+}
+
+static int
+gre_ip6_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
+    void *opts)
+{
+ struct in6_addr *in6src;
+ int error;
+
+ if (opts == NULL)
+ opts = inp->inp_outputopts6;
+
+ error = in6_pcbselsrc(&in6src, satosin6(sa), inp, opts);
+ if (error != 0)
+ return (error);
+
+ laddr->iau_addr6 = *in6src;
+
+ return (0);
+}
+
+int
+gre_ip6_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
+    struct mbuf *control, struct proc *p)
+{
+ return (gre_usrreq(&gre_ip6_ops, so, req, m, addr, control, p));
+}
+
+int
+gre_ip6_ctloutput(int op, struct socket *so, int level, int optname,
+    struct mbuf *m)
+{
+ return (gre_ctloutput(&gre_ip6_ops, op, so, level, optname, m));
+}
+
+static void
+gre_ip6_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct socket *so = gpcb_so(gpcb);
+ struct mbuf *opts = NULL;
+ struct sockaddr_in6 sin6 = {
+ .sin6_len = sizeof(sin6),
+ .sin6_family = AF_INET6,
+ .sin6_port = inp->inp_lport,
+ };
+
+ in6_recoverscope(&sin6, &mtod(m, struct ip6_hdr *)->ip6_src);
+
+ if (ISSET(inp->inp_flags, IN6P_CONTROLOPTS) ||
+    ISSET(so->so_options, SO_TIMESTAMP))
+ ip6_savecontrol(inp, m, &opts);
+
+ if (ISSET(inp->inp_flags, IN6P_RECVDSTPORT)) {
+ struct mbuf *opt;
+
+ opt = sbcreatecontrol(&inp->inp_fport, sizeof(inp->inp_fport),
+    IPV6_RECVDSTPORT, IPPROTO_IPV6);
+ if (opt != NULL) {
+ opt->m_next = opts;
+ opts = opt;
+ }
+ }
+
+ gre_sbappend(gpcb, sin6tosa(&sin6), m, opts, hlen, seqno);
+}
+
+static int
+gre_ip6_send(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *m,
+    struct mbuf *addr, struct mbuf *control)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct ip6_pktopts *opts = inp->inp_outputopts6;
+ struct ip6_pktopts opt;
+ int error;
+
+ if (control != NULL) {
+ error = ip6_setpktopts(control, &opt, opts, /* priv */ 1,
+    IPPROTO_GRE);
+ if (error != 0) {
+ m_freem(m);
+ return (error);
+ }
+ opts = &opt;
+ }
+
+ error = gre_output(ops, gpcb, m, addr, control, opts);
+ if (control != NULL)
+ ip6_clearpktopts(&opt, -1);
+ return (error);
+}
+
+static int
+gre_ip6_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
+    struct mbuf *m, void *opts)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ uint16_t len = m->m_pkthdr.len;
+ struct ip6_hdr *ip6;
+ int flags = 0;
+ int error;
+
+ if (len > IP_MAXPACKET) {
+ error = EMSGSIZE;
+ goto drop;
+ }
+
+ m = m_prepend(m, sizeof(*ip6), M_DONTWAIT);
+ if (m == NULL) {
+ error = ENOBUFS;
+ goto dropped;
+ }
+
+ ip6 = mtod(m, struct ip6_hdr *);
+ ip6->ip6_vfc = IPV6_VERSION;
+ ip6->ip6_plen = htons(len);
+ ip6->ip6_nxt = IPPROTO_GRE;
+ ip6->ip6_hlim = gpcb->gpcb_ttl;
+ ip6->ip6_src = gk->gk_laddr6;
+ ip6->ip6_dst = gk->gk_faddr6;
+
+ if (ISSET(inp->inp_flags, IN6P_MINMTU)) /* wtf */
+ flags |= IPV6_MINMTU;
+
+ error = ip6_output(m, opts, &inp->inp_route6,
+    flags, inp->inp_moptions6, inp);
+
+ return (error);
+
+drop:
+ m_freem(m);
+dropped:
+ return (error);
+}
+
+int
+gre_ip6_input(struct mbuf **mp, int *offp, int proto, int af)
+{
+ struct mbuf *m = *mp;
+ struct gre_pcb_key gk;
+ struct ip6_hdr *ip6;
+
+ m = gre_if6_input(m, *offp);
+ if (m == NULL)
+ return (IPPROTO_DONE);
+
+ ip6 = mtod(m, struct ip6_hdr *);
+
+ gk.gk_family = AF_INET6;
+ gk.gk_laddr6 = ip6->ip6_dst;
+ gk.gk_faddr6 = ip6->ip6_src;
+
+ m = gre_ip_input(&gre_ip6_ops, m, *offp, ip6->ip6_hlim, &gk);
+ if (m == NULL)
+ return (IPPROTO_DONE);
+
+ *mp = m;
+ return (rip6_input(mp, offp, proto, af));
+}
+#endif /* INET6 */
+
+/*
+ * generic GRE protocol handling
+ */
+
+void
+gre_init(void)
+{
+ pool_init(&gre_pcb_key_pool, sizeof(struct gre_pcb_key), 0,
+    IPL_NONE, PR_WAITOK, "grekey", NULL);
+ pool_init(&gre_pcb_pool, sizeof(struct gre_pcb), 0,
+    IPL_NONE, PR_WAITOK, "grepcb", NULL);
+}
+
+int
+gre_attach(struct socket *so, int proto)
+{
+ const struct gre_ops *ops = &gre_ip4_ops;
+ struct gre_pcb *gpcb;
+ struct inpcb *inp;
+ int error;
+ int flags = 0;
+
+ if (so->so_pcb != NULL)
+ return (EINVAL);
+
+ error = suser(curproc);
+ if (error != 0)
+ return (error);
+
+ error = soreserve(so, gre_sendspace, gre_recvspace);
+ if (error != 0)
+ return (error);
+
+#ifdef INET6
+ if (sotopf(so) == PF_INET6) {
+ ops = &gre_ip6_ops;
+ flags = INP_IPV6;
+ }
+#endif
+
+ gpcb = pool_get(&gre_pcb_pool, PR_NOWAIT | PR_ZERO);
+ if (gpcb == NULL)
+ return (ENOBUFS);
+
+ inp = gpcb_inp(gpcb);
+ inp->inp_socket = so;
+ refcnt_init(&inp->inp_refcnt);
+ inp->inp_seclevel[SL_AUTH] = IPSEC_AUTH_LEVEL_DEFAULT;
+ inp->inp_seclevel[SL_ESP_TRANS] = IPSEC_ESP_TRANS_LEVEL_DEFAULT;
+ inp->inp_seclevel[SL_ESP_NETWORK] = IPSEC_ESP_NETWORK_LEVEL_DEFAULT;
+ inp->inp_seclevel[SL_IPCOMP] = IPSEC_IPCOMP_LEVEL_DEFAULT;
+ inp->inp_rtableid = curproc->p_p->ps_rtableid;
+ inp->inp_ip_minttl = 0;
+ inp->inp_flags |= flags;
+ inp->inp_hops = *ops->defttl;
+ inp->inp_ppcb = (caddr_t)gpcb;
+
+ so->so_pcb = inp;
+
+ return (0);
+}
+
+static void
+gre_inpdetach(struct gre_pcb *gpcb)
+{
+ struct socket *so = gpcb_so(gpcb);
+ struct inpcb *inp = gpcb_inp(gpcb);
+
+ KASSERT(so->so_pcb == inp);
+
+ so->so_pcb = NULL;
+ sofree(so, SL_NOUNLOCK);
+
+ m_freem(inp->inp_options);
+ if (inp->inp_route.ro_rt) {
+ rtfree(inp->inp_route.ro_rt);
+ inp->inp_route.ro_rt = NULL;
+ }
+
+ switch (ISSET(inp->inp_flags, INP_IPV6)) {
+ case 0:
+ ip_freemoptions(inp->inp_moptions);
+ break;
+#ifdef INET6
+ case INP_IPV6:
+ ip6_freepcbopts(inp->inp_outputopts6);
+ ip6_freemoptions(inp->inp_moptions6);
+ break;
+#endif
+ }
+
+ KASSERT((struct gre_pcb *)inp->inp_ppcb == (struct gre_pcb *)inp);
+
+ (void)gre_disconnect(gpcb);
+ pool_put(&gre_pcb_pool, gpcb);
+}
+
+int
+gre_detach(struct socket *so)
+{
+ struct inpcb *inp;
+
+ soassertlocked(so);
+
+ inp = sotoinpcb(so);
+ if (inp == NULL)
+ return (EINVAL);
+
+ gre_inpdetach(inp_gpcb(inp));
+
+ return (0);
+}
+
+static int
+gre_bind(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct socket *so = gpcb_so(gpcb);
+ struct sockaddr *sa;
+ unsigned int state = GRE_S_BOUND;
+ struct gre_pcb_key *gk, *ogk;
+ int reuse = ISSET(so->so_options, SO_REUSEPORT);
+ int error;
+
+ if (gpcb->gpcb_pcb_key != NULL)
+ return (EISCONN);
+
+ error = (*ops->op_nametosa)(inp, addr, &sa);
+ if (error != 0)
+ return (error);
+
+ if ((*ops->op_is_wildcard)(sa)) {
+ state = GRE_S_WILDCARD;
+ } else if ((*ops->op_is_multicast)(sa)) {
+ /*
+ * Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
+ * allow complete duplication of binding if
+ * SO_REUSEPORT is set, or if SO_REUSEADDR is set
+ * and a multicast address is bound on both
+ * new and duplicated sockets.
+ */
+ if (ISSET(so->so_options, SO_REUSEADDR|SO_REUSEPORT))
+ reuse = SO_REUSEADDR|SO_REUSEPORT;
+ } else if (ISSET(so->so_options, SO_BINDANY) ||
+    (*ops->op_is_broadcast)(inp->inp_rtableid, sa) == 0) {
+ /*
+ * we must check that we are binding to an address we
+ * own except when:
+ * - SO_BINDANY is set or
+ * - we are binding a UDP socket to 255.255.255.255 or
+ * - we are binding a UDP socket to one of our broadcast
+ *   addresses
+ */
+ ;
+ } else {
+ error = (*ops->op_is_local)(inp->inp_rtableid, sa);
+ if (error != 0)
+ return (error);
+ }
+
+ gk = gre_pcb_key_get(gpcb);
+ if (gk == NULL)
+ return (ENOMEM);
+
+ gk->gk_family = sa->sa_family;
+ (*ops->op_addr)(&gk->gk_laddr, sa);
+ gk->gk_proto = (*ops->op_proto)(sa);
+
+ ogk = gre_pcb_key_insert(state, gk);
+ if (ogk != NULL) {
+ struct gre_pcb *ogpcb;
+
+ gre_pcb_key_put(gk);
+
+ ogpcb = gre_pcb_first(&ogk->gk_pcbs);
+ if (ogpcb != NULL) {
+ struct socket *oso = gpcb_so(ogpcb);
+ if (!ISSET(reuse, oso->so_options))
+ return (EADDRINUSE);
+ }
+
+ gk = ogk;
+ }
+
+ /* commit */
+
+ gpcb->gpcb_pcb_key = gk;
+ gre_pcb_insert(&gk->gk_pcbs, gpcb);
+
+ inp->inp_laddru = gk->gk_laddr;
+ inp->inp_lport = gk->gk_proto;
+
+ gpcb->gpcb_reuse = reuse;
+
+ return (0);
+}
+
+static int
+gre_connect(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
+{
+ struct inpcb *inp = gpcb_inp(gpcb);
+ struct socket *so = gpcb_so(gpcb);
+ struct gre_pcb_key *bgk = gpcb->gpcb_pcb_key;
+ struct gre_pcb_key *gk, *ogk;
+ struct sockaddr *sa;
+ int reuse = ISSET(so->so_options, SO_REUSEPORT);
+ int error;
+
+ if (bgk != NULL && bgk->gk_state == GRE_S_CONNECTED)
+ return (EISCONN);
+
+ error = (*ops->op_nametosa)(inp, addr, &sa);
+ if (error != 0)
+ return (error);
+
+ /* don't allow connections to wildcard addresses */
+ if ((*ops->op_is_wildcard)(sa))
+ return (EADDRNOTAVAIL);
+
+ gk = gre_pcb_key_get(gpcb);
+ if (gk == NULL)
+ return (ENOBUFS);
+
+ if (bgk == NULL || bgk->gk_state == GRE_S_WILDCARD) {
+ error = (*ops->op_selsrc)(inp, sa, &gk->gk_laddr, NULL);
+ if (error != 0)
+ goto put;
+
+ gk->gk_proto = (*ops->op_proto)(sa);
+ } else {
+ if (bgk->gk_proto != (*ops->op_proto)(sa)) {
+ error = EADDRNOTAVAIL;
+ goto put;
+ }
+
+ gk->gk_laddr = bgk->gk_laddr;
+ gk->gk_proto = bgk->gk_proto;
+ reuse |= gpcb->gpcb_reuse;
+ }
+
+ gk->gk_family = sa->sa_family;
+ (*ops->op_addr)(&gk->gk_faddr, sa);
+
+ ogk = gre_pcb_key_insert(GRE_S_CONNECTED, gk);
+ if (ogk != NULL) {
+ struct gre_pcb *ogpcb;
+
+ gre_pcb_key_put(gk);
+
+ ogpcb = gre_pcb_first(&ogk->gk_pcbs);
+ if (ogpcb != NULL) {
+ struct socket *oso = gpcb_so(ogpcb);
+ KASSERTMSG(oso != NULL, "ogk %p ogpcb %p oso %p",
+    ogk, ogpcb, oso);
+ if (!ISSET(reuse, oso->so_options))
+ return (EADDRINUSE);
+ }
+
+ gk = ogk;
+ }
+
+ /* commit */
+
+ gre_disconnect(gpcb);
+ gpcb->gpcb_pcb_key = gk;
+ gre_pcb_insert(&gk->gk_pcbs, gpcb);
+
+ inp->inp_laddru = gk->gk_laddr;
+ inp->inp_faddru = gk->gk_faddr;
+ inp->inp_lport = inp->inp_fport = gk->gk_proto;
+
+ gpcb->gpcb_reuse = reuse;
+ soisconnected(so);
+ return (0);
+
+put:
+ gre_pcb_key_put(gk);
+ return (error);
+}
+
+
+static int
+gre_getsockname(const struct gre_ops *ops, struct gre_pcb *gpcb,
+    struct mbuf *addr)
+{
+ struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
+
+ if (gk == NULL)
+ return (ENOTCONN);
+
+ (*ops->op_getsockname)(gpcb_inp(gpcb), addr);
+
+ return (0);
+}
+
+static int
+gre_getpeername(const struct gre_ops *ops, struct gre_pcb *gpcb,
+    struct mbuf *addr)
+{
+ struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
+
+ if (gk == NULL || gk->gk_state != GRE_S_CONNECTED)
+ return (ENOTCONN);
+
+ (*ops->op_getpeername)(gpcb_inp(gpcb), addr);
+
+ return (0);
+}
+
+static int
+gre_disconnect(struct gre_pcb *gpcb)
+{
+ struct gre_pcb_key *gk;
+
+ gk = gpcb->gpcb_pcb_key;
+ if (gk == NULL) {
+ return (ENOTCONN);
+ }
+
+ gre_pcb_remove(&gk->gk_pcbs, gpcb);
+ if (gre_pcb_empty(&gk->gk_pcbs)) {
+ switch (gk->gk_state) {
+ case GRE_S_WILDCARD:
+ RBT_REMOVE(gre_tree_wildcards, &gre_wildcards, gk);
+ break;
+ case GRE_S_BOUND:
+ RBT_REMOVE(gre_tree_bound, &gre_bound, gk);
+ break;
+ case GRE_S_CONNECTED:
+ RBT_REMOVE(gre_tree_connected, &gre_connected, gk);
+ break;
+ }
+ pool_put(&gre_pcb_key_pool, gk);
+ }
+
+ gpcb->gpcb_pcb_key = NULL;
+
+ return (0);
+}
+
+static int
+gre_usrreq(const struct gre_ops *ops, struct socket *so, int req,
+    struct mbuf *m, struct mbuf *addr, struct mbuf *control, struct proc *p)
+{
+ struct inpcb *inp;
+ struct gre_pcb *gpcb;
+ int error = 0;
+
+ if (req == PRU_CONTROL) {
+ return ((*ops->op_control)(so, (u_long)m, (caddr_t)addr,
+    (struct ifnet *)control));
+ }
+
+ soassertlocked(so);
+
+ inp = sotoinpcb(so);
+ if (inp == NULL) {
+ error = EINVAL;
+ goto release;
+ }
+ gpcb = inp_gpcb(inp);
+
+ switch (req) {
+ case PRU_BIND:
+ error = gre_bind(ops, gpcb, addr);
+ break;
+
+ case PRU_LISTEN:
+ error = EOPNOTSUPP;
+ break;
+
+ case PRU_CONNECT:
+ error = gre_connect(ops, gpcb, addr);
+ break;
+
+ case PRU_CONNECT2:
+ error = EOPNOTSUPP;
+ break;
+
+ case PRU_ACCEPT:
+ error = EOPNOTSUPP;
+ break;
+
+ case PRU_DISCONNECT:
+ error = gre_disconnect(gpcb);
+ if (error != 0)
+ break;
+
+ gpcb->gpcb_reuse = 0;
+ CLR(so->so_state, SS_ISCONNECTED); /* XXX cos udp_usrreq.c */
+ memset(&inp->inp_laddru, 0, sizeof(inp->inp_laddru));
+ memset(&inp->inp_faddru, 0, sizeof(inp->inp_faddru));
+ inp->inp_lport = inp->inp_fport = 0;
+ break;
+
+ case PRU_SHUTDOWN:
+ socantsendmore(so);
+ break;
+
+ case PRU_SEND:
+ return (gre_send(ops, gpcb, m, addr, control));
+
+ case PRU_ABORT:
+ soisdisconnected(so);
+ gre_inpdetach(gpcb);
+ break;
+
+ case PRU_SOCKADDR:
+ return (gre_getsockname(ops, gpcb, addr));
+ break;
+ case PRU_PEERADDR:
+ return (gre_getpeername(ops, gpcb, addr));
+ break;
+
+ case PRU_SENSE:
+ /* stat: don't bother with a block size. */
+ break;
+
+ case PRU_SENDOOB:
+ case PRU_FASTTIMO:
+ case PRU_SLOWTIMO:
+ case PRU_PROTORCV:
+ case PRU_PROTOSEND:
+ case PRU_RCVD:
+ case PRU_RCVOOB:
+ error = EOPNOTSUPP;
+ break;
+
+ default:
+ panic("%s req %d", __func__, req);
+ }
+
+release:
+ switch (req) {
+ case PRU_RCVD:
+ case PRU_RCVOOB:
+ case PRU_SENSE:
+ break;
+ default:
+ m_freem(control);
+ m_freem(m);
+ break;
+ }
+
+ return (error);
+}
+
+static int
+gre_setopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
+{
+ /*
+ * only support changing these options when the socket is
+ * completely disconnected. the amount of code needed to try
+ * changing the gre_pcb_key was "quite large" and arguably
+ * not worth it.
+ */
+ switch (optname) {
+ case GRE_CKSUM:
+ case GRE_KEY:
+ case GRE_SEQ:
+ if (gpcb->gpcb_pcb_key != NULL)
+ return (EISCONN);
+ break;
+ }
+
+ switch (optname) {
+ case GRE_CKSUM:
+ if (m == NULL || m->m_len != sizeof(int))
+ return (EINVAL);
+
+ if (*mtod(m, int *))
+ SET(gpcb->gpcb_flags, htons(GRE_CP));
+ else
+ CLR(gpcb->gpcb_flags, htons(GRE_CP));
+ break;
+ case GRE_KEY:
+ if (m == NULL || m->m_len == 0) { /* disable key */
+ CLR(gpcb->gpcb_flags, htons(GRE_KP));
+ gpcb->gpcb_key = htonl(0);
+ break;
+ }
+
+ if (m->m_len != sizeof(gpcb->gpcb_key))
+ return (EINVAL);
+
+ SET(gpcb->gpcb_flags, htons(GRE_KP));
+ htobem32(&gpcb->gpcb_key, *mtod(m, uint32_t *));
+ break;
+ case GRE_SEQ:
+ if (m == NULL || m->m_len == 0) { /* disable seq */
+ CLR(gpcb->gpcb_flags, htons(GRE_SP));
+ gpcb->gpcb_seq = 0;
+ break;
+ }
+
+ if (m->m_len != sizeof(gpcb->gpcb_seq))
+ return (EINVAL);
+
+ SET(gpcb->gpcb_flags, htons(GRE_SP));
+ gpcb->gpcb_seq = *mtod(m, uint32_t *);
+ break;
+ case GRE_RECVSEQ:
+ if (m == NULL || m->m_len != sizeof(int))
+ return (EINVAL);
+
+ if (*mtod(m, int *))
+ SET(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
+ else
+ CLR(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
+ break;
+ default:
+ return (ENOPROTOOPT);
+ break;
+ }
+
+ return (0);
+}
+
+static int
+gre_getopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
+{
+ switch (optname) {
+ case GRE_KEY:
+ if (!ISSET(gpcb->gpcb_flags, htons(GRE_KP)))
+ return (ENOTCONN);
+
+ m->m_len = sizeof(gpcb->gpcb_key);
+ *mtod(m, uint32_t *) = bemtoh32(&gpcb->gpcb_key);
+ break;
+ case GRE_CKSUM:
+ m->m_len = sizeof(int);
+ *mtod(m, int *) = !!ISSET(gpcb->gpcb_flags, htons(GRE_CP));
+ break;
+ case GRE_SEQ:
+ if (!ISSET(gpcb->gpcb_flags, htons(GRE_SP)))
+ return (ENOTCONN);
+
+ m->m_len = sizeof(gpcb->gpcb_seq);
+ *mtod(m, uint32_t *) = gpcb->gpcb_key;
+ break;
+ default:
+ return (ENOPROTOOPT);
+ }
+
+ return (0);
+}
+
+/*
+ * IP socket option processing.
+ */
+static int
+gre_ctloutput(const struct gre_ops *ops, int op, struct socket *so,
+    int level, int optname, struct mbuf *m)
+{
+ struct inpcb *inp;
+ struct gre_pcb *gpcb;
+ int error;
+
+ inp = sotoinpcb(so);
+ if (inp == NULL)
+ return (ECONNRESET);
+ if (level != IPPROTO_GRE)
+ return ((*ops->op_ctloutput)(level, so, level, optname, m));
+
+ gpcb = inp_gpcb(inp);
+
+ switch (op) {
+ case PRCO_SETOPT:
+ error = gre_setopt(gpcb, optname, m);
+ break;
+ case PRCO_GETOPT:
+ error = gre_getopt(gpcb, optname, m);
+ break;
+ default:
+ panic("%s op %d", __func__, op);
+ }
+
+ return (error);
+}
+
+static void *
+gre_pullup(struct mbuf **mp, int *offp, int len)
+{
+ int hlen = *offp + len;
+ void *h;
+
+ if ((*mp)->m_pkthdr.len < hlen)
+ return (NULL); /* decline */
+
+ *mp = m_pullup(*mp, hlen);
+ if (*mp == NULL)
+ return (NULL);
+
+ h = mtod(*mp, caddr_t) + *offp;
+ *offp = hlen;
+
+ return (h);
+}
+
+static int
+gre_candeliver(const struct gre_pcb *gpcb, uint8_t ttl)
+{
+ const struct inpcb *inp = gpcb_inp(gpcb);
+
+ if (ISSET(inp->inp_socket->so_state, SS_CANTRCVMORE))
+ return (0);
+
+ if (inp->inp_ip_minttl > ttl)
+ return (0);
+
+ return (1);
+}
+
+static struct mbuf *
+gre_ip_input(const struct gre_ops *ops, struct mbuf *m, int iphlen,
+    uint8_t ttl, struct gre_pcb_key *key)
+{
+ int hlen = iphlen;
+ struct gre_header *gh;
+ struct gre_pcb_key *gk;
+ struct gre_pcb *gpcb, *ngpcb;
+ uint16_t cksum = 0;
+ uint32_t seqno = 0;
+
+ gh = gre_pullup(&m, &hlen, sizeof(*gh));
+ if (gh == NULL)
+ return (m);
+
+ if ((gh->gre_flags & htons(GRE_VERS_MASK)) != htons(GRE_VERS_0))
+ return (m); /* decline */
+
+ if (ISSET(gh->gre_flags, ~htons(GRE_VALID_MASK)))
+ return (m); /* decline */
+
+ key->gk_flags = gh->gre_flags;
+ key->gk_proto = gh->gre_proto;
+
+ if (ISSET(key->gk_flags, htons(GRE_CP))) {
+ struct gre_h_cksum *gch;
+
+ gch = gre_pullup(&m, &hlen, sizeof(*gch));
+ if (gch == NULL)
+ return (m);
+
+ cksum = gch->gre_cksum;
+
+ /* XXX ignore Reserved (Offset) field */
+ }
+
+ if (ISSET(key->gk_flags, htons(GRE_KP))) {
+ struct gre_h_key *gkh;
+
+ gkh = gre_pullup(&m, &hlen, sizeof(*gkh));
+ if (gkh == NULL)
+ return (m);
+
+ key->gk_key = gkh->gre_key;
+ }
+
+ if (ISSET(key->gk_flags, htons(GRE_SP))) {
+ struct gre_h_seq *gsh;
+
+ gsh = gre_pullup(&m, &hlen, sizeof(*gsh));
+ if (gsh == NULL)
+ return (m);
+
+ seqno = bemtoh32(&gsh->gre_seq);
+ }
+
+ key->gk_rtableid = m->m_pkthdr.ph_rtableid;
+
+ gk = RBT_FIND(gre_tree_connected, &gre_connected, key);
+ if (gk == NULL) {
+ gk = RBT_FIND(gre_tree_bound, &gre_bound, key);
+ if (gk == NULL) {
+ gk = RBT_FIND(gre_tree_wildcards, &gre_wildcards, key);
+ if (gk == NULL)
+ return (m); /* decline */
+ }
+ }
+
+ /* it's ours now */
+
+ if (ISSET(key->gk_flags, htons(GRE_CP))) {
+ /* XXX actually do the checksum calc */
+ }
+
+ gpcb = gre_pcb_first(&gk->gk_pcbs);
+ while (!gre_candeliver(gpcb, ttl)) {
+ gpcb = gre_pcb_next(gpcb);
+ if (gpcb == NULL)
+ goto drop;
+ }
+
+ ngpcb = gpcb;
+ for (;;) {
+ struct mbuf *mm;
+
+ ngpcb = gre_pcb_next(ngpcb);
+ if (ngpcb == NULL)
+ break;
+
+ if (!gre_candeliver(ngpcb, ttl))
+ continue;
+
+ mm = m_dup_pkt(m, 0, M_DONTWAIT);
+ if (mm == NULL) {
+ /* assume further copies will also fail */
+ break;
+ }
+
+ (*ops->op_sbappend)(gpcb, mm, hlen, seqno);
+ gpcb = ngpcb;
+ }
+
+ (*ops->op_sbappend)(gpcb, m, hlen, seqno);
+
+ return (NULL);
+
+drop:
+ m_freem(m);
+ return (NULL);
+}
+
+static int
+gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *a,
+    const struct gre_pcb_key *b)
+{
+ if (a->gk_proto > b->gk_proto)
+ return (1);
+ if (a->gk_proto < b->gk_proto)
+ return (-1);
+
+ if (a->gk_flags > b->gk_flags)
+ return (1);
+ if (a->gk_flags < b->gk_flags)
+ return (-1);
+
+ if (ISSET(a->gk_flags, htons(GRE_KP))) {
+ if (a->gk_key > b->gk_key)
+ return (1);
+ if (a->gk_key < b->gk_key)
+ return (-1);
+ }
+
+ if (a->gk_rtableid > b->gk_rtableid)
+ return (1);
+ if (a->gk_rtableid < b->gk_rtableid)
+ return (-1);
+
+ if (a->gk_family > b->gk_family)
+ return (1);
+ if (a->gk_family < b->gk_family)
+ return (-1);
+
+ return (0);
+}
+
+RBT_GENERATE(gre_tree_wildcards, gre_pcb_key, gk_entry,
+    gre_pcb_key_cmp_wildcard);
+
+static int
+gre_pcb_key_cmp_bound(const struct gre_pcb_key *a,
+    const struct gre_pcb_key *b)
+{
+ int rv;
+
+ rv = gre_pcb_key_cmp_wildcard(a, b);
+ if (rv != 0)
+ return (rv);
+
+ switch (a->gk_family) {
+ case AF_INET:
+ rv = memcmp(&a->gk_laddr4, &b->gk_laddr4, sizeof(a->gk_laddr4));
+ break;
+#ifdef INET
+ case AF_INET6:
+ rv = memcmp(&a->gk_laddr6, &b->gk_laddr6, sizeof(a->gk_laddr6));
+ break;
+#endif
+ default:
+ unhandled_af(a->gk_family);
+ }
+
+ return (rv);
+}
+
+RBT_GENERATE(gre_tree_bound, gre_pcb_key, gk_entry, gre_pcb_key_cmp_bound);
+
+static int
+gre_pcb_key_cmp_connected(const struct gre_pcb_key *a,
+    const struct gre_pcb_key *b)
+{
+ int rv;
+
+ rv = gre_pcb_key_cmp_bound(a, b);
+ if (rv != 0)
+ return (rv);
+
+ switch (a->gk_family) {
+ case AF_INET:
+ rv = memcmp(&a->gk_faddr4, &b->gk_faddr4, sizeof(a->gk_faddr4));
+ break;
+#ifdef INET
+ case AF_INET6:
+ rv = memcmp(&a->gk_faddr6, &b->gk_faddr6, sizeof(a->gk_faddr6));
+ break;
+#endif
+ default:
+ unhandled_af(a->gk_family);
+ }
+
+ return (rv);
+}
+
+RBT_GENERATE(gre_tree_connected, gre_pcb_key, gk_entry,
+    gre_pcb_key_cmp_connected);
Index: sys/netinet6/in6_proto.c
===================================================================
RCS file: /cvs/src/sys/netinet6/in6_proto.c,v
retrieving revision 1.104
diff -u -p -r1.104 in6_proto.c
--- sys/netinet6/in6_proto.c 13 Jun 2019 08:12:11 -0000 1.104
+++ sys/netinet6/in6_proto.c 29 Oct 2019 07:57:58 -0000
@@ -117,7 +117,7 @@
 
 #include "gre.h"
 #if NGRE > 0
-#include <net/if_gre.h>
+#include <netinet/gre_var.h>
 #endif
 
 /*
@@ -340,11 +340,22 @@ const struct protosw inet6sw[] = {
   .pr_domain = &inet6domain,
   .pr_protocol = IPPROTO_GRE,
   .pr_flags = PR_ATOMIC|PR_ADDR,
-  .pr_input = gre_input6,
+  .pr_input = gre_ip6_input,
   .pr_ctloutput = rip6_ctloutput,
   .pr_usrreq = rip6_usrreq,
   .pr_attach = rip6_attach,
   .pr_detach = rip6_detach,
+},
+{
+  .pr_type = SOCK_DGRAM,
+  .pr_domain = &inet6domain,
+  .pr_protocol = IPPROTO_GRE,
+  .pr_flags = PR_ATOMIC|PR_ADDR,
+  .pr_input = gre_ip6_input,
+  .pr_ctloutput = gre_ip6_ctloutput,
+  .pr_usrreq = gre_ip6_usrreq,
+  .pr_attach = gre_attach,
+  .pr_detach = gre_detach,
 },
 #endif /* NGRE */

Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

David Gwynne-5
Has anyone got an opinion on this? I am still interested in doing more
packet capture things on OpenBSD using GRE as a transport, and the idea
of maintaining this out of tree just makes me feel tired.

On Tue, Oct 29, 2019 at 06:34:50PM +1000, David Gwynne wrote:

> i've been toying with this idea of implementing GRE as a datagram
> protocol that userland can use just like UDP. the idea is to make it
> easy to support the implementation of NHRP in userland for mgre(4),
> and also for ERSPAN* support without going down the path linux took**.
>
> so this is the result of having a go at implementing the idea. the diff
> includes several independent parts, but they all work together to make
> GRE as comfortable to use as UDP. the two main parts are the actual
> protocol implementation in src/sys/netinet/ip_gre.c, and the tweaks to
> getaddrinfo to allow the resolution of gre services. the /etc/services
> chunk gets used by the getaddrinfo bits.
>
> so, the first chunk lets you do this (as root in userland):
>
> int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_GRE);
>
> that gives you a file descriptor you can then use with bind(),
> connect(), sendto(), recvfrom(), etc. you write a message to the
> kernel and it prepends the GRE and IP headers and pushes it out.
> it is set up so the GRE protocol is handed to the kernel via the
> sin_port or sin6_port member of struct sockaddr_in an sockaddr_in6
> respectively. there's no source and destination protocol fields, just
> one that both ends agree on, so if you connect then bind, your
> sockaddrs have to agree on the proto. unfortunately there's no such
> thing as a wildcard or reserved protocol in GRE, so 0 can't be used
> as a wildcard like it can in udp and tcp.
>
> the sockets support the configuration of optional GRE headers, as
> defined in RFC 2890, using setsockopt. importantly you can enable
> the key and sequence number headers, which again, the kernel offloads
> for you.
>
> the second chunk tweaks getaddrinfo so it lets you specify things other
> than IPPROTO_UDP and IPPROTO_TCP. protocols other than those are now
> looked up in /etc/protocols to get their name, which in turn is used to
> look up entries in /etc/services. while i was there and reading rfcs, i
> noted different behaviour for wildcarded socktypes and protocols, which
> i've tried to implement. eric@ seems generally ok with this stuff, and
> suggested the tweak to pledge to allow access to /etc/protocols using
> the dns pledge. tcp and udp are still special though, and are still
> omgoptimised.
>
> all this together lets the program at
> https://mild.embarrassm.net/~dlg/diff/egred.c work. it is a userland
> reimplementation of a simplified egre(4) using tap(4) and a gre socket.
> the io path is literally reading from one fd and writing it to the othe,
> everything else is boilerplate.
>
> i suspect the kernel stuff is a bit rough as i havent had to test every
> path, but it supports common functionality.
>
> thoughts? i am pretty pleased with this has turned out, and would be
> keen to put it in the tree and work on it some more.
>
> * https://tools.ietf.org/html/draft-foschiano-erspan-03
> ** http://vger.kernel.org/lpc_net2018_talks/erspan-linux-presentation.pdf
>
> Index: etc/services
> ===================================================================
> RCS file: /cvs/src/etc/services,v
> retrieving revision 1.96
> diff -u -p -r1.96 services
> --- etc/services 27 Jan 2019 20:35:06 -0000 1.96
> +++ etc/services 29 Oct 2019 07:57:44 -0000
> @@ -332,6 +332,21 @@ spamd-cfg 8026/tcp # spamd(8) configur
>  dhcpd-sync 8067/udp # dhcpd(8) synchronisation
>  hunt 26740/udp # hunt(6)
>  #
> +# GRE Protocol Types
> +#
> +keepalive 0/gre # 0x0000: IP tunnel keepalive
> +ipv4 2048/gre # 0x0800: IPv4
> +nhrp 8193/gre # 0x2001: Next Hop Resolution Protocol
> +erspan3 8939/gre # 0x22eb: ERSPAN III
> +transether 25944/gre ethernet # 0x6558: Trans Ether Bridging
> +ipv6 34525/gre # 0x86dd: IPv6
> +wccp 34878/gre # 0x883e: Web Content Cache Protocol
> +mpls 34887/gre # 0x8847: MPLS
> +#mpls 34888/gre # 0x8848: MPLS Multicast
> +erspan 35006/gre erspan2 # 0x88be: ERSPAN I/II
> +nsh 35151/gre # 0x894f: Network Service Header
> +control 47082/gre # 0xb7ea: RFC 8157
> +#
>  # Appletalk
>  #
>  rtmp 1/ddp # Routing Table Maintenance Protocol
> Index: lib/libc/asr/getaddrinfo_async.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/asr/getaddrinfo_async.c,v
> retrieving revision 1.56
> diff -u -p -r1.56 getaddrinfo_async.c
> --- lib/libc/asr/getaddrinfo_async.c 3 Nov 2018 09:13:24 -0000 1.56
> +++ lib/libc/asr/getaddrinfo_async.c 29 Oct 2019 07:57:54 -0000
> @@ -34,36 +34,15 @@
>  
>  #include "asr_private.h"
>  
> -struct match {
> - int family;
> - int socktype;
> - int protocol;
> -};
> -
>  static int getaddrinfo_async_run(struct asr_query *, struct asr_result *);
>  static int get_port(const char *, const char *, int);
> +static int get_service(const char *, int, int);
>  static int iter_family(struct asr_query *, int);
>  static int addrinfo_add(struct asr_query *, const struct sockaddr *, const char *);
>  static int addrinfo_from_file(struct asr_query *, int,  FILE *);
>  static int addrinfo_from_pkt(struct asr_query *, char *, size_t);
>  static int addrconfig_setup(struct asr_query *);
>  
> -static const struct match matches[] = {
> - { PF_INET, SOCK_DGRAM, IPPROTO_UDP },
> - { PF_INET, SOCK_STREAM, IPPROTO_TCP },
> - { PF_INET, SOCK_RAW, 0 },
> - { PF_INET6, SOCK_DGRAM, IPPROTO_UDP },
> - { PF_INET6, SOCK_STREAM, IPPROTO_TCP },
> - { PF_INET6, SOCK_RAW, 0 },
> - { -1, 0, 0, },
> -};
> -
> -#define MATCH_FAMILY(a, b) ((a) == matches[(b)].family || (a) == PF_UNSPEC)
> -#define MATCH_PROTO(a, b) ((a) == matches[(b)].protocol || (a) == 0 || matches[(b)].protocol == 0)
> -/* Do not match SOCK_RAW unless explicitly specified */
> -#define MATCH_SOCKTYPE(a, b) ((a) == matches[(b)].socktype || ((a) == 0 && \
> - matches[(b)].socktype != SOCK_RAW))
> -
>  enum {
>   DOM_INIT,
>   DOM_DOMAIN,
> @@ -199,24 +178,27 @@ getaddrinfo_async_run(struct asr_query *
>   }
>   }
>  
> - /* Make sure there is at least a valid combination */
> - for (i = 0; matches[i].family != -1; i++)
> - if (MATCH_FAMILY(ai->ai_family, i) &&
> -    MATCH_SOCKTYPE(ai->ai_socktype, i) &&
> -    MATCH_PROTO(ai->ai_protocol, i))
> - break;
> - if (matches[i].family == -1) {
> - ar->ar_gai_errno = EAI_BADHINTS;
> - async_set_state(as, ASR_STATE_HALT);
> - break;
> - }
> -
> - if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_UDP)
> + switch (ai->ai_protocol) {
> + case 0:
>   as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
>      as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> - if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_TCP)
>   as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
>      as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> + break;
> + case IPPROTO_TCP:
> + as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> + break;
> + case IPPROTO_UDP:
> + as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> + break;
> + default:
> + as->as.ai.port_udp = get_service(as->as.ai.servname,
> +    ai->ai_protocol,
> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> + break;
> + }
>   if (as->as.ai.port_tcp == -2 || as->as.ai.port_udp == -2 ||
>      (as->as.ai.port_tcp == -1 && as->as.ai.port_udp == -1) ||
>      (ai->ai_protocol && (as->as.ai.port_udp == -1 ||
> @@ -491,6 +473,24 @@ get_port(const char *servname, const cha
>   return (port);
>  }
>  
> +static int
> +get_service(const char *servname, int protocol, int numonly)
> +{
> + struct protoent pe;
> + struct protoent_data ped;
> + int rv;
> +
> + memset(&ped, 0, sizeof(ped));
> + rv = getprotobynumber_r(protocol, &pe, &ped);
> + if (rv == -1)
> + return (-1);
> +
> + rv = get_port(servname, pe.p_name, numonly);
> + endprotoent_r(&ped);
> +
> + return (rv);
> +}
> +
>  /*
>   * Iterate over the address families that are to be queried. Use the
>   * list on the async context, unless a specific family was given in hints.
> @@ -519,65 +519,107 @@ iter_family(struct asr_query *as, int fi
>   * entry per protocol/socktype match.
>   */
>  static int
> -addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
> +addrinfo_add_ai(struct asr_query *as, const struct sockaddr *sa,
> +    const char *cname, int socktype, int proto, int port)
>  {
>   struct addrinfo *ai;
> - int i, port, proto;
> -
> - for (i = 0; matches[i].family != -1; i++) {
> - if (matches[i].family != sa->sa_family ||
> -    !MATCH_SOCKTYPE(as->as.ai.hints.ai_socktype, i) ||
> -    !MATCH_PROTO(as->as.ai.hints.ai_protocol, i))
> - continue;
> -
> - proto = as->as.ai.hints.ai_protocol;
> - if (!proto)
> - proto = matches[i].protocol;
> -
> - if (proto == IPPROTO_TCP)
> - port = as->as.ai.port_tcp;
> - else if (proto == IPPROTO_UDP)
> - port = as->as.ai.port_udp;
> - else
> - port = 0;
> + int i;
>  
> - /* servname specified, but not defined for this protocol */
> - if (port == -1)
> - continue;
> + if (port == -1)
> + return (0);
>  
> - ai = calloc(1, sizeof(*ai) + sa->sa_len);
> - if (ai == NULL)
> + ai = calloc(1, sizeof(*ai) + sa->sa_len);
> + if (ai == NULL)
> + return (EAI_MEMORY);
> + ai->ai_family = sa->sa_family;
> + ai->ai_socktype = socktype;
> + ai->ai_protocol = proto;
> + ai->ai_flags = as->as.ai.hints.ai_flags;
> + ai->ai_addrlen = sa->sa_len;
> + ai->ai_addr = (void *)(ai + 1);
> + if (cname &&
> +    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
> + if ((ai->ai_canonname = strdup(cname)) == NULL) {
> + free(ai);
>   return (EAI_MEMORY);
> - ai->ai_family = sa->sa_family;
> - ai->ai_socktype = matches[i].socktype;
> - ai->ai_protocol = proto;
> - ai->ai_flags = as->as.ai.hints.ai_flags;
> - ai->ai_addrlen = sa->sa_len;
> - ai->ai_addr = (void *)(ai + 1);
> - if (cname &&
> -    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
> - if ((ai->ai_canonname = strdup(cname)) == NULL) {
> - free(ai);
> - return (EAI_MEMORY);
> - }
>   }
> - memmove(ai->ai_addr, sa, sa->sa_len);
> - if (sa->sa_family == PF_INET)
> - ((struct sockaddr_in *)ai->ai_addr)->sin_port =
> -    htons(port);
> - else if (sa->sa_family == PF_INET6)
> - ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
> -    htons(port);
> -
> - if (as->as.ai.aifirst == NULL)
> - as->as.ai.aifirst = ai;
> - if (as->as.ai.ailast)
> - as->as.ai.ailast->ai_next = ai;
> - as->as.ai.ailast = ai;
> - as->as_count += 1;
>   }
> + memmove(ai->ai_addr, sa, sa->sa_len);
> + if (sa->sa_family == PF_INET)
> + ((struct sockaddr_in *)ai->ai_addr)->sin_port =
> +    htons(port);
> + else if (sa->sa_family == PF_INET6)
> + ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
> +    htons(port);
> +
> + if (as->as.ai.aifirst == NULL)
> + as->as.ai.aifirst = ai;
> + if (as->as.ai.ailast)
> + as->as.ai.ailast->ai_next = ai;
> + as->as.ai.ailast = ai;
> + as->as_count += 1;
>  
>   return (0);
> +}
> +
> +static int
> +addrinfo_add_proto(struct asr_query *as, const struct sockaddr *sa,
> +    const char *cname, int proto, int port)
> +{
> + int rv;
> +
> + switch (as->as.ai.hints.ai_socktype) {
> + case 0:
> + rv = addrinfo_add_ai(as, sa, cname, SOCK_STREAM, proto, port);
> + if (rv != 0)
> + break;
> +
> + rv = addrinfo_add_ai(as, sa, cname, SOCK_DGRAM, proto, port);
> + if (rv != 0)
> + break;
> +
> + break;
> +
> + default:
> + rv = addrinfo_add_ai(as, sa, cname,
> +    as->as.ai.hints.ai_socktype, proto, port);
> + break;
> + }
> +
> + return (rv);
> +}
> +
> +static int
> +addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
> +{
> + int rv;
> +
> + switch (as->as.ai.hints.ai_protocol) {
> + case 0:
> + rv = addrinfo_add_proto(as, sa, cname,
> +    IPPROTO_TCP, as->as.ai.port_tcp);
> + if (rv != 0)
> + break;
> +
> + rv = addrinfo_add_proto(as, sa, cname,
> +    IPPROTO_UDP, as->as.ai.port_udp);
> + if (rv != 0)
> + break;
> +
> + break;
> +
> + case IPPROTO_TCP:
> + rv = addrinfo_add_proto(as, sa, cname,
> +    IPPROTO_TCP, as->as.ai.port_tcp);
> + break;
> +
> + default: /* includes IPPROTO_UDP */
> + rv = addrinfo_add_proto(as, sa, cname,
> +    as->as.ai.hints.ai_protocol, as->as.ai.port_udp);
> + break;
> + }
> +
> + return (rv);
>  }
>  
>  static int
> Index: sys/conf/files
> ===================================================================
> RCS file: /cvs/src/sys/conf/files,v
> retrieving revision 1.675
> diff -u -p -r1.675 files
> --- sys/conf/files 5 Oct 2019 05:33:14 -0000 1.675
> +++ sys/conf/files 29 Oct 2019 07:57:58 -0000
> @@ -862,7 +862,7 @@ file netinet/tcp_subr.c
>  file netinet/tcp_timer.c
>  file netinet/tcp_usrreq.c
>  file netinet/udp_usrreq.c
> -file netinet/ip_gre.c
> +file netinet/ip_gre.c gre
>  file netinet/ip_ipsp.c ipsec | tcp_signature
>  file netinet/ip_spd.c ipsec | tcp_signature
>  file netinet/ip_ipip.c
> Index: sys/kern/kern_pledge.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> retrieving revision 1.255
> diff -u -p -r1.255 kern_pledge.c
> --- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
> +++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
> @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
>   }
>   }
>  
> - /* DNS needs /etc/{resolv.conf,hosts,services}. */
> + /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
>   if ((ni->ni_pledge == PLEDGE_RPATH) &&
>      (p->p_p->ps_pledge & PLEDGE_DNS)) {
>   if (strcmp(path, "/etc/resolv.conf") == 0) {
> @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
>   return (0);
>   }
>   if (strcmp(path, "/etc/services") == 0) {
> + ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> + return (0);
> + }
> + if (strcmp(path, "/etc/protocols") == 0) {
>   ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>   return (0);
>   }
> Index: sys/net/if_gre.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_gre.c,v
> retrieving revision 1.152
> diff -u -p -r1.152 if_gre.c
> --- sys/net/if_gre.c 29 Jul 2019 16:28:25 -0000 1.152
> +++ sys/net/if_gre.c 29 Oct 2019 07:57:58 -0000
> @@ -69,6 +69,8 @@
>  #include <netinet/ip_var.h>
>  #include <netinet/ip_ecn.h>
>  
> +#include <netinet/gre_proto.h>
> +
>  #ifdef INET6
>  #include <netinet/ip6.h>
>  #include <netinet6/ip6_var.h>
> @@ -103,28 +105,6 @@
>  /*
>   * packet formats
>   */
> -struct gre_header {
> - uint16_t gre_flags;
> -#define GRE_CP 0x8000  /* Checksum Present */
> -#define GRE_KP 0x2000  /* Key Present */
> -#define GRE_SP 0x1000  /* Sequence Present */
> -
> -#define GRE_VERS_MASK 0x0007
> -#define GRE_VERS_0 0x0000
> -#define GRE_VERS_1 0x0001
> -
> - uint16_t gre_proto;
> -} __packed __aligned(4);
> -
> -struct gre_h_cksum {
> - uint16_t gre_cksum;
> - uint16_t gre_reserved1;
> -} __packed __aligned(4);
> -
> -struct gre_h_key {
> - uint32_t gre_key;
> -} __packed __aligned(4);
> -
>  #define GRE_EOIP 0x6400
>  
>  struct gre_h_key_eoip {
> @@ -132,13 +112,7 @@ struct gre_h_key_eoip {
>   uint16_t eoip_tunnel_id; /* little endian */
>  } __packed __aligned(4);
>  
> -#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
> -#define NVGRE_VSID_RES_MAX 0x000fff
> -#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
> -
> -struct gre_h_seq {
> - uint32_t gre_seq;
> -} __packed __aligned(4);
> +#define GRE_WCCP 0x883e
>  
>  struct gre_h_wccp {
>   uint8_t wccp_flags;
> @@ -147,7 +121,10 @@ struct gre_h_wccp {
>   uint8_t pri_bucket;
>  } __packed __aligned(4);
>  
> -#define GRE_WCCP 0x883e
> +
> +#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
> +#define NVGRE_VSID_RES_MAX 0x000fff
> +#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
>  
>  #define GRE_HDRLEN (sizeof(struct ip) + sizeof(struct gre_header))
>  
> @@ -289,8 +266,8 @@ static int gre_up(struct gre_softc *);
>  static int gre_down(struct gre_softc *);
>  static void gre_link_state(struct ifnet *, unsigned int);
>  
> -static int gre_input_key(struct mbuf **, int *, int, int, uint8_t,
> -    struct gre_tunnel *);
> +static struct mbuf *
> + gre_if_input(struct mbuf *, int, uint8_t, struct gre_tunnel *);
>  
>  static struct mbuf *
>   gre_ipv4_patch(const struct gre_tunnel *, struct mbuf *,
> @@ -893,10 +870,9 @@ eoip_clone_destroy(struct ifnet *ifp)
>   return (0);
>  }
>  
> -int
> -gre_input(struct mbuf **mp, int *offp, int type, int af)
> +struct mbuf *
> +gre_if4_input(struct mbuf *m, int hlen)
>  {
> - struct mbuf *m = *mp;
>   struct gre_tunnel key;
>   struct ip *ip;
>  
> @@ -908,17 +884,13 @@ gre_input(struct mbuf **mp, int *offp, i
>   key.t_src4 = ip->ip_dst;
>   key.t_dst4 = ip->ip_src;
>  
> - if (gre_input_key(mp, offp, type, af, ip->ip_tos, &key) == -1)
> - return (rip_input(mp, offp, type, af));
> -
> - return (IPPROTO_DONE);
> + return (gre_if_input(m, hlen, ip->ip_tos, &key));
>  }
>  
>  #ifdef INET6
> -int
> -gre_input6(struct mbuf **mp, int *offp, int type, int af)
> +struct mbuf *
> +gre_if6_input(struct mbuf *m, int hlen)
>  {
> - struct mbuf *m = *mp;
>   struct gre_tunnel key;
>   struct ip6_hdr *ip6;
>   uint32_t flow;
> @@ -933,10 +905,7 @@ gre_input6(struct mbuf **mp, int *offp,
>  
>   flow = bemtoh32(&ip6->ip6_flow);
>  
> - if (gre_input_key(mp, offp, type, af, flow >> 20, &key) == -1)
> - return (rip6_input(mp, offp, type, af));
> -
> - return (IPPROTO_DONE);
> + return (gre_if_input(m, hlen, flow >> 20, &key));
>  }
>  #endif /* INET6 */
>  
> @@ -996,12 +965,10 @@ gre_input_1(struct gre_tunnel *key, stru
>   return (m);
>  }
>  
> -static int
> -gre_input_key(struct mbuf **mp, int *offp, int type, int af, uint8_t otos,
> -    struct gre_tunnel *key)
> +static struct mbuf *
> +gre_if_input(struct mbuf *m, int iphlen, uint8_t otos, struct gre_tunnel *key)
>  {
> - struct mbuf *m = *mp;
> - int iphlen = *offp, hlen, rxprio;
> + int hlen, rxprio;
>   struct ifnet *ifp;
>   const struct gre_tunnel *tunnel;
>   caddr_t buf;
> @@ -1025,7 +992,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>   m = m_pullup(m, hlen);
>   if (m == NULL)
> - return (IPPROTO_DONE);
> + return (NULL);
>  
>   buf = mtod(m, caddr_t);
>   gh = (struct gre_header *)(buf + iphlen);
> @@ -1038,7 +1005,7 @@ gre_input_key(struct mbuf **mp, int *off
>   case htons(GRE_VERS_1):
>   m = gre_input_1(key, m, gh, otos, iphlen);
>   if (m == NULL)
> - return (IPPROTO_DONE);
> + return (NULL);
>   /* FALLTHROUGH */
>   default:
>   goto decline;
> @@ -1055,7 +1022,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>   m = m_pullup(m, hlen);
>   if (m == NULL)
> - return (IPPROTO_DONE);
> + return (NULL);
>  
>   buf = mtod(m, caddr_t);
>   gh = (struct gre_header *)(buf + iphlen);
> @@ -1071,7 +1038,7 @@ gre_input_key(struct mbuf **mp, int *off
>      nvgre_input(key, m, hlen, otos) == -1)
>   goto decline;
>  
> - return (IPPROTO_DONE);
> + return (NULL);
>   }
>  
>   ifp = gre_find(key);
> @@ -1148,7 +1115,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>   m_adj(m, hlen);
>   gre_keepalive_recv(ifp, m);
> - return (IPPROTO_DONE);
> + return (NULL);
>  
>   default:
>   goto decline;
> @@ -1162,7 +1129,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>   m = (*patch)(tunnel, m, &itos, otos);
>   if (m == NULL)
> - return (IPPROTO_DONE);
> + return (NULL);
>  
>   if (tunnel->t_key_mask == GRE_KEY_ENTROPY) {
>   m->m_pkthdr.ph_flowid = M_FLOWID_VALID |
> @@ -1203,10 +1170,9 @@ gre_input_key(struct mbuf **mp, int *off
>  #endif
>  
>   (*input)(ifp, m);
> - return (IPPROTO_DONE);
> + return (NULL);
>  decline:
> - *mp = m;
> - return (-1);
> + return (m);
>  }
>  
>  static struct mbuf *
> Index: sys/netinet/gre_proto.h
> ===================================================================
> RCS file: sys/netinet/gre_proto.h
> diff -N sys/netinet/gre_proto.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/netinet/gre_proto.h 29 Oct 2019 07:57:58 -0000
> @@ -0,0 +1,48 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2019 David Gwynne <[hidden email]>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _NETINET_GRE_H_
> +#define _NETINET_GRE_H_
> +
> +struct gre_header {
> + uint16_t gre_flags;
> +#define GRE_CP 0x8000 /* Checksum Present */
> +#define GRE_KP 0x2000 /* Key Present */
> +#define GRE_SP 0x1000 /* Sequence Present */
> +
> +#define GRE_VERS_MASK 0x0007
> +#define GRE_VERS_0 0x0000
> +#define GRE_VERS_1 0x0001
> +
> + uint16_t gre_proto;
> +};
> +
> +struct gre_h_cksum {
> + uint16_t gre_cksum;
> + uint16_t gre_reserved1;
> +};
> +
> +struct gre_h_key {
> + uint32_t gre_key;
> +};
> +
> +struct gre_h_seq {
> + uint32_t gre_seq;
> +};
> +
> +#endif /* _NETINET_GRE_H_ */
> Index: sys/netinet/gre_var.h
> ===================================================================
> RCS file: sys/netinet/gre_var.h
> diff -N sys/netinet/gre_var.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/netinet/gre_var.h 29 Oct 2019 07:57:58 -0000
> @@ -0,0 +1,64 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2019 David Gwynne <[hidden email]>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _NETINET_GRE_VAR_H_
> +#define _NETINET_GRE_VAR_H_
> +
> +/*
> + * setsockopt(s, IPPROTO_GRE, ...
> + */
> +
> +#define GRE_CKSUM 1 /* bool; enable GRE checksum headers */
> +#define GRE_KEY 2 /* uint32_t; enable and set GRE key */
> + /* NULL; disable GRE key header */
> +#define GRE_SEQ 3 /* uint32_t; enable and set GRE seq */
> + /* NULL; disable GRE seq header */
> +
> +#define GRE_RECVSEQ 4 /* bool; enable reception of seq numbers */
> +#define GRE_SENDSEQ GRE_RECVSEQ
> +
> +#ifdef _KERNEL
> +int gre_raw_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +    struct mbuf *, struct proc *);
> +int gre_sysctl(int *, u_int, void *, size_t *, void *, size_t);
> +
> +void gre_init(void);
> +
> +int gre_attach(struct socket *, int);
> +int gre_detach(struct socket *);
> +
> +int gre_ip4_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +    struct mbuf *, struct proc *);
> +int gre_ip4_ctloutput(int, struct socket *, int, int, struct mbuf *);
> +
> +int gre_ip4_input(struct mbuf **, int *, int, int);
> +
> +struct mbuf *
> + gre_if4_input(struct mbuf *, int); /* interface glue */
> +
> +#ifdef INET6
> +int gre_ip6_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +    struct mbuf *, struct proc *);
> +int gre_ip6_ctloutput(int, struct socket *, int, int, struct mbuf *);
> +int gre_ip6_input(struct mbuf **, int *, int, int);
> +
> +struct mbuf *
> + gre_if6_input(struct mbuf *, int); /* interface glue */
> +#endif /* INET6 */
> +#endif /* _KERNEL */
> +#endif /* _NETINET_GRE_VAR_H_ */
> Index: sys/netinet/in_proto.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/in_proto.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 in_proto.c
> --- sys/netinet/in_proto.c 15 Jul 2019 12:40:42 -0000 1.93
> +++ sys/netinet/in_proto.c 29 Oct 2019 07:57:58 -0000
> @@ -147,8 +147,7 @@
>  
>  #include "gre.h"
>  #if NGRE > 0
> -#include <netinet/ip_gre.h>
> -#include <net/if_gre.h>
> +#include <netinet/gre_var.h>
>  #endif
>  
>  #include "carp.h"
> @@ -346,11 +345,24 @@ const struct protosw inetsw[] = {
>    .pr_domain = &inetdomain,
>    .pr_protocol = IPPROTO_GRE,
>    .pr_flags = PR_ATOMIC|PR_ADDR,
> -  .pr_input = gre_input,
> +  .pr_input = gre_ip4_input,
>    .pr_ctloutput = rip_ctloutput,
> -  .pr_usrreq = gre_usrreq,
> +  .pr_usrreq = gre_raw_usrreq,
>    .pr_attach = rip_attach,
>    .pr_detach = rip_detach,
> +  .pr_sysctl = gre_sysctl
> +},
> +{
> +  .pr_type = SOCK_DGRAM,
> +  .pr_domain = &inetdomain,
> +  .pr_protocol = IPPROTO_GRE,
> +  .pr_flags = PR_ATOMIC|PR_ADDR,
> +  .pr_input = gre_ip4_input,
> +  .pr_ctloutput = gre_ip4_ctloutput,
> +  .pr_usrreq = gre_ip4_usrreq,
> +  .pr_attach = gre_attach,
> +  .pr_detach = gre_detach,
> +  .pr_init = gre_init,
>    .pr_sysctl = gre_sysctl
>  },
>  #endif /* NGRE > 0 */
> Index: sys/netinet/ip_gre.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/ip_gre.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 ip_gre.c
> --- sys/netinet/ip_gre.c 7 Feb 2018 22:30:59 -0000 1.71
> +++ sys/netinet/ip_gre.c 29 Oct 2019 07:57:58 -0000
> @@ -1,7 +1,23 @@
> -/*      $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
> +/* $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
>  /* $NetBSD: ip_gre.c,v 1.9 1999/10/25 19:18:11 drochner Exp $ */
>  
>  /*
> + * Copyright (c) 2019 David Gwynne <[hidden email]>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +/*
>   * Copyright (c) 1998 The NetBSD Foundation, Inc.
>   * All rights reserved.
>   *
> @@ -30,16 +46,6 @@
>   * POSSIBILITY OF SUCH DAMAGE.
>   */
>  
> -/*
> - * decapsulate tunneled packets and send them on
> - * output half is in net/if_gre.[ch]
> - * This currently handles IPPROTO_GRE, IPPROTO_MOBILE
> - */
> -
> -
> -#include "gre.h"
> -#if NGRE > 0
> -
>  #include <sys/param.h>
>  #include <sys/systm.h>
>  #include <sys/mbuf.h>
> @@ -47,24 +53,40 @@
>  #include <sys/socket.h>
>  #include <sys/socketvar.h>
>  #include <sys/sysctl.h>
> +#include <sys/proc.h>
> +#include <sys/atomic.h>
> +#include <sys/pool.h>
> +#include <sys/tree.h>
> +
> +#include <sys/domain.h>
>  
>  #include <net/if.h>
> +#include <net/if_var.h>
>  #include <net/route.h>
>  
>  #include <netinet/in.h>
> +#include <netinet/in_var.h>
>  #include <netinet/ip.h>
>  #include <netinet/ip_var.h>
>  #include <netinet/in_pcb.h>
>  
> +#include <netinet/gre_var.h>
> +#include <netinet/gre_proto.h>
> +
>  #ifdef PIPEX
>  #include <net/pipex.h>
>  #endif
>  
> +
> +/*
> + * socket({AF_INET,AF_INET6}, SOCK_RAW, IPPROTO_GRE);
> + */
> +
>  int
> -gre_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
> +gre_raw_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>      struct mbuf *control, struct proc *p)
>  {
> -#ifdef  PIPEX
> +#ifdef  PIPEX
>   struct inpcb *inp = sotoinpcb(so);
>  
>   if (inp != NULL && inp->inp_pipex && req == PRU_SEND) {
> @@ -92,4 +114,1817 @@ gre_usrreq(struct socket *so, int req, s
>  #endif
>   return rip_usrreq(so, req, m, nam, control, p);
>  }
> -#endif /* if NGRE > 0 */
> +
> +/*
> + * support socket({AF_INET,AF_INET6}, SOCK_DGRAM, IPPROTO_GRE);
> + *
> + * GRE datagram sockets provide support for GRE version 0 packets
> + * in the kernel.
> + */
> +
> +#define GRE_VALID_MASK (GRE_VERS_MASK | GRE_CP | GRE_KP | GRE_SP)
> +
> +struct gre_pcb_key;
> +
> +struct gre_pcb {
> + struct inpcb gpcb_inpcb;
> + unsigned int gpcb_pflags; /* pcb flags */
> +#define GREPCB_RECVSEQ (1 << 0)
> + uint16_t gpcb_flags; /* network byteorder */
> + uint32_t gpcb_key; /* network byteorder */
> + uint32_t gpcb_seq; /* host byteorder */
> +
> + struct gre_pcb_key *gpcb_pcb_key;
> + int gpcb_reuse;
> + uint8_t gpcb_ttl;
> +};
> +TAILQ_HEAD(gre_pcb_list, inpcb);
> +
> +static inline struct gre_pcb *
> +inp_gpcb(struct inpcb *inp)
> +{
> + return ((struct gre_pcb *)inp);
> +}
> +
> +#define gpcb_inp(_gpcb) (&(_gpcb)->gpcb_inpcb)
> +
> +static inline struct socket *
> +gpcb_so(struct gre_pcb *gpcb)
> +{
> + return (gpcb_inp(gpcb)->inp_socket);
> +}
> +
> +struct gre_pcb_key {
> + union inpaddru gk_laddr;
> +#define gk_laddr4 gk_laddr.iau_a4u.inaddr
> +#define gk_laddr6 gk_laddr.iau_addr6
> + union inpaddru gk_faddr;
> +#define gk_faddr4 gk_faddr.iau_a4u.inaddr
> +#define gk_faddr6 gk_faddr.iau_addr6
> +
> + uint16_t gk_flags; /* network byteorder */
> + uint16_t gk_proto; /* network byteorder */
> + uint32_t gk_key; /* network byteorder */
> +
> + unsigned int gk_rtableid;
> + sa_family_t gk_family;
> +
> + RBT_ENTRY(gre_pcb_key) gk_entry;
> + struct gre_pcb_list gk_pcbs;
> + unsigned int gk_state;
> +#define GRE_S_DISCONNECTED 0
> +#define GRE_S_WILDCARD 1
> +#define GRE_S_BOUND 2
> +#define GRE_S_CONNECTED 3
> +};
> +
> +RBT_HEAD(gre_tree_wildcards, gre_pcb_key);
> +RBT_HEAD(gre_tree_bound, gre_pcb_key);
> +RBT_HEAD(gre_tree_connected, gre_pcb_key);
> +
> +static int gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *,
> +    const struct gre_pcb_key *);
> +static int gre_pcb_key_cmp_bound(const struct gre_pcb_key *,
> +    const struct gre_pcb_key *);
> +
> +RBT_PROTOTYPE(gre_tree_wildcards, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_wildcard);
> +RBT_PROTOTYPE(gre_tree_bound, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_bound);
> +RBT_PROTOTYPE(gre_tree_connected, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_connected);
> +
> +struct gre_ops {
> + int (*op_nametosa)(struct inpcb *, struct mbuf *,
> +    struct sockaddr **);
> + int (*op_is_wildcard)(struct sockaddr *);
> + int (*op_is_multicast)(struct sockaddr *);
> + int (*op_is_broadcast)(unsigned int, struct sockaddr *);
> + int (*op_is_local)(unsigned int, struct sockaddr *);
> +
> + uint16_t (*op_proto)(struct sockaddr *);
> + void (*op_addr)(union inpaddru *, struct sockaddr *);
> +
> + int (*op_selsrc)(struct inpcb *, struct sockaddr *,
> +    union inpaddru *, void *);
> + int (*op_control)(struct socket *, u_long, caddr_t, struct ifnet *);
> + void (*op_getsockname)(struct inpcb *, struct mbuf *);
> + void (*op_getpeername)(struct inpcb *, struct mbuf *);
> + int (*op_ctloutput)(int, struct socket *, int, int, struct mbuf *);
> +
> + void (*op_sbappend)(struct gre_pcb *, struct mbuf *, int, uint32_t);
> +
> + /* PRU_SEND is split into two parts, with gre_output in the middle: */
> +
> + /* 1: the "pre" phase, so ipv6 can do it's stupid pktops stuff */
> + int (*op_send)(const struct gre_ops *, struct gre_pcb *,
> +    struct mbuf *, struct mbuf *, struct mbuf *);
> + /* 2: actually doing the ip encap and output */
> + int (*op_output)(struct gre_pcb *, const struct gre_pcb_key *,
> +    struct mbuf *, void *);
> +
> + int * const defttl;
> +};
> +struct gre_tree_wildcards gre_wildcards = RBT_INITIALIZER();
> +struct gre_tree_bound gre_bound = RBT_INITIALIZER();
> +struct gre_tree_connected gre_connected = RBT_INITIALIZER();
> +
> +struct pool gre_pcb_key_pool;
> +struct pool gre_pcb_pool;
> +
> +static struct mbuf *
> + gre_ip_input(const struct gre_ops *, struct mbuf *, int,
> +    uint8_t, struct gre_pcb_key *);
> +
> +static int gre_usrreq(const struct gre_ops *, struct socket *, int,
> +    struct mbuf *, struct mbuf *, struct mbuf *,
> +    struct proc *);
> +static int gre_ctloutput(const struct gre_ops *, int, struct socket *,
> +    int, int, struct mbuf *);
> +
> +static int gre_send(const struct gre_ops *, struct gre_pcb *,
> +    struct mbuf *, struct mbuf *, struct mbuf *);
> +static int gre_output(const struct gre_ops *, struct gre_pcb *,
> +    struct mbuf *, struct mbuf *, struct mbuf *, void *);
> +static int gre_disconnect(struct gre_pcb *);
> +
> +#define GRE_OPT_EINVAL ((void *)-1)
> +static void *gre_opt(struct mbuf *, int, int, socklen_t);
> +
> +unsigned int gre_sendspace = 9216; /* XXX sysctl? */
> +unsigned int gre_recvspace = (40 * 1024);
> +
> +static struct gre_pcb_key *
> +gre_pcb_key_get(const struct gre_pcb *gpcb)
> +{
> + struct gre_pcb_key *gk;
> +
> + gk = pool_get(&gre_pcb_key_pool, PR_NOWAIT|PR_ZERO);
> + if (gk == NULL)
> + return (NULL);
> +
> + gk->gk_flags = gpcb->gpcb_flags;
> + gk->gk_key = gpcb->gpcb_key;
> +
> + TAILQ_INIT(&gk->gk_pcbs);
> +
> + return (gk);
> +}
> +
> +static void
> +gre_pcb_key_put(struct gre_pcb_key *gk)
> +{
> + pool_put(&gre_pcb_key_pool, gk);
> +}
> +
> +static struct gre_pcb_key *
> +gre_pcb_key_insert(unsigned int state, struct gre_pcb_key *gk)
> +{
> + struct gre_pcb_key *ogk;
> +
> + gk->gk_state = state;
> +
> + switch (state) {
> + case GRE_S_WILDCARD:
> + ogk = RBT_INSERT(gre_tree_wildcards, &gre_wildcards, gk);
> + break;
> + case GRE_S_BOUND:
> + ogk = RBT_INSERT(gre_tree_bound, &gre_bound, gk);
> + break;
> + case GRE_S_CONNECTED:
> + ogk = RBT_INSERT(gre_tree_connected, &gre_connected, gk);
> + break;
> + default:
> + panic("%s unexpected state %u", __func__, state);
> + }
> +
> + return (ogk);
> +}
> +
> +static inline int
> +gre_pcb_empty(struct gre_pcb_list *l)
> +{
> + return (TAILQ_EMPTY(l));
> +}
> +
> +static inline void
> +gre_pcb_insert(struct gre_pcb_list *l, struct gre_pcb *gpcb)
> +{
> + TAILQ_INSERT_TAIL(l, gpcb_inp(gpcb), inp_queue);
> +}
> +
> +static inline void
> +gre_pcb_remove(struct gre_pcb_list *l, struct gre_pcb *gpcb)
> +{
> + TAILQ_REMOVE(l, gpcb_inp(gpcb), inp_queue);
> +}
> +
> +static inline struct gre_pcb *
> +gre_pcb_first(struct gre_pcb_list *l)
> +{
> + return (inp_gpcb(TAILQ_FIRST(l)));
> +}
> +
> +static inline struct gre_pcb *
> +gre_pcb_next(struct gre_pcb *gpcb)
> +{
> + return (inp_gpcb(TAILQ_NEXT(gpcb_inp(gpcb), inp_queue)));
> +}
> +
> +/*
> + * INET4
> + */
> +
> +static int gre_ip4_nametosa(struct inpcb *, struct mbuf *,
> +    struct sockaddr **);
> +static int gre_ip4_is_wildcard(struct sockaddr *);
> +static int gre_ip4_is_multicast(struct sockaddr *);
> +static int gre_ip4_is_broadcast(unsigned int, struct sockaddr *);
> +static int gre_ip4_is_local(unsigned int, struct sockaddr *);
> +static uint16_t gre_ip4_proto(struct sockaddr *);
> +static void gre_ip4_addr(union inpaddru *, struct sockaddr *);
> +static int gre_ip4_selsrc(struct inpcb *, struct sockaddr *,
> +    union inpaddru *, void *);
> +static void gre_ip4_sbappend(struct gre_pcb *, struct mbuf *, int,
> +    uint32_t);
> +static int gre_ip4_send(const struct gre_ops *, struct gre_pcb *,
> +    struct mbuf *, struct mbuf *, struct mbuf *);
> +static int gre_ip4_output(struct gre_pcb *, const struct gre_pcb_key *,
> +    struct mbuf *, void *);
> +
> +static const struct gre_ops gre_ip4_ops = {
> + .op_nametosa = gre_ip4_nametosa,
> + .op_is_wildcard = gre_ip4_is_wildcard,
> + .op_is_multicast = gre_ip4_is_multicast,
> + .op_is_broadcast = gre_ip4_is_broadcast,
> + .op_is_local = gre_ip4_is_local,
> +
> + .op_proto = gre_ip4_proto,
> + .op_addr = gre_ip4_addr,
> +
> + .op_selsrc = gre_ip4_selsrc,
> + .op_control = in_control,
> + .op_getsockname = in_setsockaddr,
> + .op_getpeername = in_setpeeraddr,
> + .op_ctloutput = ip_ctloutput,
> +
> + .op_sbappend = gre_ip4_sbappend,
> + .op_send = gre_ip4_send,
> + .op_output = gre_ip4_output,
> +
> + .defttl = &ip_defttl,
> +};
> +
> +
> +static int
> +gre_ip4_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
> +{
> + return (in_nam2sin(addr, (struct sockaddr_in **)sa));
> +}
> +
> +static int
> +gre_ip4_is_wildcard(struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> + return (sin->sin_addr.s_addr == INADDR_ANY);
> +}
> +
> +static int
> +gre_ip4_is_multicast(struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> + return (IN_MULTICAST(sin->sin_addr.s_addr));
> +}
> +
> +static int
> +gre_ip4_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> +
> + return ((sin->sin_addr.s_addr == INADDR_BROADCAST) ||
> +    in_broadcast(sin->sin_addr, rtableid));
> +}
> +
> +static int
> +gre_ip4_is_local(unsigned int rtableid, struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> + /* cope with ifa_ifwithaddr using memcmp */
> + struct sockaddr_in needle = {
> + .sin_len = sin->sin_len,
> + .sin_family = sin->sin_family,
> + .sin_addr = sin->sin_addr,
> + };
> + struct ifaddr *ifa;
> +
> + ifa = ifa_ifwithaddr(sintosa(&needle), rtableid);
> + if (ifa == NULL)
> + return (EADDRNOTAVAIL);
> +
> + return (0);
> +}
> +
> +static uint16_t
> +gre_ip4_proto(struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> + return (sin->sin_port);
> +}
> +
> +static void
> +gre_ip4_addr(union inpaddru *addr, struct sockaddr *sa)
> +{
> + struct sockaddr_in *sin = satosin(sa);
> + addr->iau_a4u.inaddr = sin->sin_addr;
> +}
> +
> +static int
> +gre_ip4_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
> +    void *opts)
> +{
> + struct in_addr *insrc;
> + int error;
> +
> + insrc = gre_opt(opts, IPPROTO_IP, IP_SENDSRCADDR, sizeof(*insrc));
> + if (insrc == GRE_OPT_EINVAL)
> + return (EINVAL);
> +
> + if (insrc == NULL) {
> + error = in_pcbselsrc(&insrc, satosin(sa), inp);
> + if (error != 0)
> + return (error);
> + } else {
> + struct sockaddr_in sin = {
> + .sin_len = sizeof(sin),
> + .sin_family = AF_INET,
> + .sin_addr = *insrc,
> + };
> +
> + /* XXX sigh. */
> + error = in_pcbaddrisavail(inp, &sin, 0, NULL);
> + if (error != 0)
> + return (error);
> + }
> +
> + laddr->iau_a4u.inaddr = *insrc;
> +
> + return (0);
> +}
> +
> +static int
> +gre_ip4_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
> +{
> + return (gre_output(ops, gpcb, m, addr, control, control));
> +}
> +
> +static int
> +gre_ip4_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
> +    struct mbuf *m, void *opts)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct ip *ip;
> + int error;
> + uint32_t ipsecflowinfo = 0;
> +
> + m = m_prepend(m, sizeof(*ip), M_DONTWAIT);
> + if (m == NULL) {
> + error = ENOBUFS;
> + goto dropped;
> + }
> + if (m->m_pkthdr.len > IP_MAXPACKET) {
> + error = EMSGSIZE;
> + goto drop;
> + }
> +
> +#ifdef IPSEC
> + if (ISSET(inp->inp_flags, INP_IPSECFLOWINFO)) {
> + uint32_t *p = gre_opt(opts, IPPROTO_IP, IP_IPSECFLOWINFO,
> +    sizeof(*p));
> + if (p == GRE_OPT_EINVAL) {
> + error = EINVAL;
> + goto drop;
> + }
> +
> + if (p != NULL)
> + ipsecflowinfo = *p;
> + }
> +#endif /* IPSEC */
> +
> + ip = mtod(m, struct ip *);
> + ip->ip_v = IPVERSION;
> + ip->ip_hl = sizeof(*ip) >> 2;
> + ip->ip_off = 0; /* XXX nodf? */
> + ip->ip_tos = 0; /* XXX */;
> + ip->ip_len = htons(m->m_pkthdr.len);
> + ip->ip_ttl = gpcb->gpcb_ttl;
> + ip->ip_p = IPPROTO_GRE;
> + ip->ip_src = gk->gk_laddr4;
> + ip->ip_dst = gk->gk_faddr4;
> +
> + error = ip_output(m, inp->inp_options, &inp->inp_route,
> +    gpcb_so(gpcb)->so_options & SO_BROADCAST, inp->inp_moptions, inp,
> +    ipsecflowinfo);
> +
> + return (error);
> +
> +drop:
> + m_freem(m);
> +dropped:
> + return (error);
> +}
> +
> +static void *
> +gre_opt(struct mbuf *control, int level, int type, socklen_t len)
> +{
> + u_int clen;
> + struct cmsghdr *cm;
> + caddr_t cmsgs;
> + size_t mlen;
> +
> + if (control == NULL)
> + return (NULL);
> +
> + if (control->m_next != NULL)
> + return (GRE_OPT_EINVAL);
> +
> + mlen = CMSG_LEN(len);
> +
> + clen = control->m_len;
> + cmsgs = mtod(control, caddr_t);
> + do {
> + if (clen < CMSG_LEN(0))
> + return (GRE_OPT_EINVAL);
> +
> + cm = (struct cmsghdr *)cmsgs;
> + if (cm->cmsg_len < CMSG_LEN(0) ||
> +    CMSG_ALIGN(cm->cmsg_len) > clen)
> + return (GRE_OPT_EINVAL);
> +
> + if (cm->cmsg_level == level &&
> +    cm->cmsg_type == type &&
> +    cm->cmsg_len == mlen)
> + return (CMSG_DATA(cm));
> +
> + clen -= CMSG_ALIGN(cm->cmsg_len);
> + cmsgs += CMSG_ALIGN(cm->cmsg_len);
> + } while (clen);
> +
> + return (NULL);
> +}
> +
> +static int
> +gre_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
> +{
> + int error;
> +
> + error = (*ops->op_send)(ops, gpcb, m, addr, control);
> +
> + m_freem(control);
> +
> + return (error);
> +}
> +
> +static int
> +gre_output(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, void *opts)
> +{
> + const struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> + struct gre_pcb_key key;
> + struct gre_header *gh;
> + int error;
> +
> + if (addr != NULL) {
> + struct gre_pcb_key *ogk;
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct sockaddr *sa;
> + int state = GRE_S_DISCONNECTED;
> +
> + if (gk != NULL)
> + state = gk->gk_state;
> +
> + if (state == GRE_S_CONNECTED) {
> + error = EISCONN;
> + goto drop;
> + }
> +
> + error = (*ops->op_nametosa)(inp, addr, &sa);
> + if (error != 0)
> + goto drop;
> +
> + key.gk_family = sa->sa_family;
> + key.gk_proto = (*ops->op_proto)(sa);
> +
> + if (state != GRE_S_DISCONNECTED) {
> + KASSERT(key.gk_family == gk->gk_family);
> + if (key.gk_proto != gk->gk_proto) {
> + error = EADDRNOTAVAIL;
> + goto drop;
> + }
> + }
> + if (state == GRE_S_BOUND) {
> + key.gk_laddr = gk->gk_laddr;
> + } else {
> + error = (*ops->op_selsrc)(inp, sa, &key.gk_laddr,
> +    opts);
> + if (error != 0)
> + goto drop;
> + }
> +
> + (*ops->op_addr)(&key.gk_faddr, sa);
> +
> + key.gk_rtableid = inp->inp_rtableid;
> + key.gk_flags = gpcb->gpcb_flags;
> + key.gk_key = gpcb->gpcb_key;
> +
> + ogk = RBT_FIND(gre_tree_connected, &gre_connected, &key);
> + if (ogk != NULL) {
> + struct gre_pcb *ogpcb;
> + int reuse = gpcb->gpcb_reuse;
> +
> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> + if (ogpcb != NULL) {
> + struct socket *oso = gpcb_so(ogpcb);
> + if (!ISSET(reuse, oso->so_options)) {
> + error = EADDRINUSE;
> + goto drop;
> + }
> + }
> + }
> +
> + gk = &key;
> + } else {
> + if (gk == NULL || gk->gk_state != GRE_S_CONNECTED) {
> + error = ENOTCONN;
> + goto drop;
> + }
> + }
> +
> + if (ISSET(gk->gk_flags, htons(GRE_SP))) {
> + struct gre_h_seq *gsh;
> + uint32_t *seqno;
> +
> + seqno = gre_opt(control, IPPROTO_GRE, GRE_SENDSEQ,
> +    sizeof(*seqno));
> + if (seqno == GRE_OPT_EINVAL) {
> + error = EINVAL;
> + goto drop;
> + }
> +
> + m = m_prepend(m, sizeof(*gsh), M_DONTWAIT);
> + if (m == NULL)
> + return (ENOBUFS);
> +
> + gsh = mtod(m, struct gre_h_seq *);
> + htobem32(&gsh->gre_seq, seqno != NULL ?
> +    (gpcb->gpcb_seq = *seqno) : /* keep track of new start */
> +    atomic_inc_int_nv(&gpcb->gpcb_seq));
> + }
> +
> + if (ISSET(gk->gk_flags, htons(GRE_KP))) {
> + struct gre_h_key *gkh;
> +
> + m = m_prepend(m, sizeof(*gkh), M_DONTWAIT);
> + if (m == NULL)
> + return (ENOBUFS);
> +
> + gkh = mtod(m, struct gre_h_key *);
> + gkh->gre_key = gk->gk_key;
> + }
> +
> + if (ISSET(gk->gk_flags, htons(GRE_CP))) {
> + struct gre_h_cksum *gch;
> +
> + m = m_prepend(m, sizeof(*gch), M_DONTWAIT);
> + if (m == NULL)
> + return (ENOBUFS);
> +
> + gch = mtod(m, struct gre_h_cksum *);
> + gch->gre_cksum = 0; /* XXX need to checksum */
> + gch->gre_reserved1 = 0;
> + }
> +
> + m = m_prepend(m, sizeof(*gh), M_DONTWAIT);
> + if (m == NULL)
> + return (ENOBUFS);
> +
> + gh = mtod(m, struct gre_header *);
> + gh->gre_flags = gk->gk_flags;
> + gh->gre_proto = gk->gk_proto;
> +
> + KASSERT(ISSET(m->m_flags, M_PKTHDR));
> +
> + m->m_pkthdr.ph_rtableid = gpcb_inp(gpcb)->inp_rtableid;
> +
> + return ((*ops->op_output)(gpcb, gk, m, opts));
> +
> +drop:
> + m_freem(m);
> + return (error);
> +}
> +
> +static void
> +gre_sbappend(struct gre_pcb *gpcb, struct sockaddr *sa, struct mbuf *m,
> +    struct mbuf *opts, int hlen, uint32_t seqno)
> +{
> + struct socket *so = gpcb_so(gpcb);
> +
> + if (ISSET(gpcb->gpcb_pflags, GREPCB_RECVSEQ)) {
> + struct mbuf *opt = sbcreatecontrol(&seqno, sizeof(seqno),
> +    GRE_RECVSEQ, IPPROTO_GRE);
> + if (opt != NULL) {
> + opt->m_next = opts;
> + opts = opt;
> + }
> + }
> +
> + m_adj(m, hlen);
> + if (sbappendaddr(so, &so->so_rcv, sa, m, opts) == 0) {
> + m_freem(m);
> + m_freem(opts);
> + return;
> + }
> +
> + sorwakeup(so);
> +}
> +
> +int
> +gre_ip4_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
> +    struct mbuf *control, struct proc *p)
> +{
> + return (gre_usrreq(&gre_ip4_ops, so, req, m, addr, control, p));
> +}
> +
> +int
> +gre_ip4_ctloutput(int op, struct socket *so, int level, int optname,
> +    struct mbuf *m)
> +{
> + return (gre_ctloutput(&gre_ip4_ops, op, so, level, optname, m));
> +}
> +
> +static void
> +gre_ip4_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct socket *so = gpcb_so(gpcb);
> + struct mbuf *opts = NULL;
> + struct sockaddr_in sin = {
> + .sin_len = sizeof(sin),
> + .sin_family = AF_INET,
> + .sin_port = inp->inp_lport,
> + .sin_addr = mtod(m, struct ip *)->ip_src,
> + };
> +
> + if (ISSET(inp->inp_flags, INP_CONTROLOPTS) ||
> +    ISSET(so->so_options, SO_TIMESTAMP))
> + ip_savecontrol(inp, &opts, mtod(m, struct ip *), m);
> +
> + if (ISSET(inp->inp_flags, INP_RECVDSTPORT)) {
> + struct mbuf *opt;
> +
> + opt = sbcreatecontrol(&inp->inp_lport, sizeof(inp->inp_lport),
> +    IP_RECVDSTPORT, IPPROTO_IP);
> + if (opt != NULL) {
> + opt->m_next = opts;
> + opts = opt;
> + }
> + }
> +
> + gre_sbappend(gpcb, sintosa(&sin), m, opts, hlen, seqno);
> +}
> +
> +int
> +gre_ip4_input(struct mbuf **mp, int *offp, int proto, int af)
> +{
> + struct mbuf *m = *mp;
> + struct gre_pcb_key gk;
> + struct ip *ip;
> +
> + m = gre_if4_input(m, *offp);
> + if (m == NULL)
> + return (IPPROTO_DONE);
> +
> + ip = mtod(m, struct ip *);
> +
> + gk.gk_family = AF_INET;
> + gk.gk_laddr4 = ip->ip_dst;
> + gk.gk_faddr4 = ip->ip_src;
> +
> + m = gre_ip_input(&gre_ip4_ops, m, *offp, ip->ip_ttl, &gk);
> + if (m == NULL)
> + return (IPPROTO_DONE);
> +
> + *mp = m;
> + return (rip_input(mp, offp, proto, af));
> +}
> +
> +#if INET6
> +#include <netinet6/ip6_var.h>
> +#include <netinet6/in6_var.h>
> +
> +static int gre_ip6_nametosa(struct inpcb *, struct mbuf *,
> +    struct sockaddr **);
> +static int gre_ip6_is_wildcard(struct sockaddr *);
> +static int gre_ip6_is_multicast(struct sockaddr *);
> +static int gre_ip6_is_broadcast(unsigned int, struct sockaddr *);
> +static int gre_ip6_is_local(unsigned int, struct sockaddr *);
> +static uint16_t gre_ip6_proto(struct sockaddr *);
> +static void gre_ip6_addr(union inpaddru *, struct sockaddr *);
> +static int gre_ip6_selsrc(struct inpcb *, struct sockaddr *,
> +    union inpaddru *, void *);
> +static void gre_ip6_sbappend(struct gre_pcb *, struct mbuf *, int,
> +    uint32_t);
> +static int gre_ip6_send(const struct gre_ops *, struct gre_pcb *,
> +    struct mbuf *, struct mbuf *, struct mbuf *);
> +static int gre_ip6_output(struct gre_pcb *, const struct gre_pcb_key *,
> +    struct mbuf *, void *);
> +
> +static const struct gre_ops gre_ip6_ops = {
> + .op_nametosa = gre_ip6_nametosa,
> + .op_is_wildcard = gre_ip6_is_wildcard,
> + .op_is_multicast = gre_ip6_is_multicast,
> + .op_is_broadcast = gre_ip6_is_broadcast,
> + .op_is_local = gre_ip6_is_local,
> +
> + .op_proto = gre_ip6_proto,
> + .op_addr = gre_ip6_addr,
> +
> + .op_selsrc = gre_ip6_selsrc,
> + .op_control = in6_control,
> + .op_getsockname = in6_setsockaddr,
> + .op_getpeername = in6_setpeeraddr,
> + .op_ctloutput = ip6_ctloutput,
> + .op_sbappend = gre_ip6_sbappend,
> + .op_send = gre_ip6_send,
> + .op_output = gre_ip6_output,
> +
> + .defttl = &ip6_defhlim,
> +};
> +
> +static int
> +gre_ip6_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
> +{
> + struct sockaddr_in6 *sin6;
> + int error;
> +
> + error = in6_nam2sin6(addr, &sin6);
> + if (error != 0)
> + return (error);
> +
> + /* reject IPv4 mapped addresses, we have no support for them */
> + if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
> + return (EADDRNOTAVAIL);
> +
> + if (in6_embedscope(&sin6->sin6_addr, sin6, inp) != 0)
> + return (EINVAL);
> +
> + *sa = sin6tosa(sin6);
> + return (0);
> +}
> +
> +static int
> +gre_ip6_is_wildcard(struct sockaddr *sa)
> +{
> + struct sockaddr_in6 *sin6 = satosin6(sa);
> + return (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr));
> +}
> +
> +static int
> +gre_ip6_is_multicast(struct sockaddr *sa)
> +{
> + struct sockaddr_in6 *sin6 = satosin6(sa);
> + return (IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr));
> +}
> +
> +static int
> +gre_ip6_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
> +{
> + return (0);
> +}
> +
> +static int
> +gre_ip6_is_local(unsigned int rtableid, struct sockaddr *sa)
> +{
> + struct sockaddr_in6 *sin6 = satosin6(sa);
> + struct sockaddr_in6 needle = {
> + .sin6_len = sin6->sin6_len,
> + .sin6_family = sin6->sin6_family,
> + .sin6_addr = sin6->sin6_addr,
> + };
> + struct ifaddr *ifa;
> +
> + ifa = ifa_ifwithaddr(sin6tosa(&needle), rtableid);
> + if (ifa == NULL)
> + return (EADDRNOTAVAIL);
> +
> + /*
> + * bind to an anycast address might accidentally
> + * cause sending a packet with an anycast source
> + * address, so we forbid it.
> + *
> + * We should allow to bind to a deprecated address,
> + * since the application dare to use it.
> + * But, can we assume that they are careful enough
> + * to check if the address is deprecated or not?
> + * Maybe, as a safeguard, we should have a setsockopt
> + * flag to control the bind(2) behavior against
> + * deprecated addresses (default: forbid bind(2)).
> + */
> + if (ISSET(ifatoia6(ifa)->ia6_flags, IN6_IFF_ANYCAST|
> +    IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED|IN6_IFF_DETACHED))
> + return (EADDRNOTAVAIL);
> +
> + return (0);
> +}
> +
> +static uint16_t
> +gre_ip6_proto(struct sockaddr *sa)
> +{
> + struct sockaddr_in6 *sin6 = satosin6(sa);
> + return (sin6->sin6_port);
> +}
> +
> +static void
> +gre_ip6_addr(union inpaddru *addr, struct sockaddr *sa)
> +{
> + struct sockaddr_in6 *sin6 = satosin6(sa);
> + addr->iau_addr6 = sin6->sin6_addr;
> +}
> +
> +static int
> +gre_ip6_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
> +    void *opts)
> +{
> + struct in6_addr *in6src;
> + int error;
> +
> + if (opts == NULL)
> + opts = inp->inp_outputopts6;
> +
> + error = in6_pcbselsrc(&in6src, satosin6(sa), inp, opts);
> + if (error != 0)
> + return (error);
> +
> + laddr->iau_addr6 = *in6src;
> +
> + return (0);
> +}
> +
> +int
> +gre_ip6_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
> +    struct mbuf *control, struct proc *p)
> +{
> + return (gre_usrreq(&gre_ip6_ops, so, req, m, addr, control, p));
> +}
> +
> +int
> +gre_ip6_ctloutput(int op, struct socket *so, int level, int optname,
> +    struct mbuf *m)
> +{
> + return (gre_ctloutput(&gre_ip6_ops, op, so, level, optname, m));
> +}
> +
> +static void
> +gre_ip6_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct socket *so = gpcb_so(gpcb);
> + struct mbuf *opts = NULL;
> + struct sockaddr_in6 sin6 = {
> + .sin6_len = sizeof(sin6),
> + .sin6_family = AF_INET6,
> + .sin6_port = inp->inp_lport,
> + };
> +
> + in6_recoverscope(&sin6, &mtod(m, struct ip6_hdr *)->ip6_src);
> +
> + if (ISSET(inp->inp_flags, IN6P_CONTROLOPTS) ||
> +    ISSET(so->so_options, SO_TIMESTAMP))
> + ip6_savecontrol(inp, m, &opts);
> +
> + if (ISSET(inp->inp_flags, IN6P_RECVDSTPORT)) {
> + struct mbuf *opt;
> +
> + opt = sbcreatecontrol(&inp->inp_fport, sizeof(inp->inp_fport),
> +    IPV6_RECVDSTPORT, IPPROTO_IPV6);
> + if (opt != NULL) {
> + opt->m_next = opts;
> + opts = opt;
> + }
> + }
> +
> + gre_sbappend(gpcb, sin6tosa(&sin6), m, opts, hlen, seqno);
> +}
> +
> +static int
> +gre_ip6_send(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *m,
> +    struct mbuf *addr, struct mbuf *control)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct ip6_pktopts *opts = inp->inp_outputopts6;
> + struct ip6_pktopts opt;
> + int error;
> +
> + if (control != NULL) {
> + error = ip6_setpktopts(control, &opt, opts, /* priv */ 1,
> +    IPPROTO_GRE);
> + if (error != 0) {
> + m_freem(m);
> + return (error);
> + }
> + opts = &opt;
> + }
> +
> + error = gre_output(ops, gpcb, m, addr, control, opts);
> + if (control != NULL)
> + ip6_clearpktopts(&opt, -1);
> + return (error);
> +}
> +
> +static int
> +gre_ip6_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
> +    struct mbuf *m, void *opts)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + uint16_t len = m->m_pkthdr.len;
> + struct ip6_hdr *ip6;
> + int flags = 0;
> + int error;
> +
> + if (len > IP_MAXPACKET) {
> + error = EMSGSIZE;
> + goto drop;
> + }
> +
> + m = m_prepend(m, sizeof(*ip6), M_DONTWAIT);
> + if (m == NULL) {
> + error = ENOBUFS;
> + goto dropped;
> + }
> +
> + ip6 = mtod(m, struct ip6_hdr *);
> + ip6->ip6_vfc = IPV6_VERSION;
> + ip6->ip6_plen = htons(len);
> + ip6->ip6_nxt = IPPROTO_GRE;
> + ip6->ip6_hlim = gpcb->gpcb_ttl;
> + ip6->ip6_src = gk->gk_laddr6;
> + ip6->ip6_dst = gk->gk_faddr6;
> +
> + if (ISSET(inp->inp_flags, IN6P_MINMTU)) /* wtf */
> + flags |= IPV6_MINMTU;
> +
> + error = ip6_output(m, opts, &inp->inp_route6,
> +    flags, inp->inp_moptions6, inp);
> +
> + return (error);
> +
> +drop:
> + m_freem(m);
> +dropped:
> + return (error);
> +}
> +
> +int
> +gre_ip6_input(struct mbuf **mp, int *offp, int proto, int af)
> +{
> + struct mbuf *m = *mp;
> + struct gre_pcb_key gk;
> + struct ip6_hdr *ip6;
> +
> + m = gre_if6_input(m, *offp);
> + if (m == NULL)
> + return (IPPROTO_DONE);
> +
> + ip6 = mtod(m, struct ip6_hdr *);
> +
> + gk.gk_family = AF_INET6;
> + gk.gk_laddr6 = ip6->ip6_dst;
> + gk.gk_faddr6 = ip6->ip6_src;
> +
> + m = gre_ip_input(&gre_ip6_ops, m, *offp, ip6->ip6_hlim, &gk);
> + if (m == NULL)
> + return (IPPROTO_DONE);
> +
> + *mp = m;
> + return (rip6_input(mp, offp, proto, af));
> +}
> +#endif /* INET6 */
> +
> +/*
> + * generic GRE protocol handling
> + */
> +
> +void
> +gre_init(void)
> +{
> + pool_init(&gre_pcb_key_pool, sizeof(struct gre_pcb_key), 0,
> +    IPL_NONE, PR_WAITOK, "grekey", NULL);
> + pool_init(&gre_pcb_pool, sizeof(struct gre_pcb), 0,
> +    IPL_NONE, PR_WAITOK, "grepcb", NULL);
> +}
> +
> +int
> +gre_attach(struct socket *so, int proto)
> +{
> + const struct gre_ops *ops = &gre_ip4_ops;
> + struct gre_pcb *gpcb;
> + struct inpcb *inp;
> + int error;
> + int flags = 0;
> +
> + if (so->so_pcb != NULL)
> + return (EINVAL);
> +
> + error = suser(curproc);
> + if (error != 0)
> + return (error);
> +
> + error = soreserve(so, gre_sendspace, gre_recvspace);
> + if (error != 0)
> + return (error);
> +
> +#ifdef INET6
> + if (sotopf(so) == PF_INET6) {
> + ops = &gre_ip6_ops;
> + flags = INP_IPV6;
> + }
> +#endif
> +
> + gpcb = pool_get(&gre_pcb_pool, PR_NOWAIT | PR_ZERO);
> + if (gpcb == NULL)
> + return (ENOBUFS);
> +
> + inp = gpcb_inp(gpcb);
> + inp->inp_socket = so;
> + refcnt_init(&inp->inp_refcnt);
> + inp->inp_seclevel[SL_AUTH] = IPSEC_AUTH_LEVEL_DEFAULT;
> + inp->inp_seclevel[SL_ESP_TRANS] = IPSEC_ESP_TRANS_LEVEL_DEFAULT;
> + inp->inp_seclevel[SL_ESP_NETWORK] = IPSEC_ESP_NETWORK_LEVEL_DEFAULT;
> + inp->inp_seclevel[SL_IPCOMP] = IPSEC_IPCOMP_LEVEL_DEFAULT;
> + inp->inp_rtableid = curproc->p_p->ps_rtableid;
> + inp->inp_ip_minttl = 0;
> + inp->inp_flags |= flags;
> + inp->inp_hops = *ops->defttl;
> + inp->inp_ppcb = (caddr_t)gpcb;
> +
> + so->so_pcb = inp;
> +
> + return (0);
> +}
> +
> +static void
> +gre_inpdetach(struct gre_pcb *gpcb)
> +{
> + struct socket *so = gpcb_so(gpcb);
> + struct inpcb *inp = gpcb_inp(gpcb);
> +
> + KASSERT(so->so_pcb == inp);
> +
> + so->so_pcb = NULL;
> + sofree(so, SL_NOUNLOCK);
> +
> + m_freem(inp->inp_options);
> + if (inp->inp_route.ro_rt) {
> + rtfree(inp->inp_route.ro_rt);
> + inp->inp_route.ro_rt = NULL;
> + }
> +
> + switch (ISSET(inp->inp_flags, INP_IPV6)) {
> + case 0:
> + ip_freemoptions(inp->inp_moptions);
> + break;
> +#ifdef INET6
> + case INP_IPV6:
> + ip6_freepcbopts(inp->inp_outputopts6);
> + ip6_freemoptions(inp->inp_moptions6);
> + break;
> +#endif
> + }
> +
> + KASSERT((struct gre_pcb *)inp->inp_ppcb == (struct gre_pcb *)inp);
> +
> + (void)gre_disconnect(gpcb);
> + pool_put(&gre_pcb_pool, gpcb);
> +}
> +
> +int
> +gre_detach(struct socket *so)
> +{
> + struct inpcb *inp;
> +
> + soassertlocked(so);
> +
> + inp = sotoinpcb(so);
> + if (inp == NULL)
> + return (EINVAL);
> +
> + gre_inpdetach(inp_gpcb(inp));
> +
> + return (0);
> +}
> +
> +static int
> +gre_bind(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct socket *so = gpcb_so(gpcb);
> + struct sockaddr *sa;
> + unsigned int state = GRE_S_BOUND;
> + struct gre_pcb_key *gk, *ogk;
> + int reuse = ISSET(so->so_options, SO_REUSEPORT);
> + int error;
> +
> + if (gpcb->gpcb_pcb_key != NULL)
> + return (EISCONN);
> +
> + error = (*ops->op_nametosa)(inp, addr, &sa);
> + if (error != 0)
> + return (error);
> +
> + if ((*ops->op_is_wildcard)(sa)) {
> + state = GRE_S_WILDCARD;
> + } else if ((*ops->op_is_multicast)(sa)) {
> + /*
> + * Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
> + * allow complete duplication of binding if
> + * SO_REUSEPORT is set, or if SO_REUSEADDR is set
> + * and a multicast address is bound on both
> + * new and duplicated sockets.
> + */
> + if (ISSET(so->so_options, SO_REUSEADDR|SO_REUSEPORT))
> + reuse = SO_REUSEADDR|SO_REUSEPORT;
> + } else if (ISSET(so->so_options, SO_BINDANY) ||
> +    (*ops->op_is_broadcast)(inp->inp_rtableid, sa) == 0) {
> + /*
> + * we must check that we are binding to an address we
> + * own except when:
> + * - SO_BINDANY is set or
> + * - we are binding a UDP socket to 255.255.255.255 or
> + * - we are binding a UDP socket to one of our broadcast
> + *   addresses
> + */
> + ;
> + } else {
> + error = (*ops->op_is_local)(inp->inp_rtableid, sa);
> + if (error != 0)
> + return (error);
> + }
> +
> + gk = gre_pcb_key_get(gpcb);
> + if (gk == NULL)
> + return (ENOMEM);
> +
> + gk->gk_family = sa->sa_family;
> + (*ops->op_addr)(&gk->gk_laddr, sa);
> + gk->gk_proto = (*ops->op_proto)(sa);
> +
> + ogk = gre_pcb_key_insert(state, gk);
> + if (ogk != NULL) {
> + struct gre_pcb *ogpcb;
> +
> + gre_pcb_key_put(gk);
> +
> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> + if (ogpcb != NULL) {
> + struct socket *oso = gpcb_so(ogpcb);
> + if (!ISSET(reuse, oso->so_options))
> + return (EADDRINUSE);
> + }
> +
> + gk = ogk;
> + }
> +
> + /* commit */
> +
> + gpcb->gpcb_pcb_key = gk;
> + gre_pcb_insert(&gk->gk_pcbs, gpcb);
> +
> + inp->inp_laddru = gk->gk_laddr;
> + inp->inp_lport = gk->gk_proto;
> +
> + gpcb->gpcb_reuse = reuse;
> +
> + return (0);
> +}
> +
> +static int
> +gre_connect(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
> +{
> + struct inpcb *inp = gpcb_inp(gpcb);
> + struct socket *so = gpcb_so(gpcb);
> + struct gre_pcb_key *bgk = gpcb->gpcb_pcb_key;
> + struct gre_pcb_key *gk, *ogk;
> + struct sockaddr *sa;
> + int reuse = ISSET(so->so_options, SO_REUSEPORT);
> + int error;
> +
> + if (bgk != NULL && bgk->gk_state == GRE_S_CONNECTED)
> + return (EISCONN);
> +
> + error = (*ops->op_nametosa)(inp, addr, &sa);
> + if (error != 0)
> + return (error);
> +
> + /* don't allow connections to wildcard addresses */
> + if ((*ops->op_is_wildcard)(sa))
> + return (EADDRNOTAVAIL);
> +
> + gk = gre_pcb_key_get(gpcb);
> + if (gk == NULL)
> + return (ENOBUFS);
> +
> + if (bgk == NULL || bgk->gk_state == GRE_S_WILDCARD) {
> + error = (*ops->op_selsrc)(inp, sa, &gk->gk_laddr, NULL);
> + if (error != 0)
> + goto put;
> +
> + gk->gk_proto = (*ops->op_proto)(sa);
> + } else {
> + if (bgk->gk_proto != (*ops->op_proto)(sa)) {
> + error = EADDRNOTAVAIL;
> + goto put;
> + }
> +
> + gk->gk_laddr = bgk->gk_laddr;
> + gk->gk_proto = bgk->gk_proto;
> + reuse |= gpcb->gpcb_reuse;
> + }
> +
> + gk->gk_family = sa->sa_family;
> + (*ops->op_addr)(&gk->gk_faddr, sa);
> +
> + ogk = gre_pcb_key_insert(GRE_S_CONNECTED, gk);
> + if (ogk != NULL) {
> + struct gre_pcb *ogpcb;
> +
> + gre_pcb_key_put(gk);
> +
> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> + if (ogpcb != NULL) {
> + struct socket *oso = gpcb_so(ogpcb);
> + KASSERTMSG(oso != NULL, "ogk %p ogpcb %p oso %p",
> +    ogk, ogpcb, oso);
> + if (!ISSET(reuse, oso->so_options))
> + return (EADDRINUSE);
> + }
> +
> + gk = ogk;
> + }
> +
> + /* commit */
> +
> + gre_disconnect(gpcb);
> + gpcb->gpcb_pcb_key = gk;
> + gre_pcb_insert(&gk->gk_pcbs, gpcb);
> +
> + inp->inp_laddru = gk->gk_laddr;
> + inp->inp_faddru = gk->gk_faddr;
> + inp->inp_lport = inp->inp_fport = gk->gk_proto;
> +
> + gpcb->gpcb_reuse = reuse;
> + soisconnected(so);
> + return (0);
> +
> +put:
> + gre_pcb_key_put(gk);
> + return (error);
> +}
> +
> +
> +static int
> +gre_getsockname(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *addr)
> +{
> + struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> +
> + if (gk == NULL)
> + return (ENOTCONN);
> +
> + (*ops->op_getsockname)(gpcb_inp(gpcb), addr);
> +
> + return (0);
> +}
> +
> +static int
> +gre_getpeername(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *addr)
> +{
> + struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> +
> + if (gk == NULL || gk->gk_state != GRE_S_CONNECTED)
> + return (ENOTCONN);
> +
> + (*ops->op_getpeername)(gpcb_inp(gpcb), addr);
> +
> + return (0);
> +}
> +
> +static int
> +gre_disconnect(struct gre_pcb *gpcb)
> +{
> + struct gre_pcb_key *gk;
> +
> + gk = gpcb->gpcb_pcb_key;
> + if (gk == NULL) {
> + return (ENOTCONN);
> + }
> +
> + gre_pcb_remove(&gk->gk_pcbs, gpcb);
> + if (gre_pcb_empty(&gk->gk_pcbs)) {
> + switch (gk->gk_state) {
> + case GRE_S_WILDCARD:
> + RBT_REMOVE(gre_tree_wildcards, &gre_wildcards, gk);
> + break;
> + case GRE_S_BOUND:
> + RBT_REMOVE(gre_tree_bound, &gre_bound, gk);
> + break;
> + case GRE_S_CONNECTED:
> + RBT_REMOVE(gre_tree_connected, &gre_connected, gk);
> + break;
> + }
> + pool_put(&gre_pcb_key_pool, gk);
> + }
> +
> + gpcb->gpcb_pcb_key = NULL;
> +
> + return (0);
> +}
> +
> +static int
> +gre_usrreq(const struct gre_ops *ops, struct socket *so, int req,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, struct proc *p)
> +{
> + struct inpcb *inp;
> + struct gre_pcb *gpcb;
> + int error = 0;
> +
> + if (req == PRU_CONTROL) {
> + return ((*ops->op_control)(so, (u_long)m, (caddr_t)addr,
> +    (struct ifnet *)control));
> + }
> +
> + soassertlocked(so);
> +
> + inp = sotoinpcb(so);
> + if (inp == NULL) {
> + error = EINVAL;
> + goto release;
> + }
> + gpcb = inp_gpcb(inp);
> +
> + switch (req) {
> + case PRU_BIND:
> + error = gre_bind(ops, gpcb, addr);
> + break;
> +
> + case PRU_LISTEN:
> + error = EOPNOTSUPP;
> + break;
> +
> + case PRU_CONNECT:
> + error = gre_connect(ops, gpcb, addr);
> + break;
> +
> + case PRU_CONNECT2:
> + error = EOPNOTSUPP;
> + break;
> +
> + case PRU_ACCEPT:
> + error = EOPNOTSUPP;
> + break;
> +
> + case PRU_DISCONNECT:
> + error = gre_disconnect(gpcb);
> + if (error != 0)
> + break;
> +
> + gpcb->gpcb_reuse = 0;
> + CLR(so->so_state, SS_ISCONNECTED); /* XXX cos udp_usrreq.c */
> + memset(&inp->inp_laddru, 0, sizeof(inp->inp_laddru));
> + memset(&inp->inp_faddru, 0, sizeof(inp->inp_faddru));
> + inp->inp_lport = inp->inp_fport = 0;
> + break;
> +
> + case PRU_SHUTDOWN:
> + socantsendmore(so);
> + break;
> +
> + case PRU_SEND:
> + return (gre_send(ops, gpcb, m, addr, control));
> +
> + case PRU_ABORT:
> + soisdisconnected(so);
> + gre_inpdetach(gpcb);
> + break;
> +
> + case PRU_SOCKADDR:
> + return (gre_getsockname(ops, gpcb, addr));
> + break;
> + case PRU_PEERADDR:
> + return (gre_getpeername(ops, gpcb, addr));
> + break;
> +
> + case PRU_SENSE:
> + /* stat: don't bother with a block size. */
> + break;
> +
> + case PRU_SENDOOB:
> + case PRU_FASTTIMO:
> + case PRU_SLOWTIMO:
> + case PRU_PROTORCV:
> + case PRU_PROTOSEND:
> + case PRU_RCVD:
> + case PRU_RCVOOB:
> + error = EOPNOTSUPP;
> + break;
> +
> + default:
> + panic("%s req %d", __func__, req);
> + }
> +
> +release:
> + switch (req) {
> + case PRU_RCVD:
> + case PRU_RCVOOB:
> + case PRU_SENSE:
> + break;
> + default:
> + m_freem(control);
> + m_freem(m);
> + break;
> + }
> +
> + return (error);
> +}
> +
> +static int
> +gre_setopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
> +{
> + /*
> + * only support changing these options when the socket is
> + * completely disconnected. the amount of code needed to try
> + * changing the gre_pcb_key was "quite large" and arguably
> + * not worth it.
> + */
> + switch (optname) {
> + case GRE_CKSUM:
> + case GRE_KEY:
> + case GRE_SEQ:
> + if (gpcb->gpcb_pcb_key != NULL)
> + return (EISCONN);
> + break;
> + }
> +
> + switch (optname) {
> + case GRE_CKSUM:
> + if (m == NULL || m->m_len != sizeof(int))
> + return (EINVAL);
> +
> + if (*mtod(m, int *))
> + SET(gpcb->gpcb_flags, htons(GRE_CP));
> + else
> + CLR(gpcb->gpcb_flags, htons(GRE_CP));
> + break;
> + case GRE_KEY:
> + if (m == NULL || m->m_len == 0) { /* disable key */
> + CLR(gpcb->gpcb_flags, htons(GRE_KP));
> + gpcb->gpcb_key = htonl(0);
> + break;
> + }
> +
> + if (m->m_len != sizeof(gpcb->gpcb_key))
> + return (EINVAL);
> +
> + SET(gpcb->gpcb_flags, htons(GRE_KP));
> + htobem32(&gpcb->gpcb_key, *mtod(m, uint32_t *));
> + break;
> + case GRE_SEQ:
> + if (m == NULL || m->m_len == 0) { /* disable seq */
> + CLR(gpcb->gpcb_flags, htons(GRE_SP));
> + gpcb->gpcb_seq = 0;
> + break;
> + }
> +
> + if (m->m_len != sizeof(gpcb->gpcb_seq))
> + return (EINVAL);
> +
> + SET(gpcb->gpcb_flags, htons(GRE_SP));
> + gpcb->gpcb_seq = *mtod(m, uint32_t *);
> + break;
> + case GRE_RECVSEQ:
> + if (m == NULL || m->m_len != sizeof(int))
> + return (EINVAL);
> +
> + if (*mtod(m, int *))
> + SET(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
> + else
> + CLR(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
> + break;
> + default:
> + return (ENOPROTOOPT);
> + break;
> + }
> +
> + return (0);
> +}
> +
> +static int
> +gre_getopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
> +{
> + switch (optname) {
> + case GRE_KEY:
> + if (!ISSET(gpcb->gpcb_flags, htons(GRE_KP)))
> + return (ENOTCONN);
> +
> + m->m_len = sizeof(gpcb->gpcb_key);
> + *mtod(m, uint32_t *) = bemtoh32(&gpcb->gpcb_key);
> + break;
> + case GRE_CKSUM:
> + m->m_len = sizeof(int);
> + *mtod(m, int *) = !!ISSET(gpcb->gpcb_flags, htons(GRE_CP));
> + break;
> + case GRE_SEQ:
> + if (!ISSET(gpcb->gpcb_flags, htons(GRE_SP)))
> + return (ENOTCONN);
> +
> + m->m_len = sizeof(gpcb->gpcb_seq);
> + *mtod(m, uint32_t *) = gpcb->gpcb_key;
> + break;
> + default:
> + return (ENOPROTOOPT);
> + }
> +
> + return (0);
> +}
> +
> +/*
> + * IP socket option processing.
> + */
> +static int
> +gre_ctloutput(const struct gre_ops *ops, int op, struct socket *so,
> +    int level, int optname, struct mbuf *m)
> +{
> + struct inpcb *inp;
> + struct gre_pcb *gpcb;
> + int error;
> +
> + inp = sotoinpcb(so);
> + if (inp == NULL)
> + return (ECONNRESET);
> + if (level != IPPROTO_GRE)
> + return ((*ops->op_ctloutput)(level, so, level, optname, m));
> +
> + gpcb = inp_gpcb(inp);
> +
> + switch (op) {
> + case PRCO_SETOPT:
> + error = gre_setopt(gpcb, optname, m);
> + break;
> + case PRCO_GETOPT:
> + error = gre_getopt(gpcb, optname, m);
> + break;
> + default:
> + panic("%s op %d", __func__, op);
> + }
> +
> + return (error);
> +}
> +
> +static void *
> +gre_pullup(struct mbuf **mp, int *offp, int len)
> +{
> + int hlen = *offp + len;
> + void *h;
> +
> + if ((*mp)->m_pkthdr.len < hlen)
> + return (NULL); /* decline */
> +
> + *mp = m_pullup(*mp, hlen);
> + if (*mp == NULL)
> + return (NULL);
> +
> + h = mtod(*mp, caddr_t) + *offp;
> + *offp = hlen;
> +
> + return (h);
> +}
> +
> +static int
> +gre_candeliver(const struct gre_pcb *gpcb, uint8_t ttl)
> +{
> + const struct inpcb *inp = gpcb_inp(gpcb);
> +
> + if (ISSET(inp->inp_socket->so_state, SS_CANTRCVMORE))
> + return (0);
> +
> + if (inp->inp_ip_minttl > ttl)
> + return (0);
> +
> + return (1);
> +}
> +
> +static struct mbuf *
> +gre_ip_input(const struct gre_ops *ops, struct mbuf *m, int iphlen,
> +    uint8_t ttl, struct gre_pcb_key *key)
> +{
> + int hlen = iphlen;
> + struct gre_header *gh;
> + struct gre_pcb_key *gk;
> + struct gre_pcb *gpcb, *ngpcb;
> + uint16_t cksum = 0;
> + uint32_t seqno = 0;
> +
> + gh = gre_pullup(&m, &hlen, sizeof(*gh));
> + if (gh == NULL)
> + return (m);
> +
> + if ((gh->gre_flags & htons(GRE_VERS_MASK)) != htons(GRE_VERS_0))
> + return (m); /* decline */
> +
> + if (ISSET(gh->gre_flags, ~htons(GRE_VALID_MASK)))
> + return (m); /* decline */
> +
> + key->gk_flags = gh->gre_flags;
> + key->gk_proto = gh->gre_proto;
> +
> + if (ISSET(key->gk_flags, htons(GRE_CP))) {
> + struct gre_h_cksum *gch;
> +
> + gch = gre_pullup(&m, &hlen, sizeof(*gch));
> + if (gch == NULL)
> + return (m);
> +
> + cksum = gch->gre_cksum;
> +
> + /* XXX ignore Reserved (Offset) field */
> + }
> +
> + if (ISSET(key->gk_flags, htons(GRE_KP))) {
> + struct gre_h_key *gkh;
> +
> + gkh = gre_pullup(&m, &hlen, sizeof(*gkh));
> + if (gkh == NULL)
> + return (m);
> +
> + key->gk_key = gkh->gre_key;
> + }
> +
> + if (ISSET(key->gk_flags, htons(GRE_SP))) {
> + struct gre_h_seq *gsh;
> +
> + gsh = gre_pullup(&m, &hlen, sizeof(*gsh));
> + if (gsh == NULL)
> + return (m);
> +
> + seqno = bemtoh32(&gsh->gre_seq);
> + }
> +
> + key->gk_rtableid = m->m_pkthdr.ph_rtableid;
> +
> + gk = RBT_FIND(gre_tree_connected, &gre_connected, key);
> + if (gk == NULL) {
> + gk = RBT_FIND(gre_tree_bound, &gre_bound, key);
> + if (gk == NULL) {
> + gk = RBT_FIND(gre_tree_wildcards, &gre_wildcards, key);
> + if (gk == NULL)
> + return (m); /* decline */
> + }
> + }
> +
> + /* it's ours now */
> +
> + if (ISSET(key->gk_flags, htons(GRE_CP))) {
> + /* XXX actually do the checksum calc */
> + }
> +
> + gpcb = gre_pcb_first(&gk->gk_pcbs);
> + while (!gre_candeliver(gpcb, ttl)) {
> + gpcb = gre_pcb_next(gpcb);
> + if (gpcb == NULL)
> + goto drop;
> + }
> +
> + ngpcb = gpcb;
> + for (;;) {
> + struct mbuf *mm;
> +
> + ngpcb = gre_pcb_next(ngpcb);
> + if (ngpcb == NULL)
> + break;
> +
> + if (!gre_candeliver(ngpcb, ttl))
> + continue;
> +
> + mm = m_dup_pkt(m, 0, M_DONTWAIT);
> + if (mm == NULL) {
> + /* assume further copies will also fail */
> + break;
> + }
> +
> + (*ops->op_sbappend)(gpcb, mm, hlen, seqno);
> + gpcb = ngpcb;
> + }
> +
> + (*ops->op_sbappend)(gpcb, m, hlen, seqno);
> +
> + return (NULL);
> +
> +drop:
> + m_freem(m);
> + return (NULL);
> +}
> +
> +static int
> +gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> + if (a->gk_proto > b->gk_proto)
> + return (1);
> + if (a->gk_proto < b->gk_proto)
> + return (-1);
> +
> + if (a->gk_flags > b->gk_flags)
> + return (1);
> + if (a->gk_flags < b->gk_flags)
> + return (-1);
> +
> + if (ISSET(a->gk_flags, htons(GRE_KP))) {
> + if (a->gk_key > b->gk_key)
> + return (1);
> + if (a->gk_key < b->gk_key)
> + return (-1);
> + }
> +
> + if (a->gk_rtableid > b->gk_rtableid)
> + return (1);
> + if (a->gk_rtableid < b->gk_rtableid)
> + return (-1);
> +
> + if (a->gk_family > b->gk_family)
> + return (1);
> + if (a->gk_family < b->gk_family)
> + return (-1);
> +
> + return (0);
> +}
> +
> +RBT_GENERATE(gre_tree_wildcards, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_wildcard);
> +
> +static int
> +gre_pcb_key_cmp_bound(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> + int rv;
> +
> + rv = gre_pcb_key_cmp_wildcard(a, b);
> + if (rv != 0)
> + return (rv);
> +
> + switch (a->gk_family) {
> + case AF_INET:
> + rv = memcmp(&a->gk_laddr4, &b->gk_laddr4, sizeof(a->gk_laddr4));
> + break;
> +#ifdef INET
> + case AF_INET6:
> + rv = memcmp(&a->gk_laddr6, &b->gk_laddr6, sizeof(a->gk_laddr6));
> + break;
> +#endif
> + default:
> + unhandled_af(a->gk_family);
> + }
> +
> + return (rv);
> +}
> +
> +RBT_GENERATE(gre_tree_bound, gre_pcb_key, gk_entry, gre_pcb_key_cmp_bound);
> +
> +static int
> +gre_pcb_key_cmp_connected(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> + int rv;
> +
> + rv = gre_pcb_key_cmp_bound(a, b);
> + if (rv != 0)
> + return (rv);
> +
> + switch (a->gk_family) {
> + case AF_INET:
> + rv = memcmp(&a->gk_faddr4, &b->gk_faddr4, sizeof(a->gk_faddr4));
> + break;
> +#ifdef INET
> + case AF_INET6:
> + rv = memcmp(&a->gk_faddr6, &b->gk_faddr6, sizeof(a->gk_faddr6));
> + break;
> +#endif
> + default:
> + unhandled_af(a->gk_family);
> + }
> +
> + return (rv);
> +}
> +
> +RBT_GENERATE(gre_tree_connected, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_connected);
> Index: sys/netinet6/in6_proto.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet6/in6_proto.c,v
> retrieving revision 1.104
> diff -u -p -r1.104 in6_proto.c
> --- sys/netinet6/in6_proto.c 13 Jun 2019 08:12:11 -0000 1.104
> +++ sys/netinet6/in6_proto.c 29 Oct 2019 07:57:58 -0000
> @@ -117,7 +117,7 @@
>  
>  #include "gre.h"
>  #if NGRE > 0
> -#include <net/if_gre.h>
> +#include <netinet/gre_var.h>
>  #endif
>  
>  /*
> @@ -340,11 +340,22 @@ const struct protosw inet6sw[] = {
>    .pr_domain = &inet6domain,
>    .pr_protocol = IPPROTO_GRE,
>    .pr_flags = PR_ATOMIC|PR_ADDR,
> -  .pr_input = gre_input6,
> +  .pr_input = gre_ip6_input,
>    .pr_ctloutput = rip6_ctloutput,
>    .pr_usrreq = rip6_usrreq,
>    .pr_attach = rip6_attach,
>    .pr_detach = rip6_detach,
> +},
> +{
> +  .pr_type = SOCK_DGRAM,
> +  .pr_domain = &inet6domain,
> +  .pr_protocol = IPPROTO_GRE,
> +  .pr_flags = PR_ATOMIC|PR_ADDR,
> +  .pr_input = gre_ip6_input,
> +  .pr_ctloutput = gre_ip6_ctloutput,
> +  .pr_usrreq = gre_ip6_usrreq,
> +  .pr_attach = gre_attach,
> +  .pr_detach = gre_detach,
>  },
>  #endif /* NGRE */

Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

Damien Miller
On Wed, 22 Jan 2020, David Gwynne wrote:

> Has anyone got an opinion on this? I am still interested in doing more
> packet capture things on OpenBSD using GRE as a transport, and the idea
> of maintaining this out of tree just makes me feel tired.

This is cool. I don't spot any major problems with this, but I'm rusty on
kernel networking.

> > Index: sys/kern/kern_pledge.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> > retrieving revision 1.255
> > diff -u -p -r1.255 kern_pledge.c
> > --- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
> > +++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
> > @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
> >   }
> >   }
> >  
> > - /* DNS needs /etc/{resolv.conf,hosts,services}. */
> > + /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
> >   if ((ni->ni_pledge == PLEDGE_RPATH) &&
> >      (p->p_p->ps_pledge & PLEDGE_DNS)) {
> >   if (strcmp(path, "/etc/resolv.conf") == 0) {
> > @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
> >   return (0);
> >   }
> >   if (strcmp(path, "/etc/services") == 0) {
> > + ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> > + return (0);
> > + }
> > + if (strcmp(path, "/etc/protocols") == 0) {
> >   ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> >   return (0);

This looks like it is fixing a real, separate bug in pledge vs
getaddrinfo, no? (specifically: that lookups for named ports will fail
currently).

Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

YASUOKA Masahiko-3
In reply to this post by David Gwynne-5
Hi,

I think that is a good idea.

On Wed, 22 Jan 2020 08:35:05 +1000
David Gwynne <[hidden email]> wrote:

> Has anyone got an opinion on this? I am still interested in doing more
> packet capture things on OpenBSD using GRE as a transport, and the idea
> of maintaining this out of tree just makes me feel tired.
>
> On Tue, Oct 29, 2019 at 06:34:50PM +1000, David Gwynne wrote:
>> i've been toying with this idea of implementing GRE as a datagram
>> protocol that userland can use just like UDP. the idea is to make it
>> easy to support the implementation of NHRP in userland for mgre(4),
>> and also for ERSPAN* support without going down the path linux took**.
>>
>> so this is the result of having a go at implementing the idea. the diff
>> includes several independent parts, but they all work together to make
>> GRE as comfortable to use as UDP. the two main parts are the actual
>> protocol implementation in src/sys/netinet/ip_gre.c, and the tweaks to
>> getaddrinfo to allow the resolution of gre services. the /etc/services
>> chunk gets used by the getaddrinfo bits.
>>
>> so, the first chunk lets you do this (as root in userland):
>>
>> int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_GRE);
>>
>> that gives you a file descriptor you can then use with bind(),
>> connect(), sendto(), recvfrom(), etc. you write a message to the
>> kernel and it prepends the GRE and IP headers and pushes it out.
>> it is set up so the GRE protocol is handed to the kernel via the
>> sin_port or sin6_port member of struct sockaddr_in an sockaddr_in6
>> respectively. there's no source and destination protocol fields, just
>> one that both ends agree on, so if you connect then bind, your
>> sockaddrs have to agree on the proto. unfortunately there's no such
>> thing as a wildcard or reserved protocol in GRE, so 0 can't be used
>> as a wildcard like it can in udp and tcp.
>>
>> the sockets support the configuration of optional GRE headers, as
>> defined in RFC 2890, using setsockopt. importantly you can enable
>> the key and sequence number headers, which again, the kernel offloads
>> for you.
>>
>> the second chunk tweaks getaddrinfo so it lets you specify things other
>> than IPPROTO_UDP and IPPROTO_TCP. protocols other than those are now
>> looked up in /etc/protocols to get their name, which in turn is used to
>> look up entries in /etc/services. while i was there and reading rfcs, i
>> noted different behaviour for wildcarded socktypes and protocols, which
>> i've tried to implement. eric@ seems generally ok with this stuff, and
>> suggested the tweak to pledge to allow access to /etc/protocols using
>> the dns pledge. tcp and udp are still special though, and are still
>> omgoptimised.
>>
>> all this together lets the program at
>> https://mild.embarrassm.net/~dlg/diff/egred.c work. it is a userland
>> reimplementation of a simplified egre(4) using tap(4) and a gre socket.
>> the io path is literally reading from one fd and writing it to the othe,
>> everything else is boilerplate.
>>
>> i suspect the kernel stuff is a bit rough as i havent had to test every
>> path, but it supports common functionality.
>>
>> thoughts? i am pretty pleased with this has turned out, and would be
>> keen to put it in the tree and work on it some more.
>>
>> * https://tools.ietf.org/html/draft-foschiano-erspan-03
>> ** http://vger.kernel.org/lpc_net2018_talks/erspan-linux-presentation.pdf
>>
>> Index: etc/services
>> ===================================================================
>> RCS file: /cvs/src/etc/services,v
>> retrieving revision 1.96
>> diff -u -p -r1.96 services
>> --- etc/services 27 Jan 2019 20:35:06 -0000 1.96
>> +++ etc/services 29 Oct 2019 07:57:44 -0000
>> @@ -332,6 +332,21 @@ spamd-cfg 8026/tcp # spamd(8) configur
>>  dhcpd-sync 8067/udp # dhcpd(8) synchronisation
>>  hunt 26740/udp # hunt(6)
>>  #
>> +# GRE Protocol Types
>> +#
>> +keepalive 0/gre # 0x0000: IP tunnel keepalive
>> +ipv4 2048/gre # 0x0800: IPv4
>> +nhrp 8193/gre # 0x2001: Next Hop Resolution Protocol
>> +erspan3 8939/gre # 0x22eb: ERSPAN III
>> +transether 25944/gre ethernet # 0x6558: Trans Ether Bridging
>> +ipv6 34525/gre # 0x86dd: IPv6
>> +wccp 34878/gre # 0x883e: Web Content Cache Protocol
>> +mpls 34887/gre # 0x8847: MPLS
>> +#mpls 34888/gre # 0x8848: MPLS Multicast
>> +erspan 35006/gre erspan2 # 0x88be: ERSPAN I/II
>> +nsh 35151/gre # 0x894f: Network Service Header
>> +control 47082/gre # 0xb7ea: RFC 8157
>> +#
>>  # Appletalk
>>  #
>>  rtmp 1/ddp # Routing Table Maintenance Protocol
>> Index: lib/libc/asr/getaddrinfo_async.c
>> ===================================================================
>> RCS file: /cvs/src/lib/libc/asr/getaddrinfo_async.c,v
>> retrieving revision 1.56
>> diff -u -p -r1.56 getaddrinfo_async.c
>> --- lib/libc/asr/getaddrinfo_async.c 3 Nov 2018 09:13:24 -0000 1.56
>> +++ lib/libc/asr/getaddrinfo_async.c 29 Oct 2019 07:57:54 -0000
>> @@ -34,36 +34,15 @@
>>  
>>  #include "asr_private.h"
>>  
>> -struct match {
>> - int family;
>> - int socktype;
>> - int protocol;
>> -};
>> -
>>  static int getaddrinfo_async_run(struct asr_query *, struct asr_result *);
>>  static int get_port(const char *, const char *, int);
>> +static int get_service(const char *, int, int);
>>  static int iter_family(struct asr_query *, int);
>>  static int addrinfo_add(struct asr_query *, const struct sockaddr *, const char *);
>>  static int addrinfo_from_file(struct asr_query *, int,  FILE *);
>>  static int addrinfo_from_pkt(struct asr_query *, char *, size_t);
>>  static int addrconfig_setup(struct asr_query *);
>>  
>> -static const struct match matches[] = {
>> - { PF_INET, SOCK_DGRAM, IPPROTO_UDP },
>> - { PF_INET, SOCK_STREAM, IPPROTO_TCP },
>> - { PF_INET, SOCK_RAW, 0 },
>> - { PF_INET6, SOCK_DGRAM, IPPROTO_UDP },
>> - { PF_INET6, SOCK_STREAM, IPPROTO_TCP },
>> - { PF_INET6, SOCK_RAW, 0 },
>> - { -1, 0, 0, },
>> -};
>> -
>> -#define MATCH_FAMILY(a, b) ((a) == matches[(b)].family || (a) == PF_UNSPEC)
>> -#define MATCH_PROTO(a, b) ((a) == matches[(b)].protocol || (a) == 0 || matches[(b)].protocol == 0)
>> -/* Do not match SOCK_RAW unless explicitly specified */
>> -#define MATCH_SOCKTYPE(a, b) ((a) == matches[(b)].socktype || ((a) == 0 && \
>> - matches[(b)].socktype != SOCK_RAW))
>> -
>>  enum {
>>   DOM_INIT,
>>   DOM_DOMAIN,
>> @@ -199,24 +178,27 @@ getaddrinfo_async_run(struct asr_query *
>>   }
>>   }
>>  
>> - /* Make sure there is at least a valid combination */
>> - for (i = 0; matches[i].family != -1; i++)
>> - if (MATCH_FAMILY(ai->ai_family, i) &&
>> -    MATCH_SOCKTYPE(ai->ai_socktype, i) &&
>> -    MATCH_PROTO(ai->ai_protocol, i))
>> - break;
>> - if (matches[i].family == -1) {
>> - ar->ar_gai_errno = EAI_BADHINTS;
>> - async_set_state(as, ASR_STATE_HALT);
>> - break;
>> - }
>> -
>> - if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_UDP)
>> + switch (ai->ai_protocol) {
>> + case 0:
>>   as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
>>      as->as.ai.hints.ai_flags & AI_NUMERICSERV);
>> - if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_TCP)
>>   as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
>>      as->as.ai.hints.ai_flags & AI_NUMERICSERV);
>> + break;
>> + case IPPROTO_TCP:
>> + as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
>> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
>> + break;
>> + case IPPROTO_UDP:
>> + as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
>> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
>> + break;
>> + default:
>> + as->as.ai.port_udp = get_service(as->as.ai.servname,
>> +    ai->ai_protocol,
>> +    as->as.ai.hints.ai_flags & AI_NUMERICSERV);
>> + break;
>> + }
>>   if (as->as.ai.port_tcp == -2 || as->as.ai.port_udp == -2 ||
>>      (as->as.ai.port_tcp == -1 && as->as.ai.port_udp == -1) ||
>>      (ai->ai_protocol && (as->as.ai.port_udp == -1 ||
>> @@ -491,6 +473,24 @@ get_port(const char *servname, const cha
>>   return (port);
>>  }
>>  
>> +static int
>> +get_service(const char *servname, int protocol, int numonly)
>> +{
>> + struct protoent pe;
>> + struct protoent_data ped;
>> + int rv;
>> +
>> + memset(&ped, 0, sizeof(ped));
>> + rv = getprotobynumber_r(protocol, &pe, &ped);
>> + if (rv == -1)
>> + return (-1);
>> +
>> + rv = get_port(servname, pe.p_name, numonly);
>> + endprotoent_r(&ped);
>> +
>> + return (rv);
>> +}
>> +
>>  /*
>>   * Iterate over the address families that are to be queried. Use the
>>   * list on the async context, unless a specific family was given in hints.
>> @@ -519,65 +519,107 @@ iter_family(struct asr_query *as, int fi
>>   * entry per protocol/socktype match.
>>   */
>>  static int
>> -addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
>> +addrinfo_add_ai(struct asr_query *as, const struct sockaddr *sa,
>> +    const char *cname, int socktype, int proto, int port)
>>  {
>>   struct addrinfo *ai;
>> - int i, port, proto;
>> -
>> - for (i = 0; matches[i].family != -1; i++) {
>> - if (matches[i].family != sa->sa_family ||
>> -    !MATCH_SOCKTYPE(as->as.ai.hints.ai_socktype, i) ||
>> -    !MATCH_PROTO(as->as.ai.hints.ai_protocol, i))
>> - continue;
>> -
>> - proto = as->as.ai.hints.ai_protocol;
>> - if (!proto)
>> - proto = matches[i].protocol;
>> -
>> - if (proto == IPPROTO_TCP)
>> - port = as->as.ai.port_tcp;
>> - else if (proto == IPPROTO_UDP)
>> - port = as->as.ai.port_udp;
>> - else
>> - port = 0;
>> + int i;
>>  
>> - /* servname specified, but not defined for this protocol */
>> - if (port == -1)
>> - continue;
>> + if (port == -1)
>> + return (0);
>>  
>> - ai = calloc(1, sizeof(*ai) + sa->sa_len);
>> - if (ai == NULL)
>> + ai = calloc(1, sizeof(*ai) + sa->sa_len);
>> + if (ai == NULL)
>> + return (EAI_MEMORY);
>> + ai->ai_family = sa->sa_family;
>> + ai->ai_socktype = socktype;
>> + ai->ai_protocol = proto;
>> + ai->ai_flags = as->as.ai.hints.ai_flags;
>> + ai->ai_addrlen = sa->sa_len;
>> + ai->ai_addr = (void *)(ai + 1);
>> + if (cname &&
>> +    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
>> + if ((ai->ai_canonname = strdup(cname)) == NULL) {
>> + free(ai);
>>   return (EAI_MEMORY);
>> - ai->ai_family = sa->sa_family;
>> - ai->ai_socktype = matches[i].socktype;
>> - ai->ai_protocol = proto;
>> - ai->ai_flags = as->as.ai.hints.ai_flags;
>> - ai->ai_addrlen = sa->sa_len;
>> - ai->ai_addr = (void *)(ai + 1);
>> - if (cname &&
>> -    as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
>> - if ((ai->ai_canonname = strdup(cname)) == NULL) {
>> - free(ai);
>> - return (EAI_MEMORY);
>> - }
>>   }
>> - memmove(ai->ai_addr, sa, sa->sa_len);
>> - if (sa->sa_family == PF_INET)
>> - ((struct sockaddr_in *)ai->ai_addr)->sin_port =
>> -    htons(port);
>> - else if (sa->sa_family == PF_INET6)
>> - ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
>> -    htons(port);
>> -
>> - if (as->as.ai.aifirst == NULL)
>> - as->as.ai.aifirst = ai;
>> - if (as->as.ai.ailast)
>> - as->as.ai.ailast->ai_next = ai;
>> - as->as.ai.ailast = ai;
>> - as->as_count += 1;
>>   }
>> + memmove(ai->ai_addr, sa, sa->sa_len);
>> + if (sa->sa_family == PF_INET)
>> + ((struct sockaddr_in *)ai->ai_addr)->sin_port =
>> +    htons(port);
>> + else if (sa->sa_family == PF_INET6)
>> + ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
>> +    htons(port);
>> +
>> + if (as->as.ai.aifirst == NULL)
>> + as->as.ai.aifirst = ai;
>> + if (as->as.ai.ailast)
>> + as->as.ai.ailast->ai_next = ai;
>> + as->as.ai.ailast = ai;
>> + as->as_count += 1;
>>  
>>   return (0);
>> +}
>> +
>> +static int
>> +addrinfo_add_proto(struct asr_query *as, const struct sockaddr *sa,
>> +    const char *cname, int proto, int port)
>> +{
>> + int rv;
>> +
>> + switch (as->as.ai.hints.ai_socktype) {
>> + case 0:
>> + rv = addrinfo_add_ai(as, sa, cname, SOCK_STREAM, proto, port);
>> + if (rv != 0)
>> + break;
>> +
>> + rv = addrinfo_add_ai(as, sa, cname, SOCK_DGRAM, proto, port);
>> + if (rv != 0)
>> + break;
>> +
>> + break;
>> +
>> + default:
>> + rv = addrinfo_add_ai(as, sa, cname,
>> +    as->as.ai.hints.ai_socktype, proto, port);
>> + break;
>> + }
>> +
>> + return (rv);
>> +}
>> +
>> +static int
>> +addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
>> +{
>> + int rv;
>> +
>> + switch (as->as.ai.hints.ai_protocol) {
>> + case 0:
>> + rv = addrinfo_add_proto(as, sa, cname,
>> +    IPPROTO_TCP, as->as.ai.port_tcp);
>> + if (rv != 0)
>> + break;
>> +
>> + rv = addrinfo_add_proto(as, sa, cname,
>> +    IPPROTO_UDP, as->as.ai.port_udp);
>> + if (rv != 0)
>> + break;
>> +
>> + break;
>> +
>> + case IPPROTO_TCP:
>> + rv = addrinfo_add_proto(as, sa, cname,
>> +    IPPROTO_TCP, as->as.ai.port_tcp);
>> + break;
>> +
>> + default: /* includes IPPROTO_UDP */
>> + rv = addrinfo_add_proto(as, sa, cname,
>> +    as->as.ai.hints.ai_protocol, as->as.ai.port_udp);
>> + break;
>> + }
>> +
>> + return (rv);
>>  }
>>  
>>  static int
>> Index: sys/conf/files
>> ===================================================================
>> RCS file: /cvs/src/sys/conf/files,v
>> retrieving revision 1.675
>> diff -u -p -r1.675 files
>> --- sys/conf/files 5 Oct 2019 05:33:14 -0000 1.675
>> +++ sys/conf/files 29 Oct 2019 07:57:58 -0000
>> @@ -862,7 +862,7 @@ file netinet/tcp_subr.c
>>  file netinet/tcp_timer.c
>>  file netinet/tcp_usrreq.c
>>  file netinet/udp_usrreq.c
>> -file netinet/ip_gre.c
>> +file netinet/ip_gre.c gre
>>  file netinet/ip_ipsp.c ipsec | tcp_signature
>>  file netinet/ip_spd.c ipsec | tcp_signature
>>  file netinet/ip_ipip.c
>> Index: sys/kern/kern_pledge.c
>> ===================================================================
>> RCS file: /cvs/src/sys/kern/kern_pledge.c,v
>> retrieving revision 1.255
>> diff -u -p -r1.255 kern_pledge.c
>> --- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
>> +++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
>> @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
>>   }
>>   }
>>  
>> - /* DNS needs /etc/{resolv.conf,hosts,services}. */
>> + /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
>>   if ((ni->ni_pledge == PLEDGE_RPATH) &&
>>      (p->p_p->ps_pledge & PLEDGE_DNS)) {
>>   if (strcmp(path, "/etc/resolv.conf") == 0) {
>> @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
>>   return (0);
>>   }
>>   if (strcmp(path, "/etc/services") == 0) {
>> + ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>> + return (0);
>> + }
>> + if (strcmp(path, "/etc/protocols") == 0) {
>>   ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>>   return (0);
>>   }
>> Index: sys/net/if_gre.c
>> ===================================================================
>> RCS file: /cvs/src/sys/net/if_gre.c,v
>> retrieving revision 1.152
>> diff -u -p -r1.152 if_gre.c
>> --- sys/net/if_gre.c 29 Jul 2019 16:28:25 -0000 1.152
>> +++ sys/net/if_gre.c 29 Oct 2019 07:57:58 -0000
>> @@ -69,6 +69,8 @@
>>  #include <netinet/ip_var.h>
>>  #include <netinet/ip_ecn.h>
>>  
>> +#include <netinet/gre_proto.h>
>> +
>>  #ifdef INET6
>>  #include <netinet/ip6.h>
>>  #include <netinet6/ip6_var.h>
>> @@ -103,28 +105,6 @@
>>  /*
>>   * packet formats
>>   */
>> -struct gre_header {
>> - uint16_t gre_flags;
>> -#define GRE_CP 0x8000  /* Checksum Present */
>> -#define GRE_KP 0x2000  /* Key Present */
>> -#define GRE_SP 0x1000  /* Sequence Present */
>> -
>> -#define GRE_VERS_MASK 0x0007
>> -#define GRE_VERS_0 0x0000
>> -#define GRE_VERS_1 0x0001
>> -
>> - uint16_t gre_proto;
>> -} __packed __aligned(4);
>> -
>> -struct gre_h_cksum {
>> - uint16_t gre_cksum;
>> - uint16_t gre_reserved1;
>> -} __packed __aligned(4);
>> -
>> -struct gre_h_key {
>> - uint32_t gre_key;
>> -} __packed __aligned(4);
>> -
>>  #define GRE_EOIP 0x6400
>>  
>>  struct gre_h_key_eoip {
>> @@ -132,13 +112,7 @@ struct gre_h_key_eoip {
>>   uint16_t eoip_tunnel_id; /* little endian */
>>  } __packed __aligned(4);
>>  
>> -#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
>> -#define NVGRE_VSID_RES_MAX 0x000fff
>> -#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
>> -
>> -struct gre_h_seq {
>> - uint32_t gre_seq;
>> -} __packed __aligned(4);
>> +#define GRE_WCCP 0x883e
>>  
>>  struct gre_h_wccp {
>>   uint8_t wccp_flags;
>> @@ -147,7 +121,10 @@ struct gre_h_wccp {
>>   uint8_t pri_bucket;
>>  } __packed __aligned(4);
>>  
>> -#define GRE_WCCP 0x883e
>> +
>> +#define NVGRE_VSID_RES_MIN 0x000000 /* reserved for future use */
>> +#define NVGRE_VSID_RES_MAX 0x000fff
>> +#define NVGRE_VSID_NVE2NVE 0xffffff /* vendor specific NVE-to-NVE comms */
>>  
>>  #define GRE_HDRLEN (sizeof(struct ip) + sizeof(struct gre_header))
>>  
>> @@ -289,8 +266,8 @@ static int gre_up(struct gre_softc *);
>>  static int gre_down(struct gre_softc *);
>>  static void gre_link_state(struct ifnet *, unsigned int);
>>  
>> -static int gre_input_key(struct mbuf **, int *, int, int, uint8_t,
>> -    struct gre_tunnel *);
>> +static struct mbuf *
>> + gre_if_input(struct mbuf *, int, uint8_t, struct gre_tunnel *);
>>  
>>  static struct mbuf *
>>   gre_ipv4_patch(const struct gre_tunnel *, struct mbuf *,
>> @@ -893,10 +870,9 @@ eoip_clone_destroy(struct ifnet *ifp)
>>   return (0);
>>  }
>>  
>> -int
>> -gre_input(struct mbuf **mp, int *offp, int type, int af)
>> +struct mbuf *
>> +gre_if4_input(struct mbuf *m, int hlen)
>>  {
>> - struct mbuf *m = *mp;
>>   struct gre_tunnel key;
>>   struct ip *ip;
>>  
>> @@ -908,17 +884,13 @@ gre_input(struct mbuf **mp, int *offp, i
>>   key.t_src4 = ip->ip_dst;
>>   key.t_dst4 = ip->ip_src;
>>  
>> - if (gre_input_key(mp, offp, type, af, ip->ip_tos, &key) == -1)
>> - return (rip_input(mp, offp, type, af));
>> -
>> - return (IPPROTO_DONE);
>> + return (gre_if_input(m, hlen, ip->ip_tos, &key));
>>  }
>>  
>>  #ifdef INET6
>> -int
>> -gre_input6(struct mbuf **mp, int *offp, int type, int af)
>> +struct mbuf *
>> +gre_if6_input(struct mbuf *m, int hlen)
>>  {
>> - struct mbuf *m = *mp;
>>   struct gre_tunnel key;
>>   struct ip6_hdr *ip6;
>>   uint32_t flow;
>> @@ -933,10 +905,7 @@ gre_input6(struct mbuf **mp, int *offp,
>>  
>>   flow = bemtoh32(&ip6->ip6_flow);
>>  
>> - if (gre_input_key(mp, offp, type, af, flow >> 20, &key) == -1)
>> - return (rip6_input(mp, offp, type, af));
>> -
>> - return (IPPROTO_DONE);
>> + return (gre_if_input(m, hlen, flow >> 20, &key));
>>  }
>>  #endif /* INET6 */
>>  
>> @@ -996,12 +965,10 @@ gre_input_1(struct gre_tunnel *key, stru
>>   return (m);
>>  }
>>  
>> -static int
>> -gre_input_key(struct mbuf **mp, int *offp, int type, int af, uint8_t otos,
>> -    struct gre_tunnel *key)
>> +static struct mbuf *
>> +gre_if_input(struct mbuf *m, int iphlen, uint8_t otos, struct gre_tunnel *key)
>>  {
>> - struct mbuf *m = *mp;
>> - int iphlen = *offp, hlen, rxprio;
>> + int hlen, rxprio;
>>   struct ifnet *ifp;
>>   const struct gre_tunnel *tunnel;
>>   caddr_t buf;
>> @@ -1025,7 +992,7 @@ gre_input_key(struct mbuf **mp, int *off
>>  
>>   m = m_pullup(m, hlen);
>>   if (m == NULL)
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>  
>>   buf = mtod(m, caddr_t);
>>   gh = (struct gre_header *)(buf + iphlen);
>> @@ -1038,7 +1005,7 @@ gre_input_key(struct mbuf **mp, int *off
>>   case htons(GRE_VERS_1):
>>   m = gre_input_1(key, m, gh, otos, iphlen);
>>   if (m == NULL)
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>   /* FALLTHROUGH */
>>   default:
>>   goto decline;
>> @@ -1055,7 +1022,7 @@ gre_input_key(struct mbuf **mp, int *off
>>  
>>   m = m_pullup(m, hlen);
>>   if (m == NULL)
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>  
>>   buf = mtod(m, caddr_t);
>>   gh = (struct gre_header *)(buf + iphlen);
>> @@ -1071,7 +1038,7 @@ gre_input_key(struct mbuf **mp, int *off
>>      nvgre_input(key, m, hlen, otos) == -1)
>>   goto decline;
>>  
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>   }
>>  
>>   ifp = gre_find(key);
>> @@ -1148,7 +1115,7 @@ gre_input_key(struct mbuf **mp, int *off
>>  
>>   m_adj(m, hlen);
>>   gre_keepalive_recv(ifp, m);
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>  
>>   default:
>>   goto decline;
>> @@ -1162,7 +1129,7 @@ gre_input_key(struct mbuf **mp, int *off
>>  
>>   m = (*patch)(tunnel, m, &itos, otos);
>>   if (m == NULL)
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>  
>>   if (tunnel->t_key_mask == GRE_KEY_ENTROPY) {
>>   m->m_pkthdr.ph_flowid = M_FLOWID_VALID |
>> @@ -1203,10 +1170,9 @@ gre_input_key(struct mbuf **mp, int *off
>>  #endif
>>  
>>   (*input)(ifp, m);
>> - return (IPPROTO_DONE);
>> + return (NULL);
>>  decline:
>> - *mp = m;
>> - return (-1);
>> + return (m);
>>  }
>>  
>>  static struct mbuf *
>> Index: sys/netinet/gre_proto.h
>> ===================================================================
>> RCS file: sys/netinet/gre_proto.h
>> diff -N sys/netinet/gre_proto.h
>> --- /dev/null 1 Jan 1970 00:00:00 -0000
>> +++ sys/netinet/gre_proto.h 29 Oct 2019 07:57:58 -0000
>> @@ -0,0 +1,48 @@
>> +/* $OpenBSD$ */
>> +
>> +/*
>> + * Copyright (c) 2019 David Gwynne <[hidden email]>
>> + *
>> + * Permission to use, copy, modify, and distribute this software for any
>> + * purpose with or without fee is hereby granted, provided that the above
>> + * copyright notice and this permission notice appear in all copies.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
>> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
>> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
>> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
>> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
>> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> + */
>> +
>> +#ifndef _NETINET_GRE_H_
>> +#define _NETINET_GRE_H_
>> +
>> +struct gre_header {
>> + uint16_t gre_flags;
>> +#define GRE_CP 0x8000 /* Checksum Present */
>> +#define GRE_KP 0x2000 /* Key Present */
>> +#define GRE_SP 0x1000 /* Sequence Present */
>> +
>> +#define GRE_VERS_MASK 0x0007
>> +#define GRE_VERS_0 0x0000
>> +#define GRE_VERS_1 0x0001
>> +
>> + uint16_t gre_proto;
>> +};
>> +
>> +struct gre_h_cksum {
>> + uint16_t gre_cksum;
>> + uint16_t gre_reserved1;
>> +};
>> +
>> +struct gre_h_key {
>> + uint32_t gre_key;
>> +};
>> +
>> +struct gre_h_seq {
>> + uint32_t gre_seq;
>> +};
>> +
>> +#endif /* _NETINET_GRE_H_ */
>> Index: sys/netinet/gre_var.h
>> ===================================================================
>> RCS file: sys/netinet/gre_var.h
>> diff -N sys/netinet/gre_var.h
>> --- /dev/null 1 Jan 1970 00:00:00 -0000
>> +++ sys/netinet/gre_var.h 29 Oct 2019 07:57:58 -0000
>> @@ -0,0 +1,64 @@
>> +/* $OpenBSD$ */
>> +
>> +/*
>> + * Copyright (c) 2019 David Gwynne <[hidden email]>
>> + *
>> + * Permission to use, copy, modify, and distribute this software for any
>> + * purpose with or without fee is hereby granted, provided that the above
>> + * copyright notice and this permission notice appear in all copies.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
>> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
>> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
>> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
>> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
>> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> + */
>> +
>> +#ifndef _NETINET_GRE_VAR_H_
>> +#define _NETINET_GRE_VAR_H_
>> +
>> +/*
>> + * setsockopt(s, IPPROTO_GRE, ...
>> + */
>> +
>> +#define GRE_CKSUM 1 /* bool; enable GRE checksum headers */
>> +#define GRE_KEY 2 /* uint32_t; enable and set GRE key */
>> + /* NULL; disable GRE key header */
>> +#define GRE_SEQ 3 /* uint32_t; enable and set GRE seq */
>> + /* NULL; disable GRE seq header */
>> +
>> +#define GRE_RECVSEQ 4 /* bool; enable reception of seq numbers */
>> +#define GRE_SENDSEQ GRE_RECVSEQ
>> +
>> +#ifdef _KERNEL
>> +int gre_raw_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
>> +    struct mbuf *, struct proc *);
>> +int gre_sysctl(int *, u_int, void *, size_t *, void *, size_t);
>> +
>> +void gre_init(void);
>> +
>> +int gre_attach(struct socket *, int);
>> +int gre_detach(struct socket *);
>> +
>> +int gre_ip4_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
>> +    struct mbuf *, struct proc *);
>> +int gre_ip4_ctloutput(int, struct socket *, int, int, struct mbuf *);
>> +
>> +int gre_ip4_input(struct mbuf **, int *, int, int);
>> +
>> +struct mbuf *
>> + gre_if4_input(struct mbuf *, int); /* interface glue */
>> +
>> +#ifdef INET6
>> +int gre_ip6_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
>> +    struct mbuf *, struct proc *);
>> +int gre_ip6_ctloutput(int, struct socket *, int, int, struct mbuf *);
>> +int gre_ip6_input(struct mbuf **, int *, int, int);
>> +
>> +struct mbuf *
>> + gre_if6_input(struct mbuf *, int); /* interface glue */
>> +#endif /* INET6 */
>> +#endif /* _KERNEL */
>> +#endif /* _NETINET_GRE_VAR_H_ */
>> Index: sys/netinet/in_proto.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet/in_proto.c,v
>> retrieving revision 1.93
>> diff -u -p -r1.93 in_proto.c
>> --- sys/netinet/in_proto.c 15 Jul 2019 12:40:42 -0000 1.93
>> +++ sys/netinet/in_proto.c 29 Oct 2019 07:57:58 -0000
>> @@ -147,8 +147,7 @@
>>  
>>  #include "gre.h"
>>  #if NGRE > 0
>> -#include <netinet/ip_gre.h>
>> -#include <net/if_gre.h>
>> +#include <netinet/gre_var.h>
>>  #endif
>>  
>>  #include "carp.h"
>> @@ -346,11 +345,24 @@ const struct protosw inetsw[] = {
>>    .pr_domain = &inetdomain,
>>    .pr_protocol = IPPROTO_GRE,
>>    .pr_flags = PR_ATOMIC|PR_ADDR,
>> -  .pr_input = gre_input,
>> +  .pr_input = gre_ip4_input,
>>    .pr_ctloutput = rip_ctloutput,
>> -  .pr_usrreq = gre_usrreq,
>> +  .pr_usrreq = gre_raw_usrreq,
>>    .pr_attach = rip_attach,
>>    .pr_detach = rip_detach,
>> +  .pr_sysctl = gre_sysctl
>> +},
>> +{
>> +  .pr_type = SOCK_DGRAM,
>> +  .pr_domain = &inetdomain,
>> +  .pr_protocol = IPPROTO_GRE,
>> +  .pr_flags = PR_ATOMIC|PR_ADDR,
>> +  .pr_input = gre_ip4_input,
>> +  .pr_ctloutput = gre_ip4_ctloutput,
>> +  .pr_usrreq = gre_ip4_usrreq,
>> +  .pr_attach = gre_attach,
>> +  .pr_detach = gre_detach,
>> +  .pr_init = gre_init,
>>    .pr_sysctl = gre_sysctl
>>  },
>>  #endif /* NGRE > 0 */
>> Index: sys/netinet/ip_gre.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet/ip_gre.c,v
>> retrieving revision 1.71
>> diff -u -p -r1.71 ip_gre.c
>> --- sys/netinet/ip_gre.c 7 Feb 2018 22:30:59 -0000 1.71
>> +++ sys/netinet/ip_gre.c 29 Oct 2019 07:57:58 -0000
>> @@ -1,7 +1,23 @@
>> -/*      $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
>> +/* $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
>>  /* $NetBSD: ip_gre.c,v 1.9 1999/10/25 19:18:11 drochner Exp $ */
>>  
>>  /*
>> + * Copyright (c) 2019 David Gwynne <[hidden email]>
>> + *
>> + * Permission to use, copy, modify, and distribute this software for any
>> + * purpose with or without fee is hereby granted, provided that the above
>> + * copyright notice and this permission notice appear in all copies.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
>> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
>> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
>> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
>> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
>> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> + */
>> +
>> +/*
>>   * Copyright (c) 1998 The NetBSD Foundation, Inc.
>>   * All rights reserved.
>>   *
>> @@ -30,16 +46,6 @@
>>   * POSSIBILITY OF SUCH DAMAGE.
>>   */
>>  
>> -/*
>> - * decapsulate tunneled packets and send them on
>> - * output half is in net/if_gre.[ch]
>> - * This currently handles IPPROTO_GRE, IPPROTO_MOBILE
>> - */
>> -
>> -
>> -#include "gre.h"
>> -#if NGRE > 0
>> -
>>  #include <sys/param.h>
>>  #include <sys/systm.h>
>>  #include <sys/mbuf.h>
>> @@ -47,24 +53,40 @@
>>  #include <sys/socket.h>
>>  #include <sys/socketvar.h>
>>  #include <sys/sysctl.h>
>> +#include <sys/proc.h>
>> +#include <sys/atomic.h>
>> +#include <sys/pool.h>
>> +#include <sys/tree.h>
>> +
>> +#include <sys/domain.h>
>>  
>>  #include <net/if.h>
>> +#include <net/if_var.h>
>>  #include <net/route.h>
>>  
>>  #include <netinet/in.h>
>> +#include <netinet/in_var.h>
>>  #include <netinet/ip.h>
>>  #include <netinet/ip_var.h>
>>  #include <netinet/in_pcb.h>
>>  
>> +#include <netinet/gre_var.h>
>> +#include <netinet/gre_proto.h>
>> +
>>  #ifdef PIPEX
>>  #include <net/pipex.h>
>>  #endif
>>  
>> +
>> +/*
>> + * socket({AF_INET,AF_INET6}, SOCK_RAW, IPPROTO_GRE);
>> + */
>> +
>>  int
>> -gre_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>> +gre_raw_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>>      struct mbuf *control, struct proc *p)
>>  {
>> -#ifdef  PIPEX
>> +#ifdef  PIPEX
>>   struct inpcb *inp = sotoinpcb(so);
>>  
>>   if (inp != NULL && inp->inp_pipex && req == PRU_SEND) {
>> @@ -92,4 +114,1817 @@ gre_usrreq(struct socket *so, int req, s
>>  #endif
>>   return rip_usrreq(so, req, m, nam, control, p);
>>  }
>> -#endif /* if NGRE > 0 */
>> +
>> +/*
>> + * support socket({AF_INET,AF_INET6}, SOCK_DGRAM, IPPROTO_GRE);
>> + *
>> + * GRE datagram sockets provide support for GRE version 0 packets
>> + * in the kernel.
>> + */
>> +
>> +#define GRE_VALID_MASK (GRE_VERS_MASK | GRE_CP | GRE_KP | GRE_SP)
>> +
>> +struct gre_pcb_key;
>> +
>> +struct gre_pcb {
>> + struct inpcb gpcb_inpcb;
>> + unsigned int gpcb_pflags; /* pcb flags */
>> +#define GREPCB_RECVSEQ (1 << 0)
>> + uint16_t gpcb_flags; /* network byteorder */
>> + uint32_t gpcb_key; /* network byteorder */
>> + uint32_t gpcb_seq; /* host byteorder */
>> +
>> + struct gre_pcb_key *gpcb_pcb_key;
>> + int gpcb_reuse;
>> + uint8_t gpcb_ttl;
>> +};
>> +TAILQ_HEAD(gre_pcb_list, inpcb);
>> +
>> +static inline struct gre_pcb *
>> +inp_gpcb(struct inpcb *inp)
>> +{
>> + return ((struct gre_pcb *)inp);
>> +}
>> +
>> +#define gpcb_inp(_gpcb) (&(_gpcb)->gpcb_inpcb)
>> +
>> +static inline struct socket *
>> +gpcb_so(struct gre_pcb *gpcb)
>> +{
>> + return (gpcb_inp(gpcb)->inp_socket);
>> +}
>> +
>> +struct gre_pcb_key {
>> + union inpaddru gk_laddr;
>> +#define gk_laddr4 gk_laddr.iau_a4u.inaddr
>> +#define gk_laddr6 gk_laddr.iau_addr6
>> + union inpaddru gk_faddr;
>> +#define gk_faddr4 gk_faddr.iau_a4u.inaddr
>> +#define gk_faddr6 gk_faddr.iau_addr6
>> +
>> + uint16_t gk_flags; /* network byteorder */
>> + uint16_t gk_proto; /* network byteorder */
>> + uint32_t gk_key; /* network byteorder */
>> +
>> + unsigned int gk_rtableid;
>> + sa_family_t gk_family;
>> +
>> + RBT_ENTRY(gre_pcb_key) gk_entry;
>> + struct gre_pcb_list gk_pcbs;
>> + unsigned int gk_state;
>> +#define GRE_S_DISCONNECTED 0
>> +#define GRE_S_WILDCARD 1
>> +#define GRE_S_BOUND 2
>> +#define GRE_S_CONNECTED 3
>> +};
>> +
>> +RBT_HEAD(gre_tree_wildcards, gre_pcb_key);
>> +RBT_HEAD(gre_tree_bound, gre_pcb_key);
>> +RBT_HEAD(gre_tree_connected, gre_pcb_key);
>> +
>> +static int gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *,
>> +    const struct gre_pcb_key *);
>> +static int gre_pcb_key_cmp_bound(const struct gre_pcb_key *,
>> +    const struct gre_pcb_key *);
>> +
>> +RBT_PROTOTYPE(gre_tree_wildcards, gre_pcb_key, gk_entry,
>> +    gre_pcb_key_cmp_wildcard);
>> +RBT_PROTOTYPE(gre_tree_bound, gre_pcb_key, gk_entry,
>> +    gre_pcb_key_cmp_bound);
>> +RBT_PROTOTYPE(gre_tree_connected, gre_pcb_key, gk_entry,
>> +    gre_pcb_key_cmp_connected);
>> +
>> +struct gre_ops {
>> + int (*op_nametosa)(struct inpcb *, struct mbuf *,
>> +    struct sockaddr **);
>> + int (*op_is_wildcard)(struct sockaddr *);
>> + int (*op_is_multicast)(struct sockaddr *);
>> + int (*op_is_broadcast)(unsigned int, struct sockaddr *);
>> + int (*op_is_local)(unsigned int, struct sockaddr *);
>> +
>> + uint16_t (*op_proto)(struct sockaddr *);
>> + void (*op_addr)(union inpaddru *, struct sockaddr *);
>> +
>> + int (*op_selsrc)(struct inpcb *, struct sockaddr *,
>> +    union inpaddru *, void *);
>> + int (*op_control)(struct socket *, u_long, caddr_t, struct ifnet *);
>> + void (*op_getsockname)(struct inpcb *, struct mbuf *);
>> + void (*op_getpeername)(struct inpcb *, struct mbuf *);
>> + int (*op_ctloutput)(int, struct socket *, int, int, struct mbuf *);
>> +
>> + void (*op_sbappend)(struct gre_pcb *, struct mbuf *, int, uint32_t);
>> +
>> + /* PRU_SEND is split into two parts, with gre_output in the middle: */
>> +
>> + /* 1: the "pre" phase, so ipv6 can do it's stupid pktops stuff */
>> + int (*op_send)(const struct gre_ops *, struct gre_pcb *,
>> +    struct mbuf *, struct mbuf *, struct mbuf *);
>> + /* 2: actually doing the ip encap and output */
>> + int (*op_output)(struct gre_pcb *, const struct gre_pcb_key *,
>> +    struct mbuf *, void *);
>> +
>> + int * const defttl;
>> +};
>> +struct gre_tree_wildcards gre_wildcards = RBT_INITIALIZER();
>> +struct gre_tree_bound gre_bound = RBT_INITIALIZER();
>> +struct gre_tree_connected gre_connected = RBT_INITIALIZER();
>> +
>> +struct pool gre_pcb_key_pool;
>> +struct pool gre_pcb_pool;
>> +
>> +static struct mbuf *
>> + gre_ip_input(const struct gre_ops *, struct mbuf *, int,
>> +    uint8_t, struct gre_pcb_key *);
>> +
>> +static int gre_usrreq(const struct gre_ops *, struct socket *, int,
>> +    struct mbuf *, struct mbuf *, struct mbuf *,
>> +    struct proc *);
>> +static int gre_ctloutput(const struct gre_ops *, int, struct socket *,
>> +    int, int, struct mbuf *);
>> +
>> +static int gre_send(const struct gre_ops *, struct gre_pcb *,
>> +    struct mbuf *, struct mbuf *, struct mbuf *);
>> +static int gre_output(const struct gre_ops *, struct gre_pcb *,
>> +    struct mbuf *, struct mbuf *, struct mbuf *, void *);
>> +static int gre_disconnect(struct gre_pcb *);
>> +
>> +#define GRE_OPT_EINVAL ((void *)-1)
>> +static void *gre_opt(struct mbuf *, int, int, socklen_t);
>> +
>> +unsigned int gre_sendspace = 9216; /* XXX sysctl? */
>> +unsigned int gre_recvspace = (40 * 1024);
>> +
>> +static struct gre_pcb_key *
>> +gre_pcb_key_get(const struct gre_pcb *gpcb)
>> +{
>> + struct gre_pcb_key *gk;
>> +
>> + gk = pool_get(&gre_pcb_key_pool, PR_NOWAIT|PR_ZERO);
>> + if (gk == NULL)
>> + return (NULL);
>> +
>> + gk->gk_flags = gpcb->gpcb_flags;
>> + gk->gk_key = gpcb->gpcb_key;
>> +
>> + TAILQ_INIT(&gk->gk_pcbs);
>> +
>> + return (gk);
>> +}
>> +
>> +static void
>> +gre_pcb_key_put(struct gre_pcb_key *gk)
>> +{
>> + pool_put(&gre_pcb_key_pool, gk);
>> +}
>> +
>> +static struct gre_pcb_key *
>> +gre_pcb_key_insert(unsigned int state, struct gre_pcb_key *gk)
>> +{
>> + struct gre_pcb_key *ogk;
>> +
>> + gk->gk_state = state;
>> +
>> + switch (state) {
>> + case GRE_S_WILDCARD:
>> + ogk = RBT_INSERT(gre_tree_wildcards, &gre_wildcards, gk);
>> + break;
>> + case GRE_S_BOUND:
>> + ogk = RBT_INSERT(gre_tree_bound, &gre_bound, gk);
>> + break;
>> + case GRE_S_CONNECTED:
>> + ogk = RBT_INSERT(gre_tree_connected, &gre_connected, gk);
>> + break;
>> + default:
>> + panic("%s unexpected state %u", __func__, state);
>> + }
>> +
>> + return (ogk);
>> +}
>> +
>> +static inline int
>> +gre_pcb_empty(struct gre_pcb_list *l)
>> +{
>> + return (TAILQ_EMPTY(l));
>> +}
>> +
>> +static inline void
>> +gre_pcb_insert(struct gre_pcb_list *l, struct gre_pcb *gpcb)
>> +{
>> + TAILQ_INSERT_TAIL(l, gpcb_inp(gpcb), inp_queue);
>> +}
>> +
>> +static inline void
>> +gre_pcb_remove(struct gre_pcb_list *l, struct gre_pcb *gpcb)
>> +{
>> + TAILQ_REMOVE(l, gpcb_inp(gpcb), inp_queue);
>> +}
>> +
>> +static inline struct gre_pcb *
>> +gre_pcb_first(struct gre_pcb_list *l)
>> +{
>> + return (inp_gpcb(TAILQ_FIRST(l)));
>> +}
>> +
>> +static inline struct gre_pcb *
>> +gre_pcb_next(struct gre_pcb *gpcb)
>> +{
>> + return (inp_gpcb(TAILQ_NEXT(gpcb_inp(gpcb), inp_queue)));
>> +}
>> +
>> +/*
>> + * INET4
>> + */
>> +
>> +static int gre_ip4_nametosa(struct inpcb *, struct mbuf *,
>> +    struct sockaddr **);
>> +static int gre_ip4_is_wildcard(struct sockaddr *);
>> +static int gre_ip4_is_multicast(struct sockaddr *);
>> +static int gre_ip4_is_broadcast(unsigned int, struct sockaddr *);
>> +static int gre_ip4_is_local(unsigned int, struct sockaddr *);
>> +static uint16_t gre_ip4_proto(struct sockaddr *);
>> +static void gre_ip4_addr(union inpaddru *, struct sockaddr *);
>> +static int gre_ip4_selsrc(struct inpcb *, struct sockaddr *,
>> +    union inpaddru *, void *);
>> +static void gre_ip4_sbappend(struct gre_pcb *, struct mbuf *, int,
>> +    uint32_t);
>> +static int gre_ip4_send(const struct gre_ops *, struct gre_pcb *,
>> +    struct mbuf *, struct mbuf *, struct mbuf *);
>> +static int gre_ip4_output(struct gre_pcb *, const struct gre_pcb_key *,
>> +    struct mbuf *, void *);
>> +
>> +static const struct gre_ops gre_ip4_ops = {
>> + .op_nametosa = gre_ip4_nametosa,
>> + .op_is_wildcard = gre_ip4_is_wildcard,
>> + .op_is_multicast = gre_ip4_is_multicast,
>> + .op_is_broadcast = gre_ip4_is_broadcast,
>> + .op_is_local = gre_ip4_is_local,
>> +
>> + .op_proto = gre_ip4_proto,
>> + .op_addr = gre_ip4_addr,
>> +
>> + .op_selsrc = gre_ip4_selsrc,
>> + .op_control = in_control,
>> + .op_getsockname = in_setsockaddr,
>> + .op_getpeername = in_setpeeraddr,
>> + .op_ctloutput = ip_ctloutput,
>> +
>> + .op_sbappend = gre_ip4_sbappend,
>> + .op_send = gre_ip4_send,
>> + .op_output = gre_ip4_output,
>> +
>> + .defttl = &ip_defttl,
>> +};
>> +
>> +
>> +static int
>> +gre_ip4_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
>> +{
>> + return (in_nam2sin(addr, (struct sockaddr_in **)sa));
>> +}
>> +
>> +static int
>> +gre_ip4_is_wildcard(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> + return (sin->sin_addr.s_addr == INADDR_ANY);
>> +}
>> +
>> +static int
>> +gre_ip4_is_multicast(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> + return (IN_MULTICAST(sin->sin_addr.s_addr));
>> +}
>> +
>> +static int
>> +gre_ip4_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> +
>> + return ((sin->sin_addr.s_addr == INADDR_BROADCAST) ||
>> +    in_broadcast(sin->sin_addr, rtableid));
>> +}
>> +
>> +static int
>> +gre_ip4_is_local(unsigned int rtableid, struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> + /* cope with ifa_ifwithaddr using memcmp */
>> + struct sockaddr_in needle = {
>> + .sin_len = sin->sin_len,
>> + .sin_family = sin->sin_family,
>> + .sin_addr = sin->sin_addr,
>> + };
>> + struct ifaddr *ifa;
>> +
>> + ifa = ifa_ifwithaddr(sintosa(&needle), rtableid);
>> + if (ifa == NULL)
>> + return (EADDRNOTAVAIL);
>> +
>> + return (0);
>> +}
>> +
>> +static uint16_t
>> +gre_ip4_proto(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> + return (sin->sin_port);
>> +}
>> +
>> +static void
>> +gre_ip4_addr(union inpaddru *addr, struct sockaddr *sa)
>> +{
>> + struct sockaddr_in *sin = satosin(sa);
>> + addr->iau_a4u.inaddr = sin->sin_addr;
>> +}
>> +
>> +static int
>> +gre_ip4_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
>> +    void *opts)
>> +{
>> + struct in_addr *insrc;
>> + int error;
>> +
>> + insrc = gre_opt(opts, IPPROTO_IP, IP_SENDSRCADDR, sizeof(*insrc));
>> + if (insrc == GRE_OPT_EINVAL)
>> + return (EINVAL);
>> +
>> + if (insrc == NULL) {
>> + error = in_pcbselsrc(&insrc, satosin(sa), inp);
>> + if (error != 0)
>> + return (error);
>> + } else {
>> + struct sockaddr_in sin = {
>> + .sin_len = sizeof(sin),
>> + .sin_family = AF_INET,
>> + .sin_addr = *insrc,
>> + };
>> +
>> + /* XXX sigh. */
>> + error = in_pcbaddrisavail(inp, &sin, 0, NULL);
>> + if (error != 0)
>> + return (error);
>> + }
>> +
>> + laddr->iau_a4u.inaddr = *insrc;
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_ip4_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
>> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
>> +{
>> + return (gre_output(ops, gpcb, m, addr, control, control));
>> +}
>> +
>> +static int
>> +gre_ip4_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
>> +    struct mbuf *m, void *opts)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct ip *ip;
>> + int error;
>> + uint32_t ipsecflowinfo = 0;
>> +
>> + m = m_prepend(m, sizeof(*ip), M_DONTWAIT);
>> + if (m == NULL) {
>> + error = ENOBUFS;
>> + goto dropped;
>> + }
>> + if (m->m_pkthdr.len > IP_MAXPACKET) {
>> + error = EMSGSIZE;
>> + goto drop;
>> + }
>> +
>> +#ifdef IPSEC
>> + if (ISSET(inp->inp_flags, INP_IPSECFLOWINFO)) {
>> + uint32_t *p = gre_opt(opts, IPPROTO_IP, IP_IPSECFLOWINFO,
>> +    sizeof(*p));
>> + if (p == GRE_OPT_EINVAL) {
>> + error = EINVAL;
>> + goto drop;
>> + }
>> +
>> + if (p != NULL)
>> + ipsecflowinfo = *p;
>> + }
>> +#endif /* IPSEC */
>> +
>> + ip = mtod(m, struct ip *);
>> + ip->ip_v = IPVERSION;
>> + ip->ip_hl = sizeof(*ip) >> 2;
>> + ip->ip_off = 0; /* XXX nodf? */
>> + ip->ip_tos = 0; /* XXX */;
>> + ip->ip_len = htons(m->m_pkthdr.len);
>> + ip->ip_ttl = gpcb->gpcb_ttl;
>> + ip->ip_p = IPPROTO_GRE;
>> + ip->ip_src = gk->gk_laddr4;
>> + ip->ip_dst = gk->gk_faddr4;
>> +
>> + error = ip_output(m, inp->inp_options, &inp->inp_route,
>> +    gpcb_so(gpcb)->so_options & SO_BROADCAST, inp->inp_moptions, inp,
>> +    ipsecflowinfo);
>> +
>> + return (error);
>> +
>> +drop:
>> + m_freem(m);
>> +dropped:
>> + return (error);
>> +}
>> +
>> +static void *
>> +gre_opt(struct mbuf *control, int level, int type, socklen_t len)
>> +{
>> + u_int clen;
>> + struct cmsghdr *cm;
>> + caddr_t cmsgs;
>> + size_t mlen;
>> +
>> + if (control == NULL)
>> + return (NULL);
>> +
>> + if (control->m_next != NULL)
>> + return (GRE_OPT_EINVAL);
>> +
>> + mlen = CMSG_LEN(len);
>> +
>> + clen = control->m_len;
>> + cmsgs = mtod(control, caddr_t);
>> + do {
>> + if (clen < CMSG_LEN(0))
>> + return (GRE_OPT_EINVAL);
>> +
>> + cm = (struct cmsghdr *)cmsgs;
>> + if (cm->cmsg_len < CMSG_LEN(0) ||
>> +    CMSG_ALIGN(cm->cmsg_len) > clen)
>> + return (GRE_OPT_EINVAL);
>> +
>> + if (cm->cmsg_level == level &&
>> +    cm->cmsg_type == type &&
>> +    cm->cmsg_len == mlen)
>> + return (CMSG_DATA(cm));
>> +
>> + clen -= CMSG_ALIGN(cm->cmsg_len);
>> + cmsgs += CMSG_ALIGN(cm->cmsg_len);
>> + } while (clen);
>> +
>> + return (NULL);
>> +}
>> +
>> +static int
>> +gre_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
>> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
>> +{
>> + int error;
>> +
>> + error = (*ops->op_send)(ops, gpcb, m, addr, control);
>> +
>> + m_freem(control);
>> +
>> + return (error);
>> +}
>> +
>> +static int
>> +gre_output(const struct gre_ops *ops, struct gre_pcb *gpcb,
>> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, void *opts)
>> +{
>> + const struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
>> + struct gre_pcb_key key;
>> + struct gre_header *gh;
>> + int error;
>> +
>> + if (addr != NULL) {
>> + struct gre_pcb_key *ogk;
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct sockaddr *sa;
>> + int state = GRE_S_DISCONNECTED;
>> +
>> + if (gk != NULL)
>> + state = gk->gk_state;
>> +
>> + if (state == GRE_S_CONNECTED) {
>> + error = EISCONN;
>> + goto drop;
>> + }
>> +
>> + error = (*ops->op_nametosa)(inp, addr, &sa);
>> + if (error != 0)
>> + goto drop;
>> +
>> + key.gk_family = sa->sa_family;
>> + key.gk_proto = (*ops->op_proto)(sa);
>> +
>> + if (state != GRE_S_DISCONNECTED) {
>> + KASSERT(key.gk_family == gk->gk_family);
>> + if (key.gk_proto != gk->gk_proto) {
>> + error = EADDRNOTAVAIL;
>> + goto drop;
>> + }
>> + }
>> + if (state == GRE_S_BOUND) {
>> + key.gk_laddr = gk->gk_laddr;
>> + } else {
>> + error = (*ops->op_selsrc)(inp, sa, &key.gk_laddr,
>> +    opts);
>> + if (error != 0)
>> + goto drop;
>> + }
>> +
>> + (*ops->op_addr)(&key.gk_faddr, sa);
>> +
>> + key.gk_rtableid = inp->inp_rtableid;
>> + key.gk_flags = gpcb->gpcb_flags;
>> + key.gk_key = gpcb->gpcb_key;
>> +
>> + ogk = RBT_FIND(gre_tree_connected, &gre_connected, &key);
>> + if (ogk != NULL) {
>> + struct gre_pcb *ogpcb;
>> + int reuse = gpcb->gpcb_reuse;
>> +
>> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
>> + if (ogpcb != NULL) {
>> + struct socket *oso = gpcb_so(ogpcb);
>> + if (!ISSET(reuse, oso->so_options)) {
>> + error = EADDRINUSE;
>> + goto drop;
>> + }
>> + }
>> + }
>> +
>> + gk = &key;
>> + } else {
>> + if (gk == NULL || gk->gk_state != GRE_S_CONNECTED) {
>> + error = ENOTCONN;
>> + goto drop;
>> + }
>> + }
>> +
>> + if (ISSET(gk->gk_flags, htons(GRE_SP))) {
>> + struct gre_h_seq *gsh;
>> + uint32_t *seqno;
>> +
>> + seqno = gre_opt(control, IPPROTO_GRE, GRE_SENDSEQ,
>> +    sizeof(*seqno));
>> + if (seqno == GRE_OPT_EINVAL) {
>> + error = EINVAL;
>> + goto drop;
>> + }
>> +
>> + m = m_prepend(m, sizeof(*gsh), M_DONTWAIT);
>> + if (m == NULL)
>> + return (ENOBUFS);
>> +
>> + gsh = mtod(m, struct gre_h_seq *);
>> + htobem32(&gsh->gre_seq, seqno != NULL ?
>> +    (gpcb->gpcb_seq = *seqno) : /* keep track of new start */
>> +    atomic_inc_int_nv(&gpcb->gpcb_seq));
>> + }
>> +
>> + if (ISSET(gk->gk_flags, htons(GRE_KP))) {
>> + struct gre_h_key *gkh;
>> +
>> + m = m_prepend(m, sizeof(*gkh), M_DONTWAIT);
>> + if (m == NULL)
>> + return (ENOBUFS);
>> +
>> + gkh = mtod(m, struct gre_h_key *);
>> + gkh->gre_key = gk->gk_key;
>> + }
>> +
>> + if (ISSET(gk->gk_flags, htons(GRE_CP))) {
>> + struct gre_h_cksum *gch;
>> +
>> + m = m_prepend(m, sizeof(*gch), M_DONTWAIT);
>> + if (m == NULL)
>> + return (ENOBUFS);
>> +
>> + gch = mtod(m, struct gre_h_cksum *);
>> + gch->gre_cksum = 0; /* XXX need to checksum */
>> + gch->gre_reserved1 = 0;
>> + }
>> +
>> + m = m_prepend(m, sizeof(*gh), M_DONTWAIT);
>> + if (m == NULL)
>> + return (ENOBUFS);
>> +
>> + gh = mtod(m, struct gre_header *);
>> + gh->gre_flags = gk->gk_flags;
>> + gh->gre_proto = gk->gk_proto;
>> +
>> + KASSERT(ISSET(m->m_flags, M_PKTHDR));
>> +
>> + m->m_pkthdr.ph_rtableid = gpcb_inp(gpcb)->inp_rtableid;
>> +
>> + return ((*ops->op_output)(gpcb, gk, m, opts));
>> +
>> +drop:
>> + m_freem(m);
>> + return (error);
>> +}
>> +
>> +static void
>> +gre_sbappend(struct gre_pcb *gpcb, struct sockaddr *sa, struct mbuf *m,
>> +    struct mbuf *opts, int hlen, uint32_t seqno)
>> +{
>> + struct socket *so = gpcb_so(gpcb);
>> +
>> + if (ISSET(gpcb->gpcb_pflags, GREPCB_RECVSEQ)) {
>> + struct mbuf *opt = sbcreatecontrol(&seqno, sizeof(seqno),
>> +    GRE_RECVSEQ, IPPROTO_GRE);
>> + if (opt != NULL) {
>> + opt->m_next = opts;
>> + opts = opt;
>> + }
>> + }
>> +
>> + m_adj(m, hlen);
>> + if (sbappendaddr(so, &so->so_rcv, sa, m, opts) == 0) {
>> + m_freem(m);
>> + m_freem(opts);
>> + return;
>> + }
>> +
>> + sorwakeup(so);
>> +}
>> +
>> +int
>> +gre_ip4_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
>> +    struct mbuf *control, struct proc *p)
>> +{
>> + return (gre_usrreq(&gre_ip4_ops, so, req, m, addr, control, p));
>> +}
>> +
>> +int
>> +gre_ip4_ctloutput(int op, struct socket *so, int level, int optname,
>> +    struct mbuf *m)
>> +{
>> + return (gre_ctloutput(&gre_ip4_ops, op, so, level, optname, m));
>> +}
>> +
>> +static void
>> +gre_ip4_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct socket *so = gpcb_so(gpcb);
>> + struct mbuf *opts = NULL;
>> + struct sockaddr_in sin = {
>> + .sin_len = sizeof(sin),
>> + .sin_family = AF_INET,
>> + .sin_port = inp->inp_lport,
>> + .sin_addr = mtod(m, struct ip *)->ip_src,
>> + };
>> +
>> + if (ISSET(inp->inp_flags, INP_CONTROLOPTS) ||
>> +    ISSET(so->so_options, SO_TIMESTAMP))
>> + ip_savecontrol(inp, &opts, mtod(m, struct ip *), m);
>> +
>> + if (ISSET(inp->inp_flags, INP_RECVDSTPORT)) {
>> + struct mbuf *opt;
>> +
>> + opt = sbcreatecontrol(&inp->inp_lport, sizeof(inp->inp_lport),
>> +    IP_RECVDSTPORT, IPPROTO_IP);
>> + if (opt != NULL) {
>> + opt->m_next = opts;
>> + opts = opt;
>> + }
>> + }
>> +
>> + gre_sbappend(gpcb, sintosa(&sin), m, opts, hlen, seqno);
>> +}
>> +
>> +int
>> +gre_ip4_input(struct mbuf **mp, int *offp, int proto, int af)
>> +{
>> + struct mbuf *m = *mp;
>> + struct gre_pcb_key gk;
>> + struct ip *ip;
>> +
>> + m = gre_if4_input(m, *offp);
>> + if (m == NULL)
>> + return (IPPROTO_DONE);
>> +
>> + ip = mtod(m, struct ip *);
>> +
>> + gk.gk_family = AF_INET;
>> + gk.gk_laddr4 = ip->ip_dst;
>> + gk.gk_faddr4 = ip->ip_src;
>> +
>> + m = gre_ip_input(&gre_ip4_ops, m, *offp, ip->ip_ttl, &gk);
>> + if (m == NULL)
>> + return (IPPROTO_DONE);
>> +
>> + *mp = m;
>> + return (rip_input(mp, offp, proto, af));
>> +}
>> +
>> +#if INET6
>> +#include <netinet6/ip6_var.h>
>> +#include <netinet6/in6_var.h>
>> +
>> +static int gre_ip6_nametosa(struct inpcb *, struct mbuf *,
>> +    struct sockaddr **);
>> +static int gre_ip6_is_wildcard(struct sockaddr *);
>> +static int gre_ip6_is_multicast(struct sockaddr *);
>> +static int gre_ip6_is_broadcast(unsigned int, struct sockaddr *);
>> +static int gre_ip6_is_local(unsigned int, struct sockaddr *);
>> +static uint16_t gre_ip6_proto(struct sockaddr *);
>> +static void gre_ip6_addr(union inpaddru *, struct sockaddr *);
>> +static int gre_ip6_selsrc(struct inpcb *, struct sockaddr *,
>> +    union inpaddru *, void *);
>> +static void gre_ip6_sbappend(struct gre_pcb *, struct mbuf *, int,
>> +    uint32_t);
>> +static int gre_ip6_send(const struct gre_ops *, struct gre_pcb *,
>> +    struct mbuf *, struct mbuf *, struct mbuf *);
>> +static int gre_ip6_output(struct gre_pcb *, const struct gre_pcb_key *,
>> +    struct mbuf *, void *);
>> +
>> +static const struct gre_ops gre_ip6_ops = {
>> + .op_nametosa = gre_ip6_nametosa,
>> + .op_is_wildcard = gre_ip6_is_wildcard,
>> + .op_is_multicast = gre_ip6_is_multicast,
>> + .op_is_broadcast = gre_ip6_is_broadcast,
>> + .op_is_local = gre_ip6_is_local,
>> +
>> + .op_proto = gre_ip6_proto,
>> + .op_addr = gre_ip6_addr,
>> +
>> + .op_selsrc = gre_ip6_selsrc,
>> + .op_control = in6_control,
>> + .op_getsockname = in6_setsockaddr,
>> + .op_getpeername = in6_setpeeraddr,
>> + .op_ctloutput = ip6_ctloutput,
>> + .op_sbappend = gre_ip6_sbappend,
>> + .op_send = gre_ip6_send,
>> + .op_output = gre_ip6_output,
>> +
>> + .defttl = &ip6_defhlim,
>> +};
>> +
>> +static int
>> +gre_ip6_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
>> +{
>> + struct sockaddr_in6 *sin6;
>> + int error;
>> +
>> + error = in6_nam2sin6(addr, &sin6);
>> + if (error != 0)
>> + return (error);
>> +
>> + /* reject IPv4 mapped addresses, we have no support for them */
>> + if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
>> + return (EADDRNOTAVAIL);
>> +
>> + if (in6_embedscope(&sin6->sin6_addr, sin6, inp) != 0)
>> + return (EINVAL);
>> +
>> + *sa = sin6tosa(sin6);
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_ip6_is_wildcard(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in6 *sin6 = satosin6(sa);
>> + return (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr));
>> +}
>> +
>> +static int
>> +gre_ip6_is_multicast(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in6 *sin6 = satosin6(sa);
>> + return (IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr));
>> +}
>> +
>> +static int
>> +gre_ip6_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
>> +{
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_ip6_is_local(unsigned int rtableid, struct sockaddr *sa)
>> +{
>> + struct sockaddr_in6 *sin6 = satosin6(sa);
>> + struct sockaddr_in6 needle = {
>> + .sin6_len = sin6->sin6_len,
>> + .sin6_family = sin6->sin6_family,
>> + .sin6_addr = sin6->sin6_addr,
>> + };
>> + struct ifaddr *ifa;
>> +
>> + ifa = ifa_ifwithaddr(sin6tosa(&needle), rtableid);
>> + if (ifa == NULL)
>> + return (EADDRNOTAVAIL);
>> +
>> + /*
>> + * bind to an anycast address might accidentally
>> + * cause sending a packet with an anycast source
>> + * address, so we forbid it.
>> + *
>> + * We should allow to bind to a deprecated address,
>> + * since the application dare to use it.
>> + * But, can we assume that they are careful enough
>> + * to check if the address is deprecated or not?
>> + * Maybe, as a safeguard, we should have a setsockopt
>> + * flag to control the bind(2) behavior against
>> + * deprecated addresses (default: forbid bind(2)).
>> + */
>> + if (ISSET(ifatoia6(ifa)->ia6_flags, IN6_IFF_ANYCAST|
>> +    IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED|IN6_IFF_DETACHED))
>> + return (EADDRNOTAVAIL);
>> +
>> + return (0);
>> +}
>> +
>> +static uint16_t
>> +gre_ip6_proto(struct sockaddr *sa)
>> +{
>> + struct sockaddr_in6 *sin6 = satosin6(sa);
>> + return (sin6->sin6_port);
>> +}
>> +
>> +static void
>> +gre_ip6_addr(union inpaddru *addr, struct sockaddr *sa)
>> +{
>> + struct sockaddr_in6 *sin6 = satosin6(sa);
>> + addr->iau_addr6 = sin6->sin6_addr;
>> +}
>> +
>> +static int
>> +gre_ip6_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
>> +    void *opts)
>> +{
>> + struct in6_addr *in6src;
>> + int error;
>> +
>> + if (opts == NULL)
>> + opts = inp->inp_outputopts6;
>> +
>> + error = in6_pcbselsrc(&in6src, satosin6(sa), inp, opts);
>> + if (error != 0)
>> + return (error);
>> +
>> + laddr->iau_addr6 = *in6src;
>> +
>> + return (0);
>> +}
>> +
>> +int
>> +gre_ip6_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
>> +    struct mbuf *control, struct proc *p)
>> +{
>> + return (gre_usrreq(&gre_ip6_ops, so, req, m, addr, control, p));
>> +}
>> +
>> +int
>> +gre_ip6_ctloutput(int op, struct socket *so, int level, int optname,
>> +    struct mbuf *m)
>> +{
>> + return (gre_ctloutput(&gre_ip6_ops, op, so, level, optname, m));
>> +}
>> +
>> +static void
>> +gre_ip6_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct socket *so = gpcb_so(gpcb);
>> + struct mbuf *opts = NULL;
>> + struct sockaddr_in6 sin6 = {
>> + .sin6_len = sizeof(sin6),
>> + .sin6_family = AF_INET6,
>> + .sin6_port = inp->inp_lport,
>> + };
>> +
>> + in6_recoverscope(&sin6, &mtod(m, struct ip6_hdr *)->ip6_src);
>> +
>> + if (ISSET(inp->inp_flags, IN6P_CONTROLOPTS) ||
>> +    ISSET(so->so_options, SO_TIMESTAMP))
>> + ip6_savecontrol(inp, m, &opts);
>> +
>> + if (ISSET(inp->inp_flags, IN6P_RECVDSTPORT)) {
>> + struct mbuf *opt;
>> +
>> + opt = sbcreatecontrol(&inp->inp_fport, sizeof(inp->inp_fport),
>> +    IPV6_RECVDSTPORT, IPPROTO_IPV6);
>> + if (opt != NULL) {
>> + opt->m_next = opts;
>> + opts = opt;
>> + }
>> + }
>> +
>> + gre_sbappend(gpcb, sin6tosa(&sin6), m, opts, hlen, seqno);
>> +}
>> +
>> +static int
>> +gre_ip6_send(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *m,
>> +    struct mbuf *addr, struct mbuf *control)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct ip6_pktopts *opts = inp->inp_outputopts6;
>> + struct ip6_pktopts opt;
>> + int error;
>> +
>> + if (control != NULL) {
>> + error = ip6_setpktopts(control, &opt, opts, /* priv */ 1,
>> +    IPPROTO_GRE);
>> + if (error != 0) {
>> + m_freem(m);
>> + return (error);
>> + }
>> + opts = &opt;
>> + }
>> +
>> + error = gre_output(ops, gpcb, m, addr, control, opts);
>> + if (control != NULL)
>> + ip6_clearpktopts(&opt, -1);
>> + return (error);
>> +}
>> +
>> +static int
>> +gre_ip6_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
>> +    struct mbuf *m, void *opts)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + uint16_t len = m->m_pkthdr.len;
>> + struct ip6_hdr *ip6;
>> + int flags = 0;
>> + int error;
>> +
>> + if (len > IP_MAXPACKET) {
>> + error = EMSGSIZE;
>> + goto drop;
>> + }
>> +
>> + m = m_prepend(m, sizeof(*ip6), M_DONTWAIT);
>> + if (m == NULL) {
>> + error = ENOBUFS;
>> + goto dropped;
>> + }
>> +
>> + ip6 = mtod(m, struct ip6_hdr *);
>> + ip6->ip6_vfc = IPV6_VERSION;
>> + ip6->ip6_plen = htons(len);
>> + ip6->ip6_nxt = IPPROTO_GRE;
>> + ip6->ip6_hlim = gpcb->gpcb_ttl;
>> + ip6->ip6_src = gk->gk_laddr6;
>> + ip6->ip6_dst = gk->gk_faddr6;
>> +
>> + if (ISSET(inp->inp_flags, IN6P_MINMTU)) /* wtf */
>> + flags |= IPV6_MINMTU;
>> +
>> + error = ip6_output(m, opts, &inp->inp_route6,
>> +    flags, inp->inp_moptions6, inp);
>> +
>> + return (error);
>> +
>> +drop:
>> + m_freem(m);
>> +dropped:
>> + return (error);
>> +}
>> +
>> +int
>> +gre_ip6_input(struct mbuf **mp, int *offp, int proto, int af)
>> +{
>> + struct mbuf *m = *mp;
>> + struct gre_pcb_key gk;
>> + struct ip6_hdr *ip6;
>> +
>> + m = gre_if6_input(m, *offp);
>> + if (m == NULL)
>> + return (IPPROTO_DONE);
>> +
>> + ip6 = mtod(m, struct ip6_hdr *);
>> +
>> + gk.gk_family = AF_INET6;
>> + gk.gk_laddr6 = ip6->ip6_dst;
>> + gk.gk_faddr6 = ip6->ip6_src;
>> +
>> + m = gre_ip_input(&gre_ip6_ops, m, *offp, ip6->ip6_hlim, &gk);
>> + if (m == NULL)
>> + return (IPPROTO_DONE);
>> +
>> + *mp = m;
>> + return (rip6_input(mp, offp, proto, af));
>> +}
>> +#endif /* INET6 */
>> +
>> +/*
>> + * generic GRE protocol handling
>> + */
>> +
>> +void
>> +gre_init(void)
>> +{
>> + pool_init(&gre_pcb_key_pool, sizeof(struct gre_pcb_key), 0,
>> +    IPL_NONE, PR_WAITOK, "grekey", NULL);
>> + pool_init(&gre_pcb_pool, sizeof(struct gre_pcb), 0,
>> +    IPL_NONE, PR_WAITOK, "grepcb", NULL);
>> +}
>> +
>> +int
>> +gre_attach(struct socket *so, int proto)
>> +{
>> + const struct gre_ops *ops = &gre_ip4_ops;
>> + struct gre_pcb *gpcb;
>> + struct inpcb *inp;
>> + int error;
>> + int flags = 0;
>> +
>> + if (so->so_pcb != NULL)
>> + return (EINVAL);
>> +
>> + error = suser(curproc);
>> + if (error != 0)
>> + return (error);
>> +
>> + error = soreserve(so, gre_sendspace, gre_recvspace);
>> + if (error != 0)
>> + return (error);
>> +
>> +#ifdef INET6
>> + if (sotopf(so) == PF_INET6) {
>> + ops = &gre_ip6_ops;
>> + flags = INP_IPV6;
>> + }
>> +#endif
>> +
>> + gpcb = pool_get(&gre_pcb_pool, PR_NOWAIT | PR_ZERO);
>> + if (gpcb == NULL)
>> + return (ENOBUFS);
>> +
>> + inp = gpcb_inp(gpcb);
>> + inp->inp_socket = so;
>> + refcnt_init(&inp->inp_refcnt);
>> + inp->inp_seclevel[SL_AUTH] = IPSEC_AUTH_LEVEL_DEFAULT;
>> + inp->inp_seclevel[SL_ESP_TRANS] = IPSEC_ESP_TRANS_LEVEL_DEFAULT;
>> + inp->inp_seclevel[SL_ESP_NETWORK] = IPSEC_ESP_NETWORK_LEVEL_DEFAULT;
>> + inp->inp_seclevel[SL_IPCOMP] = IPSEC_IPCOMP_LEVEL_DEFAULT;
>> + inp->inp_rtableid = curproc->p_p->ps_rtableid;
>> + inp->inp_ip_minttl = 0;
>> + inp->inp_flags |= flags;
>> + inp->inp_hops = *ops->defttl;
>> + inp->inp_ppcb = (caddr_t)gpcb;
>> +
>> + so->so_pcb = inp;
>> +
>> + return (0);
>> +}
>> +
>> +static void
>> +gre_inpdetach(struct gre_pcb *gpcb)
>> +{
>> + struct socket *so = gpcb_so(gpcb);
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> +
>> + KASSERT(so->so_pcb == inp);
>> +
>> + so->so_pcb = NULL;
>> + sofree(so, SL_NOUNLOCK);
>> +
>> + m_freem(inp->inp_options);
>> + if (inp->inp_route.ro_rt) {
>> + rtfree(inp->inp_route.ro_rt);
>> + inp->inp_route.ro_rt = NULL;
>> + }
>> +
>> + switch (ISSET(inp->inp_flags, INP_IPV6)) {
>> + case 0:
>> + ip_freemoptions(inp->inp_moptions);
>> + break;
>> +#ifdef INET6
>> + case INP_IPV6:
>> + ip6_freepcbopts(inp->inp_outputopts6);
>> + ip6_freemoptions(inp->inp_moptions6);
>> + break;
>> +#endif
>> + }
>> +
>> + KASSERT((struct gre_pcb *)inp->inp_ppcb == (struct gre_pcb *)inp);
>> +
>> + (void)gre_disconnect(gpcb);
>> + pool_put(&gre_pcb_pool, gpcb);
>> +}
>> +
>> +int
>> +gre_detach(struct socket *so)
>> +{
>> + struct inpcb *inp;
>> +
>> + soassertlocked(so);
>> +
>> + inp = sotoinpcb(so);
>> + if (inp == NULL)
>> + return (EINVAL);
>> +
>> + gre_inpdetach(inp_gpcb(inp));
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_bind(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct socket *so = gpcb_so(gpcb);
>> + struct sockaddr *sa;
>> + unsigned int state = GRE_S_BOUND;
>> + struct gre_pcb_key *gk, *ogk;
>> + int reuse = ISSET(so->so_options, SO_REUSEPORT);
>> + int error;
>> +
>> + if (gpcb->gpcb_pcb_key != NULL)
>> + return (EISCONN);
>> +
>> + error = (*ops->op_nametosa)(inp, addr, &sa);
>> + if (error != 0)
>> + return (error);
>> +
>> + if ((*ops->op_is_wildcard)(sa)) {
>> + state = GRE_S_WILDCARD;
>> + } else if ((*ops->op_is_multicast)(sa)) {
>> + /*
>> + * Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
>> + * allow complete duplication of binding if
>> + * SO_REUSEPORT is set, or if SO_REUSEADDR is set
>> + * and a multicast address is bound on both
>> + * new and duplicated sockets.
>> + */
>> + if (ISSET(so->so_options, SO_REUSEADDR|SO_REUSEPORT))
>> + reuse = SO_REUSEADDR|SO_REUSEPORT;
>> + } else if (ISSET(so->so_options, SO_BINDANY) ||
>> +    (*ops->op_is_broadcast)(inp->inp_rtableid, sa) == 0) {
>> + /*
>> + * we must check that we are binding to an address we
>> + * own except when:
>> + * - SO_BINDANY is set or
>> + * - we are binding a UDP socket to 255.255.255.255 or
>> + * - we are binding a UDP socket to one of our broadcast
>> + *   addresses
>> + */
>> + ;
>> + } else {
>> + error = (*ops->op_is_local)(inp->inp_rtableid, sa);
>> + if (error != 0)
>> + return (error);
>> + }
>> +
>> + gk = gre_pcb_key_get(gpcb);
>> + if (gk == NULL)
>> + return (ENOMEM);
>> +
>> + gk->gk_family = sa->sa_family;
>> + (*ops->op_addr)(&gk->gk_laddr, sa);
>> + gk->gk_proto = (*ops->op_proto)(sa);
>> +
>> + ogk = gre_pcb_key_insert(state, gk);
>> + if (ogk != NULL) {
>> + struct gre_pcb *ogpcb;
>> +
>> + gre_pcb_key_put(gk);
>> +
>> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
>> + if (ogpcb != NULL) {
>> + struct socket *oso = gpcb_so(ogpcb);
>> + if (!ISSET(reuse, oso->so_options))
>> + return (EADDRINUSE);
>> + }
>> +
>> + gk = ogk;
>> + }
>> +
>> + /* commit */
>> +
>> + gpcb->gpcb_pcb_key = gk;
>> + gre_pcb_insert(&gk->gk_pcbs, gpcb);
>> +
>> + inp->inp_laddru = gk->gk_laddr;
>> + inp->inp_lport = gk->gk_proto;
>> +
>> + gpcb->gpcb_reuse = reuse;
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_connect(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
>> +{
>> + struct inpcb *inp = gpcb_inp(gpcb);
>> + struct socket *so = gpcb_so(gpcb);
>> + struct gre_pcb_key *bgk = gpcb->gpcb_pcb_key;
>> + struct gre_pcb_key *gk, *ogk;
>> + struct sockaddr *sa;
>> + int reuse = ISSET(so->so_options, SO_REUSEPORT);
>> + int error;
>> +
>> + if (bgk != NULL && bgk->gk_state == GRE_S_CONNECTED)
>> + return (EISCONN);
>> +
>> + error = (*ops->op_nametosa)(inp, addr, &sa);
>> + if (error != 0)
>> + return (error);
>> +
>> + /* don't allow connections to wildcard addresses */
>> + if ((*ops->op_is_wildcard)(sa))
>> + return (EADDRNOTAVAIL);
>> +
>> + gk = gre_pcb_key_get(gpcb);
>> + if (gk == NULL)
>> + return (ENOBUFS);
>> +
>> + if (bgk == NULL || bgk->gk_state == GRE_S_WILDCARD) {
>> + error = (*ops->op_selsrc)(inp, sa, &gk->gk_laddr, NULL);
>> + if (error != 0)
>> + goto put;
>> +
>> + gk->gk_proto = (*ops->op_proto)(sa);
>> + } else {
>> + if (bgk->gk_proto != (*ops->op_proto)(sa)) {
>> + error = EADDRNOTAVAIL;
>> + goto put;
>> + }
>> +
>> + gk->gk_laddr = bgk->gk_laddr;
>> + gk->gk_proto = bgk->gk_proto;
>> + reuse |= gpcb->gpcb_reuse;
>> + }
>> +
>> + gk->gk_family = sa->sa_family;
>> + (*ops->op_addr)(&gk->gk_faddr, sa);
>> +
>> + ogk = gre_pcb_key_insert(GRE_S_CONNECTED, gk);
>> + if (ogk != NULL) {
>> + struct gre_pcb *ogpcb;
>> +
>> + gre_pcb_key_put(gk);
>> +
>> + ogpcb = gre_pcb_first(&ogk->gk_pcbs);
>> + if (ogpcb != NULL) {
>> + struct socket *oso = gpcb_so(ogpcb);
>> + KASSERTMSG(oso != NULL, "ogk %p ogpcb %p oso %p",
>> +    ogk, ogpcb, oso);
>> + if (!ISSET(reuse, oso->so_options))
>> + return (EADDRINUSE);
>> + }
>> +
>> + gk = ogk;
>> + }
>> +
>> + /* commit */
>> +
>> + gre_disconnect(gpcb);
>> + gpcb->gpcb_pcb_key = gk;
>> + gre_pcb_insert(&gk->gk_pcbs, gpcb);
>> +
>> + inp->inp_laddru = gk->gk_laddr;
>> + inp->inp_faddru = gk->gk_faddr;
>> + inp->inp_lport = inp->inp_fport = gk->gk_proto;
>> +
>> + gpcb->gpcb_reuse = reuse;
>> + soisconnected(so);
>> + return (0);
>> +
>> +put:
>> + gre_pcb_key_put(gk);
>> + return (error);
>> +}
>> +
>> +
>> +static int
>> +gre_getsockname(const struct gre_ops *ops, struct gre_pcb *gpcb,
>> +    struct mbuf *addr)
>> +{
>> + struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
>> +
>> + if (gk == NULL)
>> + return (ENOTCONN);
>> +
>> + (*ops->op_getsockname)(gpcb_inp(gpcb), addr);
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_getpeername(const struct gre_ops *ops, struct gre_pcb *gpcb,
>> +    struct mbuf *addr)
>> +{
>> + struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
>> +
>> + if (gk == NULL || gk->gk_state != GRE_S_CONNECTED)
>> + return (ENOTCONN);
>> +
>> + (*ops->op_getpeername)(gpcb_inp(gpcb), addr);
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_disconnect(struct gre_pcb *gpcb)
>> +{
>> + struct gre_pcb_key *gk;
>> +
>> + gk = gpcb->gpcb_pcb_key;
>> + if (gk == NULL) {
>> + return (ENOTCONN);
>> + }
>> +
>> + gre_pcb_remove(&gk->gk_pcbs, gpcb);
>> + if (gre_pcb_empty(&gk->gk_pcbs)) {
>> + switch (gk->gk_state) {
>> + case GRE_S_WILDCARD:
>> + RBT_REMOVE(gre_tree_wildcards, &gre_wildcards, gk);
>> + break;
>> + case GRE_S_BOUND:
>> + RBT_REMOVE(gre_tree_bound, &gre_bound, gk);
>> + break;
>> + case GRE_S_CONNECTED:
>> + RBT_REMOVE(gre_tree_connected, &gre_connected, gk);
>> + break;
>> + }
>> + pool_put(&gre_pcb_key_pool, gk);
>> + }
>> +
>> + gpcb->gpcb_pcb_key = NULL;
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_usrreq(const struct gre_ops *ops, struct socket *so, int req,
>> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, struct proc *p)
>> +{
>> + struct inpcb *inp;
>> + struct gre_pcb *gpcb;
>> + int error = 0;
>> +
>> + if (req == PRU_CONTROL) {
>> + return ((*ops->op_control)(so, (u_long)m, (caddr_t)addr,
>> +    (struct ifnet *)control));
>> + }
>> +
>> + soassertlocked(so);
>> +
>> + inp = sotoinpcb(so);
>> + if (inp == NULL) {
>> + error = EINVAL;
>> + goto release;
>> + }
>> + gpcb = inp_gpcb(inp);
>> +
>> + switch (req) {
>> + case PRU_BIND:
>> + error = gre_bind(ops, gpcb, addr);
>> + break;
>> +
>> + case PRU_LISTEN:
>> + error = EOPNOTSUPP;
>> + break;
>> +
>> + case PRU_CONNECT:
>> + error = gre_connect(ops, gpcb, addr);
>> + break;
>> +
>> + case PRU_CONNECT2:
>> + error = EOPNOTSUPP;
>> + break;
>> +
>> + case PRU_ACCEPT:
>> + error = EOPNOTSUPP;
>> + break;
>> +
>> + case PRU_DISCONNECT:
>> + error = gre_disconnect(gpcb);
>> + if (error != 0)
>> + break;
>> +
>> + gpcb->gpcb_reuse = 0;
>> + CLR(so->so_state, SS_ISCONNECTED); /* XXX cos udp_usrreq.c */
>> + memset(&inp->inp_laddru, 0, sizeof(inp->inp_laddru));
>> + memset(&inp->inp_faddru, 0, sizeof(inp->inp_faddru));
>> + inp->inp_lport = inp->inp_fport = 0;
>> + break;
>> +
>> + case PRU_SHUTDOWN:
>> + socantsendmore(so);
>> + break;
>> +
>> + case PRU_SEND:
>> + return (gre_send(ops, gpcb, m, addr, control));
>> +
>> + case PRU_ABORT:
>> + soisdisconnected(so);
>> + gre_inpdetach(gpcb);
>> + break;
>> +
>> + case PRU_SOCKADDR:
>> + return (gre_getsockname(ops, gpcb, addr));
>> + break;
>> + case PRU_PEERADDR:
>> + return (gre_getpeername(ops, gpcb, addr));
>> + break;
>> +
>> + case PRU_SENSE:
>> + /* stat: don't bother with a block size. */
>> + break;
>> +
>> + case PRU_SENDOOB:
>> + case PRU_FASTTIMO:
>> + case PRU_SLOWTIMO:
>> + case PRU_PROTORCV:
>> + case PRU_PROTOSEND:
>> + case PRU_RCVD:
>> + case PRU_RCVOOB:
>> + error = EOPNOTSUPP;
>> + break;
>> +
>> + default:
>> + panic("%s req %d", __func__, req);
>> + }
>> +
>> +release:
>> + switch (req) {
>> + case PRU_RCVD:
>> + case PRU_RCVOOB:
>> + case PRU_SENSE:
>> + break;
>> + default:
>> + m_freem(control);
>> + m_freem(m);
>> + break;
>> + }
>> +
>> + return (error);
>> +}
>> +
>> +static int
>> +gre_setopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
>> +{
>> + /*
>> + * only support changing these options when the socket is
>> + * completely disconnected. the amount of code needed to try
>> + * changing the gre_pcb_key was "quite large" and arguably
>> + * not worth it.
>> + */
>> + switch (optname) {
>> + case GRE_CKSUM:
>> + case GRE_KEY:
>> + case GRE_SEQ:
>> + if (gpcb->gpcb_pcb_key != NULL)
>> + return (EISCONN);
>> + break;
>> + }
>> +
>> + switch (optname) {
>> + case GRE_CKSUM:
>> + if (m == NULL || m->m_len != sizeof(int))
>> + return (EINVAL);
>> +
>> + if (*mtod(m, int *))
>> + SET(gpcb->gpcb_flags, htons(GRE_CP));
>> + else
>> + CLR(gpcb->gpcb_flags, htons(GRE_CP));
>> + break;
>> + case GRE_KEY:
>> + if (m == NULL || m->m_len == 0) { /* disable key */
>> + CLR(gpcb->gpcb_flags, htons(GRE_KP));
>> + gpcb->gpcb_key = htonl(0);
>> + break;
>> + }
>> +
>> + if (m->m_len != sizeof(gpcb->gpcb_key))
>> + return (EINVAL);
>> +
>> + SET(gpcb->gpcb_flags, htons(GRE_KP));
>> + htobem32(&gpcb->gpcb_key, *mtod(m, uint32_t *));
>> + break;
>> + case GRE_SEQ:
>> + if (m == NULL || m->m_len == 0) { /* disable seq */
>> + CLR(gpcb->gpcb_flags, htons(GRE_SP));
>> + gpcb->gpcb_seq = 0;
>> + break;
>> + }
>> +
>> + if (m->m_len != sizeof(gpcb->gpcb_seq))
>> + return (EINVAL);
>> +
>> + SET(gpcb->gpcb_flags, htons(GRE_SP));
>> + gpcb->gpcb_seq = *mtod(m, uint32_t *);
>> + break;
>> + case GRE_RECVSEQ:
>> + if (m == NULL || m->m_len != sizeof(int))
>> + return (EINVAL);
>> +
>> + if (*mtod(m, int *))
>> + SET(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
>> + else
>> + CLR(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
>> + break;
>> + default:
>> + return (ENOPROTOOPT);
>> + break;
>> + }
>> +
>> + return (0);
>> +}
>> +
>> +static int
>> +gre_getopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
>> +{
>> + switch (optname) {
>> + case GRE_KEY:
>> + if (!ISSET(gpcb->gpcb_flags, htons(GRE_KP)))
>> + return (ENOTCONN);
>> +
>> + m->m_len = sizeof(gpcb->gpcb_key);
>> + *mtod(m, uint32_t *) = bemtoh32(&gpcb->gpcb_key);
>> + break;
>> + case GRE_CKSUM:
>> + m->m_len = sizeof(int);
>> + *mtod(m, int *) = !!ISSET(gpcb->gpcb_flags, htons(GRE_CP));
>> + break;
>> + case GRE_SEQ:
>> + if (!ISSET(gpcb->gpcb_flags, htons(GRE_SP)))
>> + return (ENOTCONN);
>> +
>> + m->m_len = sizeof(gpcb->gpcb_seq);
>> + *mtod(m, uint32_t *) = gpcb->gpcb_key;
>> + break;
>> + default:
>> + return (ENOPROTOOPT);
>> + }
>> +
>> + return (0);
>> +}
>> +
>> +/*
>> + * IP socket option processing.
>> + */
>> +static int
>> +gre_ctloutput(const struct gre_ops *ops, int op, struct socket *so,
>> +    int level, int optname, struct mbuf *m)
>> +{
>> + struct inpcb *inp;
>> + struct gre_pcb *gpcb;
>> + int error;
>> +
>> + inp = sotoinpcb(so);
>> + if (inp == NULL)
>> + return (ECONNRESET);
>> + if (level != IPPROTO_GRE)
>> + return ((*ops->op_ctloutput)(level, so, level, optname, m));
>> +
>> + gpcb = inp_gpcb(inp);
>> +
>> + switch (op) {
>> + case PRCO_SETOPT:
>> + error = gre_setopt(gpcb, optname, m);
>> + break;
>> + case PRCO_GETOPT:
>> + error = gre_getopt(gpcb, optname, m);
>> + break;
>> + default:
>> + panic("%s op %d", __func__, op);
>> + }
>> +
>> + return (error);
>> +}
>> +
>> +static void *
>> +gre_pullup(struct mbuf **mp, int *offp, int len)
>> +{
>> + int hlen = *offp + len;
>> + void *h;
>> +
>> + if ((*mp)->m_pkthdr.len < hlen)
>> + return (NULL); /* decline */
>> +
>> + *mp = m_pullup(*mp, hlen);
>> + if (*mp == NULL)
>> + return (NULL);
>> +
>> + h = mtod(*mp, caddr_t) + *offp;
>> + *offp = hlen;
>> +
>> + return (h);
>> +}
>> +
>> +static int
>> +gre_candeliver(const struct gre_pcb *gpcb, uint8_t ttl)
>> +{
>> + const struct inpcb *inp = gpcb_inp(gpcb);
>> +
>> + if (ISSET(inp->inp_socket->so_state, SS_CANTRCVMORE))
>> + return (0);
>> +
>> + if (inp->inp_ip_minttl > ttl)
>> + return (0);
>> +
>> + return (1);
>> +}
>> +
>> +static struct mbuf *
>> +gre_ip_input(const struct gre_ops *ops, struct mbuf *m, int iphlen,
>> +    uint8_t ttl, struct gre_pcb_key *key)
>> +{
>> + int hlen = iphlen;
>> + struct gre_header *gh;
>> + struct gre_pcb_key *gk;
>> + struct gre_pcb *gpcb, *ngpcb;
>> + uint16_t cksum = 0;
>> + uint32_t seqno = 0;
>> +
>> + gh = gre_pullup(&m, &hlen, sizeof(*gh));
>> + if (gh == NULL)
>> + return (m);
>> +
>> + if ((gh->gre_flags & htons(GRE_VERS_MASK)) != htons(GRE_VERS_0))
>> + return (m); /* decline */
>> +
>> + if (ISSET(gh->gre_flags, ~htons(GRE_VALID_MASK)))
>> + return (m); /* decline */
>> +
>> + key->gk_flags = gh->gre_flags;
>> + key->gk_proto = gh->gre_proto;
>> +
>> + if (ISSET(key->gk_flags, htons(GRE_CP))) {
>> + struct gre_h_cksum *gch;
>> +
>> + gch = gre_pullup(&m, &hlen, sizeof(*gch));
>> + if (gch == NULL)
>> + return (m);
>> +
>> + cksum = gch->gre_cksum;
>> +
>> + /* XXX ignore Reserved (Offset) field */
>> + }
>> +
>> + if (ISSET(key->gk_flags, htons(GRE_KP))) {
>> + struct gre_h_key *gkh;
>> +
>> + gkh = gre_pullup(&m, &hlen, sizeof(*gkh));
>> + if (gkh == NULL)
>> + return (m);
>> +
>> + key->gk_key = gkh->gre_key;
>> + }
>> +
>> + if (ISSET(key->gk_flags, htons(GRE_SP))) {
>> + struct gre_h_seq *gsh;
>> +
>> + gsh = gre_pullup(&m, &hlen, sizeof(*gsh));
>> + if (gsh == NULL)
>> + return (m);
>> +
>> + seqno = bemtoh32(&gsh->gre_seq);
>> + }
>> +
>> + key->gk_rtableid = m->m_pkthdr.ph_rtableid;
>> +
>> + gk = RBT_FIND(gre_tree_connected, &gre_connected, key);
>> + if (gk == NULL) {
>> + gk = RBT_FIND(gre_tree_bound, &gre_bound, key);
>> + if (gk == NULL) {
>> + gk = RBT_FIND(gre_tree_wildcards, &gre_wildcards, key);
>> + if (gk == NULL)
>> + return (m); /* decline */
>> + }
>> + }
>> +
>> + /* it's ours now */
>> +
>> + if (ISSET(key->gk_flags, htons(GRE_CP))) {
>> + /* XXX actually do the checksum calc */
>> + }
>> +
>> + gpcb = gre_pcb_first(&gk->gk_pcbs);
>> + while (!gre_candeliver(gpcb, ttl)) {
>> + gpcb = gre_pcb_next(gpcb);
>> + if (gpcb == NULL)
>> + goto drop;
>> + }
>> +
>> + ngpcb = gpcb;
>> + for (;;) {
>> + struct mbuf *mm;
>> +
>> + ngpcb = gre_pcb_next(ngpcb);
>> + if (ngpcb == NULL)
>> + break;
>> +
>> + if (!gre_candeliver(ngpcb, ttl))
>> + continue;
>> +
>> + mm = m_dup_pkt(m, 0, M_DONTWAIT);
>> + if (mm == NULL) {
>> + /* assume further copies will also fail */
>> + break;
>> + }
>> +
>> + (*ops->op_sbappend)(gpcb, mm, hlen, seqno);
>> + gpcb = ngpcb;
>> + }
>> +
>> + (*ops->op_sbappend)(gpcb, m, hlen, seqno);
>> +
>> + return (NULL);
>> +
>> +drop:
>> + m_freem(m);
>> + return (NULL);
>> +}
>> +
>> +static int
>> +gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *a,
>> +    const struct gre_pcb_key *b)
>> +{
>> + if (a->gk_proto > b->gk_proto)
>> + return (1);
>> + if (a->gk_proto < b->gk_proto)
>> + return (-1);
>> +
>> + if (a->gk_flags > b->gk_flags)
>> + return (1);
>> + if (a->gk_flags < b->gk_flags)
>> + return (-1);
>> +
>> + if (ISSET(a->gk_flags, htons(GRE_KP))) {
>> + if (a->gk_key > b->gk_key)
>> + return (1);
>> + if (a->gk_key < b->gk_key)
>> + return (-1);
>> + }
>> +
>> + if (a->gk_rtableid > b->gk_rtableid)
>> + return (1);
>> + if (a->gk_rtableid < b->gk_rtableid)
>> + return (-1);
>> +
>> + if (a->gk_family > b->gk_family)
>> + return (1);
>> + if (a->gk_family < b->gk_family)
>> + return (-1);
>> +
>> + return (0);
>> +}
>> +
>> +RBT_GENERATE(gre_tree_wildcards, gre_pcb_key, gk_entry,
>> +    gre_pcb_key_cmp_wildcard);
>> +
>> +static int
>> +gre_pcb_key_cmp_bound(const struct gre_pcb_key *a,
>> +    const struct gre_pcb_key *b)
>> +{
>> + int rv;
>> +
>> + rv = gre_pcb_key_cmp_wildcard(a, b);
>> + if (rv != 0)
>> + return (rv);
>> +
>> + switch (a->gk_family) {
>> + case AF_INET:
>> + rv = memcmp(&a->gk_laddr4, &b->gk_laddr4, sizeof(a->gk_laddr4));
>> + break;
>> +#ifdef INET
>> + case AF_INET6:
>> + rv = memcmp(&a->gk_laddr6, &b->gk_laddr6, sizeof(a->gk_laddr6));
>> + break;
>> +#endif
>> + default:
>> + unhandled_af(a->gk_family);
>> + }
>> +
>> + return (rv);
>> +}
>> +
>> +RBT_GENERATE(gre_tree_bound, gre_pcb_key, gk_entry, gre_pcb_key_cmp_bound);
>> +
>> +static int
>> +gre_pcb_key_cmp_connected(const struct gre_pcb_key *a,
>> +    const struct gre_pcb_key *b)
>> +{
>> + int rv;
>> +
>> + rv = gre_pcb_key_cmp_bound(a, b);
>> + if (rv != 0)
>> + return (rv);
>> +
>> + switch (a->gk_family) {
>> + case AF_INET:
>> + rv = memcmp(&a->gk_faddr4, &b->gk_faddr4, sizeof(a->gk_faddr4));
>> + break;
>> +#ifdef INET
>> + case AF_INET6:
>> + rv = memcmp(&a->gk_faddr6, &b->gk_faddr6, sizeof(a->gk_faddr6));
>> + break;
>> +#endif
>> + default:
>> + unhandled_af(a->gk_family);
>> + }
>> +
>> + return (rv);
>> +}
>> +
>> +RBT_GENERATE(gre_tree_connected, gre_pcb_key, gk_entry,
>> +    gre_pcb_key_cmp_connected);
>> Index: sys/netinet6/in6_proto.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet6/in6_proto.c,v
>> retrieving revision 1.104
>> diff -u -p -r1.104 in6_proto.c
>> --- sys/netinet6/in6_proto.c 13 Jun 2019 08:12:11 -0000 1.104
>> +++ sys/netinet6/in6_proto.c 29 Oct 2019 07:57:58 -0000
>> @@ -117,7 +117,7 @@
>>  
>>  #include "gre.h"
>>  #if NGRE > 0
>> -#include <net/if_gre.h>
>> +#include <netinet/gre_var.h>
>>  #endif
>>  
>>  /*
>> @@ -340,11 +340,22 @@ const struct protosw inet6sw[] = {
>>    .pr_domain = &inet6domain,
>>    .pr_protocol = IPPROTO_GRE,
>>    .pr_flags = PR_ATOMIC|PR_ADDR,
>> -  .pr_input = gre_input6,
>> +  .pr_input = gre_ip6_input,
>>    .pr_ctloutput = rip6_ctloutput,
>>    .pr_usrreq = rip6_usrreq,
>>    .pr_attach = rip6_attach,
>>    .pr_detach = rip6_detach,
>> +},
>> +{
>> +  .pr_type = SOCK_DGRAM,
>> +  .pr_domain = &inet6domain,
>> +  .pr_protocol = IPPROTO_GRE,
>> +  .pr_flags = PR_ATOMIC|PR_ADDR,
>> +  .pr_input = gre_ip6_input,
>> +  .pr_ctloutput = gre_ip6_ctloutput,
>> +  .pr_usrreq = gre_ip6_usrreq,
>> +  .pr_attach = gre_attach,
>> +  .pr_detach = gre_detach,
>>  },
>>  #endif /* NGRE */
>

Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

Theo de Raadt-2
In reply to this post by David Gwynne-5
I am a big grumpy about /etc/protocols becoming another file that
is bypassed (accepted transparently) in kernel pledge.  but dlg
has convinced me hardcoding would be worse.

Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

David Gwynne-5
In reply to this post by Damien Miller


> On 22 Jan 2020, at 8:54 am, Damien Miller <[hidden email]> wrote:
>
> On Wed, 22 Jan 2020, David Gwynne wrote:
>
>>> Index: sys/kern/kern_pledge.c
>>> ===================================================================
>>> RCS file: /cvs/src/sys/kern/kern_pledge.c,v
>>> retrieving revision 1.255
>>> diff -u -p -r1.255 kern_pledge.c
>>> --- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
>>> +++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
>>> @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
>>> }
>>> }
>>>
>>> - /* DNS needs /etc/{resolv.conf,hosts,services}. */
>>> + /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
>>> if ((ni->ni_pledge == PLEDGE_RPATH) &&
>>>    (p->p_p->ps_pledge & PLEDGE_DNS)) {
>>> if (strcmp(path, "/etc/resolv.conf") == 0) {
>>> @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
>>> return (0);
>>> }
>>> if (strcmp(path, "/etc/services") == 0) {
>>> + ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>>> + return (0);
>>> + }
>>> + if (strcmp(path, "/etc/protocols") == 0) {
>>> ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>>> return (0);
>
> This looks like it is fixing a real, separate bug in pledge vs
> getaddrinfo, no? (specifically: that lookups for named ports will fail
> currently).

no, our getaddrinfo currently hardcodes mapping of SOCK_STREAM, SOCK_DGRAM, IPPROTO_TCP, and IPPROTO_UDP and maps them to "udp" and "tcp" for use when looking up /etc/services via getservbyname_r. this is fine because they are by far the most common case and worth optimising for.

the problem is if (when) i want to use getnameinfo to look up entries for IPPROTO_GRE. i either hardcode IPPROTO_GRE in getnameinfo guts to "gre" for it to pass to getservbyname_r, or i look up /etc/protocols via getprotobynumber_r to get a name. i opted for the latter.

dlg
Reply | Threaded
Open this post in threaded view
|

Re: GRE datagram socket support

Theo de Raadt-2
David Gwynne <[hidden email]> wrote:

> > On 22 Jan 2020, at 8:54 am, Damien Miller <[hidden email]> wrote:
> >
> > On Wed, 22 Jan 2020, David Gwynne wrote:
> >
> >>> Index: sys/kern/kern_pledge.c
> >>> ===================================================================
> >>> RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> >>> retrieving revision 1.255
> >>> diff -u -p -r1.255 kern_pledge.c
> >>> --- sys/kern/kern_pledge.c 25 Aug 2019 18:46:40 -0000 1.255
> >>> +++ sys/kern/kern_pledge.c 29 Oct 2019 07:57:58 -0000
> >>> @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
> >>> }
> >>> }
> >>>
> >>> - /* DNS needs /etc/{resolv.conf,hosts,services}. */
> >>> + /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
> >>> if ((ni->ni_pledge == PLEDGE_RPATH) &&
> >>>    (p->p_p->ps_pledge & PLEDGE_DNS)) {
> >>> if (strcmp(path, "/etc/resolv.conf") == 0) {
> >>> @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
> >>> return (0);
> >>> }
> >>> if (strcmp(path, "/etc/services") == 0) {
> >>> + ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> >>> + return (0);
> >>> + }
> >>> + if (strcmp(path, "/etc/protocols") == 0) {
> >>> ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> >>> return (0);
> >
> > This looks like it is fixing a real, separate bug in pledge vs
> > getaddrinfo, no? (specifically: that lookups for named ports will fail
> > currently).
>
> no, our getaddrinfo currently hardcodes mapping of SOCK_STREAM, SOCK_DGRAM, IPPROTO_TCP, and IPPROTO_UDP and maps them to "udp" and "tcp" for use when looking up /etc/services via getservbyname_r. this is fine because they are by far the most common case and worth optimising for.
>
> the problem is if (when) i want to use getnameinfo to look up entries for IPPROTO_GRE. i either hardcode IPPROTO_GRE in getnameinfo guts to "gre" for it to pass to getservbyname_r, or i look up /etc/protocols via getprotobynumber_r to get a name. i opted for the latter.

a number of people have expressed gaping-mouth horror that pledge knows
about some userland paths.  these specific paths are really part of what
libc does.  we could have hard-coded this info into libc and avoided the
horror, but that approach would have other downsides.

many programs hit libc interfaces which needed the /etc/services file
because of getaddrinfo_async/getnameinfo which are highly desired
interfaces.  protocols hadn't hit this situation yet, but will now rise
to the same level.

i want to *minimize* the number of recognized paths, since there are two
very subtle changes in behaviour.

one other thing needs mentioning:  this kind of translation won't work in
a chroot.