veb(4), a virtual ethernet bridge (that could replace bridge(4)?)

David Gwynne-5
i was bored at home a few weeks back, so i had a go at scratching
an itch i've had for a while now which was to write a quick and
dirty ethernet switch. the itch got worse recently when stsp@ asked
about some weird packet behaviour that may or may not have been
caused by bridge(4). trying to follow the code and how it interacts
with the stack was... challenging.

since then it has become less quick and dirty, and i've shined it enough
that i think it should be considered for the tree. however, it's not a
rewrite of bridge(4); there are some very significant semantic differences
that need to be explained.

the new driver is called veb(4), short for Virtual Ethernet Bridge. it
also contains a companion driver called vport(4), which i'll explain on
the way.

veb(4), like bridge(4), is a software implementation of an ethernet
switch. it is also represented as a virtual clonable interface that
you create at runtime and then add other ethernet interfaces to as
ports. these ethernet interfaces then act like ports on a switch.
packets received by the ethernet port interfaces are input to the
switch, which then decides which other ethernet port interface to
send the packet out of based on the destination ethernet address.
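
as a rough illustration of that learn-and-forward behaviour, here is a minimal sketch of an address table. it is not code from the diff: every `sketch_` name, the table size, and the toy hash are assumptions made up for the example, and it uses a direct-mapped table where a colliding address simply overwrites the old one.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 64	/* hypothetical size, not what veb uses */
#define SKETCH_FLOOD	  (-1)	/* "send out every other port" */

struct sketch_entry {
	uint8_t	addr[6];	/* learned source address */
	int	port;		/* port the address was last seen on */
	int	used;
};

struct sketch_switch {
	struct sketch_entry table[SKETCH_TABLE_SIZE];
};

/* toy hash over the 6 address bytes; an assumption for the sketch */
static unsigned int
sketch_hash(const uint8_t addr[6])
{
	unsigned int h = 0;
	int i;

	for (i = 0; i < 6; i++)
		h = h * 31 + addr[i];

	return (h % SKETCH_TABLE_SIZE);
}

/* remember that addr was seen as a source address on port */
static void
sketch_learn(struct sketch_switch *sw, const uint8_t addr[6], int port)
{
	struct sketch_entry *e;

	if (addr[0] & 1)	/* never learn multicast/broadcast sources */
		return;

	e = &sw->table[sketch_hash(addr)];
	memcpy(e->addr, addr, sizeof(e->addr));
	e->port = port;
	e->used = 1;
}

/* pick the output port for a destination, or SKETCH_FLOOD if unknown */
static int
sketch_lookup(const struct sketch_switch *sw, const uint8_t addr[6])
{
	const struct sketch_entry *e;

	if (addr[0] & 1)	/* multicast/broadcast always floods */
		return (SKETCH_FLOOD);

	e = &sw->table[sketch_hash(addr)];
	if (e->used && memcmp(e->addr, addr, sizeof(e->addr)) == 0)
		return (e->port);

	return (SKETCH_FLOOD);
}
```

a caller would run `sketch_learn()` on the source address of every received frame and `sketch_lookup()` on its destination. the real code in the diff also ages entries out and handles concurrency, which this sketch leaves out.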

the most fundamental difference between bridge(4) and veb(4) is that
veb(4) takes over ports completely and only uses them for l2. packets
coming into a veb member go into the switching code, and then the
packet pops out another interface. that's it. this is different to
bridge(4), which kind of treats each member interface as two ports,
one which goes to the wire and one which goes to the network stack
using that port. this is where a lot of my confusion about
bridge(4) comes from, both in terms of the code and when i'm trying
to actually use it. this difference is where most of the simplification
in veb comes from, and is fundamental to how it works.

because veb is only a layer 2 switch, by default it does not interact
with the layer 3 kernel handling at all. this includes both the
ip/mpls stacks, and pf. probably the biggest visible consequence of this
is that if you add the interface you're currently using to connect to the
host, veb(4) will basically take those packets away from the stack and
you'll be disconnected.

to have pf look at packets going in and out of interfaces on a
veb(4), you have to enable the link1 flag.

if you want the layer 3 stacks in the kernel to participate on a
veb(4), you have to explicitly create and add vport(4) interfaces.
vport(4) is special, and is handled specially by veb(4). one half
of the special handling is that veb(4) tries to disable l3 handling
on ports, but it doesn't do that on vport(4) ports. this allows you to
treat vport interfaces like a normal ethernet interface, but instead of
being plugged into a physical switch, the vport interface is plugged
into the virtual switch.

veb does not run pf on vport interfaces because pf will be run as
packets enter and leave the network stack. the stack runs pf on
vport interfaces regardless of whether link1 is set on the veb
interface or not.

a weird consequence of this is that pf on vport interfaces runs in
the opposite direction of pf on other veb ports. packets going from
veb out to an interface have pf run with PF_OUT, but if the packet
is going from veb to a vport, it will be run with PF_IN by the
stack.
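
the direction rules above can be written down as a small decision function. this is only a sketch of the behaviour as described in this mail, not code from the diff: the `sketch_` enums and function are made up for illustration. two cases are assumptions by symmetry rather than something stated above: that a packet entering veb from a normal port is run with the "in" direction when link1 is set, and that a packet leaving the stack through a vport is run with the "out" direction.

```c
enum sketch_pf_dir {
	SKETCH_PF_NONE,		/* pf is not run at all */
	SKETCH_PF_IN,
	SKETCH_PF_OUT
};

enum sketch_port_kind {
	SKETCH_PORT_NORMAL,	/* a regular ethernet member port */
	SKETCH_PORT_VPORT	/* a vport(4) member port */
};

enum sketch_flow {
	SKETCH_INTO_VEB,	/* packet arrives at veb from the port */
	SKETCH_OUT_OF_VEB	/* packet leaves veb via the port */
};

/*
 * which pf direction a packet sees on a given port. link1 is non-zero
 * if the link1 flag is set on the veb interface.
 */
static enum sketch_pf_dir
sketch_pf_direction(enum sketch_port_kind kind, enum sketch_flow flow,
    int link1)
{
	if (kind == SKETCH_PORT_VPORT) {
		/*
		 * the stack runs pf on vports regardless of link1, and
		 * from the stack's point of view a packet leaving veb
		 * is coming in.
		 */
		return (flow == SKETCH_OUT_OF_VEB ?
		    SKETCH_PF_IN : SKETCH_PF_OUT);
	}

	/* normal ports only see pf when link1 is set on the veb */
	if (!link1)
		return (SKETCH_PF_NONE);

	return (flow == SKETCH_OUT_OF_VEB ? SKETCH_PF_OUT : SKETCH_PF_IN);
}
```

for example, a packet switched from a normal port to a vport with link1 set would be seen as "in" on the normal port and "in" again on the vport, which is the double match the next paragraph is worried about.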

the reason that pf is disabled on normal port interfaces by default
is to minimise the complications in pf state tracking that happen
in this situation. a simple ruleset with pf enabled on normal ports
would have a state get created when a packet enters the member port. then
if the packet was destined for the vport, it would match the same
state in the same direction that was created on the normal port.

having vports as explicit interfaces you have to create and add to the
veb allows for a couple of interesting use cases. firstly, you can
use veb as a nexus between different rdomains by attaching multiple
vports in their own rdomains to the same veb. i think i've figured
all the dragons out in the code to support that. as always, care
must be taken with how and when pf gets run on those different
interfaces.

the second is that you can have veb act as a "bump in the wire",
applying policy or monitoring to traffic going over the veb with
confidence that it won't leak into the stack of the local system
unless you explicitly configure it to do so.

another part of the itch this diff was trying to scratch was factoring
out the bridge (not bridge(4)) code i have in bpe and nvgre. there's now
some common code in if_etherbridge that is used by bpe, nvgre, and
veb(4) that handles the actual learning and port lookups used by all
those drivers.

i am also looking at using that same code for vxlan(4), but i'm holding
off on that because i'd probably want to rework that code to use udp
sockets at the same time.

some of the recent polishing has been to implement the "protected"
pvlan, and filter rules features that bridge had.

lastly, there are some things bridge(4) does that veb(4) does not do. the
main things i can think of are the ipsec interception that bridge(4)
does, spanning tree support, the ethernet address table management (eg,
static entries or deleting specific entries), and some port flag handling.
apart from stp, none of it is particularly hard. it's just hard to get
motivated to do any more of this out of tree anymore.

oh, veb(4) should be a lot faster than bridge(4) too. and mpsafe. and
able to be run concurrently. hrvoje popovski has tested some versions of
these diffs and has the following numbers so far:

> 3550m4 - slower box
> forwarding - 560 Kpps
> bridge - 400 Kpps
> veb - 850 Kpps
> tpmr - 920 Kpps
>
> r620 - faster box
> forwarding - 1 Mpps
> bridge - 680 Kpps
> veb - 1.5 Mpps
> tpmr - 1.75 Mpps
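
to put those figures side by side, here is a small sketch that computes the veb-to-bridge ratio per box. the numbers are copied from the quote above (with Mpps converted to Kpps); the `sketch_` names are made up for the example.

```c
struct sketch_result {
	const char *box;
	double forwarding_kpps;
	double bridge_kpps;
	double veb_kpps;
	double tpmr_kpps;
};

/* figures as quoted above, all in Kpps */
static const struct sketch_result sketch_results[] = {
	{ "3550m4",  560.0, 400.0,  850.0,  920.0 },
	{ "r620",   1000.0, 680.0, 1500.0, 1750.0 },
};

/* how many times faster veb is than bridge on one box */
static double
sketch_veb_vs_bridge(const struct sketch_result *r)
{
	return (r->veb_kpps / r->bridge_kpps);
}

/* how many times faster veb is than plain forwarding on one box */
static double
sketch_veb_vs_forwarding(const struct sketch_result *r)
{
	return (r->veb_kpps / r->forwarding_kpps);
}
```

on these numbers veb comes out a little over 2.1 times faster than bridge on the 3550m4 and about 2.2 times faster on the r620, and roughly 1.5 times faster than plain forwarding on both.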

ignoring the performance differences between bridge(4) and veb(4),
i am interested in thoughts on the semantic differences between
them. if anyone wants some insight into why bridge(4) is the way
it is, you can read the Transparent Network Security Policy Enforcement
paper by angelos and jason.

i've been using this code at home for half a week now, and it's been very
boring, which is unlike my first attempts at using bridge(4) for
the same work. i am obviously biased though.

Index: conf/GENERIC
===================================================================
RCS file: /cvs/src/sys/conf/GENERIC,v
retrieving revision 1.273
diff -u -p -r1.273 GENERIC
--- conf/GENERIC 30 Sep 2020 14:51:17 -0000 1.273
+++ conf/GENERIC 10 Feb 2021 12:06:23 -0000
@@ -82,11 +82,13 @@ pseudo-device msts 1 # MSTS line discipl
 pseudo-device endrun 1 # EndRun line discipline
 pseudo-device vnd 4 # vnode disk devices
 pseudo-device ksyms 1 # kernel symbols device
+pseudo-device kstat # kernel statistics
 #pseudo-device dt # Dynamic Tracer
 
 # clonable devices
 pseudo-device bpfilter # packet filter
 pseudo-device bridge # network bridging support
+pseudo-device veb # virtual Ethernet bridge
 pseudo-device carp # CARP protocol support
 pseudo-device etherip # EtherIP (RFC 3378)
 pseudo-device gif # IPv[46] over IPv[46] tunnel (RFC1933)
Index: conf/files
===================================================================
RCS file: /cvs/src/sys/conf/files,v
retrieving revision 1.693
diff -u -p -r1.693 files
--- conf/files 28 Jan 2021 14:53:20 -0000 1.693
+++ conf/files 10 Feb 2021 12:06:23 -0000
@@ -13,6 +13,7 @@ define audio {}
 define scsi {}
 define atascsi {}
 define ifmedia
+define etherbridge
 define mii {[phy = -1]}
 define midibus {}
 define radiobus {}
@@ -555,11 +556,12 @@ pseudo-device bpfilter: ifnet
 pseudo-device enc: ifnet
 pseudo-device etherip: ifnet, ether, ifmedia
 pseudo-device bridge: ifnet, ether
+pseudo-device veb: ifnet, ether, etherbridge
 pseudo-device vlan: ifnet, ether
 pseudo-device carp: ifnet, ether
 pseudo-device sppp: ifnet
 pseudo-device gif: ifnet
-pseudo-device gre: ifnet
+pseudo-device gre: ifnet, ether, etherbridge
 pseudo-device crypto: ifnet
 pseudo-device trunk: ifnet, ether, ifmedia
 pseudo-device aggr: ifnet, ether, ifmedia
@@ -567,7 +569,7 @@ pseudo-device tpmr: ifnet, ether, ifmedi
 pseudo-device mpe: ifnet, mpls
 pseudo-device mpw: ifnet, mpls, ether
 pseudo-device mpip: ifnet, mpls
-pseudo-device bpe: ifnet, ether, ifmedia
+pseudo-device bpe: ifnet, ether, ifmedia, etherbridge
 pseudo-device vether: ifnet, ether
 pseudo-device pppx: ifnet
 pseudo-device vxlan: ifnet, ether, ifmedia
@@ -812,6 +814,8 @@ file net/if_tun.c tun needs-count
 file net/if_bridge.c bridge needs-count
 file net/bridgectl.c bridge
 file net/bridgestp.c bridge
+file net/if_etherbridge.c etherbridge
+file net/if_veb.c veb
 file net/if_vlan.c vlan needs-count
 file net/if_switch.c switch needs-count
 file net/switchctl.c switch
@@ -840,7 +844,7 @@ file net/if_wg.c wg
 file net/wg_noise.c wg
 file net/wg_cookie.c wg
 file net/bfd.c bfd
-file net/toeplitz.c stoeplitz needs-flag
+file net/toeplitz.c stoeplitz | etherbridge needs-flag
 file net80211/ieee80211.c wlan
 file net80211/ieee80211_amrr.c wlan
 file net80211/ieee80211_crypto.c wlan
Index: net/if_bpe.c
===================================================================
RCS file: /cvs/src/sys/net/if_bpe.c,v
retrieving revision 1.15
diff -u -p -r1.15 if_bpe.c
--- net/if_bpe.c 19 Jan 2021 07:30:19 -0000 1.15
+++ net/if_bpe.c 10 Feb 2021 12:06:23 -0000
@@ -27,6 +27,7 @@
 #include <sys/timeout.h>
 #include <sys/pool.h>
 #include <sys/tree.h>
+#include <sys/smr.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
@@ -40,7 +41,7 @@
 
 /* for bridge stuff */
 #include <net/if_bridge.h>
-
+#include <net/if_etherbridge.h>
 
 #if NBPFILTER > 0
 #include <net/bpf.h>
@@ -74,42 +75,17 @@ static inline int bpe_cmp(const struct b
 RBT_PROTOTYPE(bpe_tree, bpe_key, k_entry, bpe_cmp);
 RBT_GENERATE(bpe_tree, bpe_key, k_entry, bpe_cmp);
 
-struct bpe_entry {
- struct ether_addr be_c_da; /* customer address - must be first */
- struct ether_addr be_b_da; /* bridge address */
- unsigned int be_type;
-#define BPE_ENTRY_DYNAMIC 0
-#define BPE_ENTRY_STATIC 1
- struct refcnt be_refs;
- time_t be_age;
-
- RBT_ENTRY(bpe_entry) be_entry;
-};
-
-RBT_HEAD(bpe_map, bpe_entry);
-
-static inline int bpe_entry_cmp(const struct bpe_entry *,
-    const struct bpe_entry *);
-
-RBT_PROTOTYPE(bpe_map, bpe_entry, be_entry, bpe_entry_cmp);
-RBT_GENERATE(bpe_map, bpe_entry, be_entry, bpe_entry_cmp);
-
 struct bpe_softc {
  struct bpe_key sc_key; /* must be first */
  struct arpcom sc_ac;
  int sc_txhprio;
  int sc_rxhprio;
- uint8_t sc_group[ETHER_ADDR_LEN];
+ struct ether_addr sc_group;
 
  struct task sc_ltask;
  struct task sc_dtask;
 
- struct bpe_map sc_bridge_map;
- struct rwlock sc_bridge_lock;
- unsigned int sc_bridge_num;
- unsigned int sc_bridge_max;
- int sc_bridge_tmo; /* seconds */
- struct timeout sc_bridge_age;
+ struct etherbridge sc_eb;
 };
 
 void bpeattach(int);
@@ -132,16 +108,26 @@ static void bpe_link_hook(void *);
 static void bpe_link_state(struct bpe_softc *, u_char, uint64_t);
 static void bpe_detach_hook(void *);
 
-static void bpe_input_map(struct bpe_softc *,
-    const uint8_t *, const uint8_t *);
-static void bpe_bridge_age(void *);
-
 static struct if_clone bpe_cloner =
     IF_CLONE_INITIALIZER("bpe", bpe_clone_create, bpe_clone_destroy);
 
+static int bpe_eb_port_eq(void *, void *, void *);
+static void *bpe_eb_port_take(void *, void *);
+static void bpe_eb_port_rele(void *, void *);
+static size_t bpe_eb_port_ifname(void *, char *, size_t, void *);
+static void bpe_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops bpe_etherbridge_ops = {
+ bpe_eb_port_eq,
+ bpe_eb_port_take,
+ bpe_eb_port_rele,
+ bpe_eb_port_ifname,
+ bpe_eb_port_sa,
+};
+
 static struct bpe_tree bpe_interfaces = RBT_INITIALIZER();
 static struct rwlock bpe_lock = RWLOCK_INITIALIZER("bpeifs");
-static struct pool bpe_entry_pool;
+static struct pool bpe_endpoint_pool;
 
 void
 bpeattach(int count)
@@ -154,18 +140,27 @@ bpe_clone_create(struct if_clone *ifc, i
 {
  struct bpe_softc *sc;
  struct ifnet *ifp;
+ int error;
 
- if (bpe_entry_pool.pr_size == 0) {
- pool_init(&bpe_entry_pool, sizeof(struct bpe_entry), 0,
+ if (bpe_endpoint_pool.pr_size == 0) {
+ pool_init(&bpe_endpoint_pool, sizeof(struct ether_addr), 0,
     IPL_NONE, 0, "bpepl", NULL);
  }
 
  sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO);
+
  ifp = &sc->sc_ac.ac_if;
 
  snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
     ifc->ifc_name, unit);
 
+ error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+    &bpe_etherbridge_ops, sc);
+ if (error != 0) {
+ free(sc, M_DEVBUF, sizeof(*sc));
+ return (error);
+ }
+
  sc->sc_key.k_if = 0;
  sc->sc_key.k_isid = 0;
  bpe_set_group(sc, 0);
@@ -176,13 +171,6 @@ bpe_clone_create(struct if_clone *ifc, i
  task_set(&sc->sc_ltask, bpe_link_hook, sc);
  task_set(&sc->sc_dtask, bpe_detach_hook, sc);
 
- rw_init(&sc->sc_bridge_lock, "bpebr");
- RBT_INIT(bpe_map, &sc->sc_bridge_map);
- sc->sc_bridge_num = 0;
- sc->sc_bridge_max = 100; /* XXX */
- sc->sc_bridge_tmo = 240;
- timeout_set_proc(&sc->sc_bridge_age, bpe_bridge_age, sc);
-
  ifp->if_softc = sc;
  ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
  ifp->if_ioctl = bpe_ioctl;
@@ -211,25 +199,9 @@ bpe_clone_destroy(struct ifnet *ifp)
  ether_ifdetach(ifp);
  if_detach(ifp);
 
- free(sc, M_DEVBUF, sizeof(*sc));
-
- return (0);
-}
-
-static inline int
-bpe_entry_valid(struct bpe_softc *sc, const struct bpe_entry *be)
-{
- time_t diff;
-
- if (be == NULL)
- return (0);
-
- if (be->be_type == BPE_ENTRY_STATIC)
- return (1);
+ etherbridge_destroy(&sc->sc_eb);
 
- diff = getuptime() - be->be_age;
- if (diff < sc->sc_bridge_tmo)
- return (1);
+ free(sc, M_DEVBUF, sizeof(*sc));
 
  return (0);
 }
@@ -287,23 +259,21 @@ bpe_start(struct ifnet *ifp)
  beh = mtod(m, struct ether_header *);
 
  if (ETHER_IS_BROADCAST(ceh->ether_dhost)) {
- memcpy(beh->ether_dhost, sc->sc_group,
+ memcpy(beh->ether_dhost, &sc->sc_group,
     sizeof(beh->ether_dhost));
  } else {
- struct bpe_entry *be;
+ struct ether_addr *endpoint;
 
- rw_enter_read(&sc->sc_bridge_lock);
- be = RBT_FIND(bpe_map, &sc->sc_bridge_map,
-    (struct bpe_entry *)ceh->ether_dhost);
- if (bpe_entry_valid(sc, be)) {
- memcpy(beh->ether_dhost, &be->be_b_da,
-    sizeof(beh->ether_dhost));
- } else {
+ smr_read_enter();
+ endpoint = etherbridge_resolve(&sc->sc_eb,
+    (struct ether_addr *)ceh->ether_dhost);
+ if (endpoint == NULL) {
  /* "flood" to unknown hosts */
- memcpy(beh->ether_dhost, sc->sc_group,
-    sizeof(beh->ether_dhost));
+ endpoint = &sc->sc_group;
  }
- rw_exit_read(&sc->sc_bridge_lock);
+ memcpy(beh->ether_dhost, endpoint,
+    sizeof(beh->ether_dhost));
+ smr_read_leave();
  }
 
  memcpy(beh->ether_shost, ((struct arpcom *)ifp0)->ac_enaddr,
@@ -326,121 +296,6 @@ done:
  if_put(ifp0);
 }
 
-static void
-bpe_bridge_age(void *arg)
-{
- struct bpe_softc *sc = arg;
- struct bpe_entry *be, *nbe;
- time_t diff;
-
- timeout_add_sec(&sc->sc_bridge_age, BPE_BRIDGE_AGE_TMO);
-
- rw_enter_write(&sc->sc_bridge_lock);
- RBT_FOREACH_SAFE(be, bpe_map, &sc->sc_bridge_map, nbe) {
- if (be->be_type != BPE_ENTRY_DYNAMIC)
- continue;
-
- diff = getuptime() - be->be_age;
- if (diff < sc->sc_bridge_tmo)
- continue;
-
- sc->sc_bridge_num--;
- RBT_REMOVE(bpe_map, &sc->sc_bridge_map, be);
- if (refcnt_rele(&be->be_refs))
- pool_put(&bpe_entry_pool, be);
- }
- rw_exit_write(&sc->sc_bridge_lock);
-}
-
-static int
-bpe_rtfind(struct bpe_softc *sc, struct ifbaconf *baconf)
-{
- struct ifnet *ifp = &sc->sc_ac.ac_if;
- struct bpe_entry *be;
- struct ifbareq bareq;
- caddr_t uaddr, end;
- int error;
- time_t age;
- struct sockaddr_dl *sdl;
-
- if (baconf->ifbac_len == 0) {
- /* single read is atomic */
- baconf->ifbac_len = sc->sc_bridge_num * sizeof(bareq);
- return (0);
- }
-
- uaddr = baconf->ifbac_buf;
- end = uaddr + baconf->ifbac_len;
-
- rw_enter_read(&sc->sc_bridge_lock);
- RBT_FOREACH(be, bpe_map, &sc->sc_bridge_map) {
- if (uaddr >= end)
- break;
-
- memcpy(bareq.ifba_name, ifp->if_xname,
-    sizeof(bareq.ifba_name));
- memcpy(bareq.ifba_ifsname, ifp->if_xname,
-    sizeof(bareq.ifba_ifsname));
- memcpy(&bareq.ifba_dst, &be->be_c_da,
-    sizeof(bareq.ifba_dst));
-
- memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
-
- bzero(&bareq.ifba_dstsa, sizeof(bareq.ifba_dstsa));
- sdl = (struct sockaddr_dl *)&bareq.ifba_dstsa;
- sdl->sdl_len = sizeof(sdl);
- sdl->sdl_family = AF_LINK;
- sdl->sdl_index = 0;
- sdl->sdl_type = IFT_ETHER;
- sdl->sdl_nlen = 0;
- sdl->sdl_alen = sizeof(be->be_b_da);
- CTASSERT(sizeof(sdl->sdl_data) >= sizeof(be->be_b_da));
- memcpy(sdl->sdl_data, &be->be_b_da, sizeof(be->be_b_da));
-
- switch (be->be_type) {
- case BPE_ENTRY_DYNAMIC:
- age = getuptime() - be->be_age;
- bareq.ifba_age = MIN(age, 0xff);
- bareq.ifba_flags = IFBAF_DYNAMIC;
- break;
- case BPE_ENTRY_STATIC:
- bareq.ifba_age = 0;
- bareq.ifba_flags = IFBAF_STATIC;
- break;
- }
-
- error = copyout(&bareq, uaddr, sizeof(bareq));
- if (error != 0) {
- rw_exit_read(&sc->sc_bridge_lock);
- return (error);
- }
-
- uaddr += sizeof(bareq);
- }
- baconf->ifbac_len = sc->sc_bridge_num * sizeof(bareq);
- rw_exit_read(&sc->sc_bridge_lock);
-
- return (0);
-}
-
-static void
-bpe_flush_map(struct bpe_softc *sc, uint32_t flags)
-{
- struct bpe_entry *be, *nbe;
-
- rw_enter_write(&sc->sc_bridge_lock);
- RBT_FOREACH_SAFE(be, bpe_map, &sc->sc_bridge_map, nbe) {
- if (flags == IFBF_FLUSHDYN &&
-    be->be_type != BPE_ENTRY_DYNAMIC)
- continue;
-
- RBT_REMOVE(bpe_map, &sc->sc_bridge_map, be);
- if (refcnt_rele(&be->be_refs))
- pool_put(&bpe_entry_pool, be);
- }
- rw_exit_write(&sc->sc_bridge_lock);
-}
-
 static int
 bpe_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
@@ -510,16 +365,10 @@ bpe_ioctl(struct ifnet *ifp, u_long cmd,
  if (error != 0)
  break;
 
- if (bparam->ifbrp_csize < 1) {
- error = EINVAL;
- break;
- }
-
- /* commit */
- sc->sc_bridge_max = bparam->ifbrp_csize;
+ error = etherbridge_set_max(&sc->sc_eb, bparam);
  break;
  case SIOCBRDGGCACHE:
- bparam->ifbrp_csize = sc->sc_bridge_max;
+ error = etherbridge_get_max(&sc->sc_eb, bparam);
  break;
 
  case SIOCBRDGSTO:
@@ -527,26 +376,22 @@ bpe_ioctl(struct ifnet *ifp, u_long cmd,
  if (error != 0)
  break;
 
- if (bparam->ifbrp_ctime < 8 ||
-    bparam->ifbrp_ctime > 3600) {
- error = EINVAL;
- break;
- }
- sc->sc_bridge_tmo = bparam->ifbrp_ctime;
+ error = etherbridge_set_tmo(&sc->sc_eb, bparam);
  break;
  case SIOCBRDGGTO:
- bparam->ifbrp_ctime = sc->sc_bridge_tmo;
+ error = etherbridge_get_tmo(&sc->sc_eb, bparam);
  break;
 
  case SIOCBRDGRTS:
- error = bpe_rtfind(sc, (struct ifbaconf *)data);
+ error = etherbridge_rtfind(&sc->sc_eb,
+    (struct ifbaconf *)data);
  break;
  case SIOCBRDGFLUSH:
  error = suser(curproc);
  if (error != 0)
  break;
 
- bpe_flush_map(sc,
+ etherbridge_flush(&sc->sc_eb,
     ((struct ifbreq *)data)->ifbr_ifsflags);
  break;
 
@@ -580,16 +425,22 @@ bpe_up(struct bpe_softc *sc)
  struct ifnet *ifp = &sc->sc_ac.ac_if;
  struct ifnet *ifp0;
  struct bpe_softc *osc;
- int error = 0;
+ int error;
  u_int hardmtu;
  u_int hlen = sizeof(struct ether_header) + sizeof(uint32_t);
 
  KASSERT(!ISSET(ifp->if_flags, IFF_RUNNING));
  NET_ASSERT_LOCKED();
 
+ error = etherbridge_up(&sc->sc_eb);
+ if (error != 0)
+ return (error);
+
  ifp0 = if_get(sc->sc_key.k_if);
- if (ifp0 == NULL)
- return (ENXIO);
+ if (ifp0 == NULL) {
+ error = ENXIO;
+ goto down;
+ }
 
  /* check again if bpe will work on top of the parent */
  if (ifp0->if_type != IFT_ETHER) {
@@ -643,8 +494,6 @@ bpe_up(struct bpe_softc *sc)
 
  if_put(ifp0);
 
- timeout_add_sec(&sc->sc_bridge_age, BPE_BRIDGE_AGE_TMO);
-
  return (0);
 
 remove:
@@ -656,6 +505,8 @@ scrub:
  ifp->if_hardmtu = 0xffff;
 put:
  if_put(ifp0);
+down:
+ etherbridge_down(&sc->sc_eb);
 
  return (error);
 }
@@ -685,6 +536,8 @@ bpe_down(struct bpe_softc *sc)
  CLR(ifp->if_flags, IFF_SIMPLEX);
  ifp->if_hardmtu = 0xffff;
 
+ etherbridge_down(&sc->sc_eb);
+
  return (0);
 }
 
@@ -702,7 +555,7 @@ bpe_multi(struct bpe_softc *sc, struct i
  CTASSERT(sizeof(sa->sa_data) >= sizeof(sc->sc_group));
 
  sa->sa_family = AF_UNSPEC;
- memcpy(sa->sa_data, sc->sc_group, sizeof(sc->sc_group));
+ memcpy(sa->sa_data, &sc->sc_group, sizeof(sc->sc_group));
 
  return ((*ifp0->if_ioctl)(ifp0, cmd, (caddr_t)&ifr));
 }
@@ -710,7 +563,7 @@ bpe_multi(struct bpe_softc *sc, struct i
 static void
 bpe_set_group(struct bpe_softc *sc, uint32_t isid)
 {
- uint8_t *group = sc->sc_group;
+ uint8_t *group = sc->sc_group.ether_addr_octet;
 
  group[0] = 0x01;
  group[1] = 0x1e;
@@ -740,7 +593,7 @@ bpe_set_vnetid(struct bpe_softc *sc, con
  /* commit */
  sc->sc_key.k_isid = isid;
  bpe_set_group(sc, isid);
- bpe_flush_map(sc, IFBF_FLUSHALL);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
  return (0);
 }
@@ -771,7 +624,7 @@ bpe_set_parent(struct bpe_softc *sc, con
 
  /* commit */
  sc->sc_key.k_if = ifp0->if_index;
- bpe_flush_map(sc, IFBF_FLUSHALL);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
 put:
  if_put(ifp0);
@@ -804,7 +657,7 @@ bpe_del_parent(struct bpe_softc *sc)
 
  /* commit */
  sc->sc_key.k_if = 0;
- bpe_flush_map(sc, IFBF_FLUSHALL);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
  return (0);
 }
@@ -822,75 +675,6 @@ bpe_find(struct ifnet *ifp0, uint32_t is
  return (sc);
 }
 
-static void
-bpe_input_map(struct bpe_softc *sc, const uint8_t *ba, const uint8_t *ca)
-{
- struct bpe_entry *be;
- int new = 0;
-
- if (ETHER_IS_MULTICAST(ca))
- return;
-
- /* remember where it came from */
- rw_enter_read(&sc->sc_bridge_lock);
- be = RBT_FIND(bpe_map, &sc->sc_bridge_map, (struct bpe_entry *)ca);
- if (be == NULL)
- new = 1;
- else {
- be->be_age = getuptime(); /* only a little bit racy */
-
- if (be->be_type != BPE_ENTRY_DYNAMIC ||
-    ETHER_IS_EQ(ba, &be->be_b_da))
- be = NULL;
- else
- refcnt_take(&be->be_refs);
- }
- rw_exit_read(&sc->sc_bridge_lock);
-
- if (new) {
- struct bpe_entry *obe;
- unsigned int num;
-
- be = pool_get(&bpe_entry_pool, PR_NOWAIT);
- if (be == NULL) {
- /* oh well */
- return;
- }
-
- memcpy(&be->be_c_da, ca, sizeof(be->be_c_da));
- memcpy(&be->be_b_da, ba, sizeof(be->be_b_da));
- be->be_type = BPE_ENTRY_DYNAMIC;
- refcnt_init(&be->be_refs);
- be->be_age = getuptime();
-
- rw_enter_write(&sc->sc_bridge_lock);
- num = sc->sc_bridge_num;
- if (++num > sc->sc_bridge_max)
- obe = be;
- else {
- /* try and give the ref to the map */
- obe = RBT_INSERT(bpe_map, &sc->sc_bridge_map, be);
- if (obe == NULL) {
- /* count the insert */
- sc->sc_bridge_num = num;
- }
- }
- rw_exit_write(&sc->sc_bridge_lock);
-
- if (obe != NULL)
- pool_put(&bpe_entry_pool, obe);
- } else if (be != NULL) {
- rw_enter_write(&sc->sc_bridge_lock);
- memcpy(&be->be_b_da, ba, sizeof(be->be_b_da));
- rw_exit_write(&sc->sc_bridge_lock);
-
- if (refcnt_rele(&be->be_refs)) {
- /* ioctl may have deleted the entry */
- pool_put(&bpe_entry_pool, be);
- }
- }
-}
-
 void
 bpe_input(struct ifnet *ifp0, struct mbuf *m)
 {
@@ -928,7 +712,8 @@ bpe_input(struct ifnet *ifp0, struct mbu
 
  ceh = (struct ether_header *)(itagp + 1);
 
- bpe_input_map(sc, beh->ether_shost, ceh->ether_shost);
+ etherbridge_map(&sc->sc_eb, ceh->ether_shost,
+    (struct ether_addr *)beh->ether_shost);
 
  m_adj(m, sizeof(*beh) + sizeof(*itagp));
 
@@ -1035,12 +820,62 @@ bpe_cmp(const struct bpe_key *a, const s
  return (1);
  if (a->k_isid < b->k_isid)
  return (-1);
-
+
  return (0);
 }
 
-static inline int
-bpe_entry_cmp(const struct bpe_entry *a, const struct bpe_entry *b)
+static int
+bpe_eb_port_eq(void *arg, void *a, void *b)
+{
+ struct ether_addr *ea = a, *eb = b;
+
+ return (memcmp(ea, eb, sizeof(*ea)) == 0);
+}
+
+static void *
+bpe_eb_port_take(void *arg, void *port)
+{
+ struct ether_addr *ea = port;
+ struct ether_addr *endpoint;
+
+ endpoint = pool_get(&bpe_endpoint_pool, PR_NOWAIT);
+ if (endpoint == NULL)
+ return (NULL);
+
+ memcpy(endpoint, ea, sizeof(*endpoint));
+
+ return (endpoint);
+}
+
+static void
+bpe_eb_port_rele(void *arg, void *port)
+{
+ struct ether_addr *endpoint = port;
+
+ pool_put(&bpe_endpoint_pool, endpoint);
+}
+
+static size_t
+bpe_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
 {
- return memcmp(&a->be_c_da, &b->be_c_da, sizeof(a->be_c_da));
+ struct bpe_softc *sc = arg;
+
+ return (strlcpy(dst, sc->sc_ac.ac_if.if_xname, len));
+}
+
+static void
+bpe_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+ struct ether_addr *endpoint = port;
+ struct sockaddr_dl *sdl;
+
+ sdl = (struct sockaddr_dl *)ss;
+ sdl->sdl_len = sizeof(*sdl);
+ sdl->sdl_family = AF_LINK;
+ sdl->sdl_index = 0;
+ sdl->sdl_type = IFT_ETHER;
+ sdl->sdl_nlen = 0;
+ sdl->sdl_alen = sizeof(*endpoint);
+ CTASSERT(sizeof(sdl->sdl_data) >= sizeof(*endpoint));
+ memcpy(sdl->sdl_data, endpoint, sizeof(*endpoint));
 }
Index: net/if_etherbridge.c
===================================================================
RCS file: net/if_etherbridge.c
diff -N net/if_etherbridge.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ net/if_etherbridge.c 10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,584 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2018, 2021 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include "bpfilter.h"
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/mbuf.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/timeout.h>
+#include <sys/pool.h>
+#include <sys/tree.h>
+
+#include <net/if.h>
+#include <net/if_var.h>
+#include <net/if_dl.h>
+#include <net/if_media.h>
+#include <net/if_types.h>
+#include <net/rtable.h>
+#include <net/toeplitz.h>
+
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+
+/* for bridge stuff */
+#include <net/if_bridge.h>
+
+#include <net/if_etherbridge.h>
+
+static inline void ebe_take(struct eb_entry *);
+static inline void ebe_rele(struct eb_entry *);
+static void ebe_free(void *);
+
+static void etherbridge_age(void *);
+
+RBT_PROTOTYPE(eb_tree, eb_entry, ebe_tentry, ebt_cmp);
+
+static struct pool eb_entry_pool;
+
+static inline int
+eb_port_eq(struct etherbridge *eb, void *a, void *b)
+{
+ return ((*eb->eb_ops->eb_op_port_eq)(eb->eb_cookie, a, b));
+}
+
+static inline void *
+eb_port_take(struct etherbridge *eb, void *port)
+{
+ return ((*eb->eb_ops->eb_op_port_take)(eb->eb_cookie, port));
+}
+
+static inline void
+eb_port_rele(struct etherbridge *eb, void *port)
+{
+ (*eb->eb_ops->eb_op_port_rele)(eb->eb_cookie, port);
+}
+
+static inline size_t
+eb_port_ifname(struct etherbridge *eb, char *dst, size_t len, void *port)
+{
+ return ((*eb->eb_ops->eb_op_port_ifname)(eb->eb_cookie, dst, len,
+    port));
+}
+
+static inline void
+eb_port_sa(struct etherbridge *eb, struct sockaddr_storage *ss, void *port)
+{
+ (*eb->eb_ops->eb_op_port_sa)(eb->eb_cookie, ss, port);
+}
+
+int
+etherbridge_init(struct etherbridge *eb, const char *name,
+    const struct etherbridge_ops *ops, void *cookie)
+{
+ size_t i;
+
+ if (eb_entry_pool.pr_size == 0) {
+ pool_init(&eb_entry_pool, sizeof(struct eb_entry),
+    0, IPL_SOFTNET, 0, "ebepl", NULL);
+ }
+
+ eb->eb_table = mallocarray(ETHERBRIDGE_TABLE_SIZE,
+    sizeof(*eb->eb_table), M_DEVBUF, M_WAITOK|M_CANFAIL);
+ if (eb->eb_table == NULL)
+ return (ENOMEM);
+
+ eb->eb_name = name;
+ eb->eb_ops = ops;
+ eb->eb_cookie = cookie;
+
+ mtx_init(&eb->eb_lock, IPL_SOFTNET);
+ RBT_INIT(eb_tree, &eb->eb_tree);
+
+ eb->eb_num = 0;
+ eb->eb_max = 100; /* XXX */
+ eb->eb_max_age = 8;
+ timeout_set(&eb->eb_tmo_age, etherbridge_age, eb);
+
+ for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+ struct eb_list *ebl = &eb->eb_table[i];
+ SMR_TAILQ_INIT(ebl);
+ }
+
+ return (0);
+}
+
+int
+etherbridge_up(struct etherbridge *eb)
+{
+ etherbridge_age(eb);
+ return (0);
+}
+
+int
+etherbridge_down(struct etherbridge *eb)
+{
+ smr_barrier();
+
+ return (0);
+}
+
+void
+etherbridge_destroy(struct etherbridge *eb)
+{
+ struct eb_entry *ebe, *nebe;
+
+ /* XXX assume that nothing will be calling etherbridge_map now */
+
+ timeout_del_barrier(&eb->eb_tmo_age);
+
+ free(eb->eb_table, M_DEVBUF,
+    ETHERBRIDGE_TABLE_SIZE * sizeof(*eb->eb_table));
+
+ RBT_FOREACH_SAFE(ebe, eb_tree, &eb->eb_tree, nebe) {
+ RBT_REMOVE(eb_tree, &eb->eb_tree, ebe);
+ ebe_free(ebe);
+ }
+}
+
+static struct eb_list *
+etherbridge_list(struct etherbridge *eb, const struct ether_addr *ea)
+{
+ uint16_t hash = stoeplitz_eaddr(ea->ether_addr_octet);
+ hash &= ETHERBRIDGE_TABLE_MASK;
+ return (&eb->eb_table[hash]);
+}
+
+static struct eb_entry *
+ebl_find(struct eb_list *ebl, const struct ether_addr *ea)
+{
+ struct eb_entry *ebe;
+
+ SMR_TAILQ_FOREACH(ebe, ebl, ebe_lentry) {
+ if (ETHER_IS_EQ(ea, &ebe->ebe_addr))
+ return (ebe);
+ }
+
+ return (NULL);
+}
+
+static inline void
+ebl_insert(struct eb_list *ebl, struct eb_entry *ebe)
+{
+ SMR_TAILQ_INSERT_TAIL_LOCKED(ebl, ebe, ebe_lentry);
+}
+
+static inline void
+ebl_remove(struct eb_list *ebl, struct eb_entry *ebe)
+{
+ SMR_TAILQ_REMOVE_LOCKED(ebl, ebe, ebe_lentry);
+}
+
+static inline int
+ebt_cmp(const struct eb_entry *aebe, const struct eb_entry *bebe)
+{
+ return (memcmp(&aebe->ebe_addr, &bebe->ebe_addr,
+    sizeof(aebe->ebe_addr)));
+}
+
+RBT_GENERATE(eb_tree, eb_entry, ebe_tentry, ebt_cmp);
+
+static inline struct eb_entry *
+ebt_insert(struct etherbridge *eb, struct eb_entry *ebe)
+{
+ return (RBT_INSERT(eb_tree, &eb->eb_tree, ebe));
+}
+
+static inline void
+ebt_replace(struct etherbridge *eb, struct eb_entry *oebe,
+    struct eb_entry *nebe)
+{
+ struct eb_entry *rvebe;
+
+ RBT_REMOVE(eb_tree, &eb->eb_tree, oebe);
+ rvebe = RBT_INSERT(eb_tree, &eb->eb_tree, nebe);
+ KASSERTMSG(rvebe == NULL, "ebt_replace eb %p nebe %p rvebe %p",
+    eb, nebe, rvebe);
+}
+
+static inline void
+ebt_remove(struct etherbridge *eb, struct eb_entry *ebe)
+{
+ RBT_REMOVE(eb_tree, &eb->eb_tree, ebe);
+}
+
+static inline void
+ebe_take(struct eb_entry *ebe)
+{
+ refcnt_take(&ebe->ebe_refs);
+}
+
+static void
+ebe_rele(struct eb_entry *ebe)
+{
+ if (refcnt_rele(&ebe->ebe_refs))
+ smr_call(&ebe->ebe_smr_entry, ebe_free, ebe);
+}
+
+static void
+ebe_free(void *arg)
+{
+ struct eb_entry *ebe = arg;
+ struct etherbridge *eb = ebe->ebe_etherbridge;
+
+ eb_port_rele(eb, ebe->ebe_port);
+ pool_put(&eb_entry_pool, ebe);
+}
+
+void *
+etherbridge_resolve(struct etherbridge *eb, const struct ether_addr *ea)
+{
+ struct eb_list *ebl = etherbridge_list(eb, ea);
+ struct eb_entry *ebe;
+
+ SMR_ASSERT_CRITICAL();
+
+ ebe = ebl_find(ebl, ea);
+ if (ebe != NULL) {
+ if (ebe->ebe_type == EBE_DYNAMIC) {
+ int diff = getuptime() - ebe->ebe_age;
+ if (diff > eb->eb_max_age)
+ return (NULL);
+ }
+
+ return (ebe->ebe_port);
+ }
+
+ return (NULL);
+}
+
+void
+etherbridge_map(struct etherbridge *eb, void *port,
+    const struct ether_addr *ea)
+{
+ struct eb_list *ebl;
+ struct eb_entry *oebe, *nebe;
+ unsigned int num;
+ void *nport;
+ int new = 0;
+
+ if (ETHER_IS_MULTICAST(ea->ether_addr_octet) ||
+    ETHER_IS_EQ(ea->ether_addr_octet, etheranyaddr))
+ return;
+
+ ebl = etherbridge_list(eb, ea);
+
+ smr_read_enter();
+ oebe = ebl_find(ebl, ea);
+ if (oebe == NULL)
+ new = 1;
+ else {
+ oebe->ebe_age = getuptime();
+
+ /* does this entry need to be replaced? */
+ if (oebe->ebe_type == EBE_DYNAMIC &&
+    !eb_port_eq(eb, oebe->ebe_port, port)) {
+ new = 1;
+ ebe_take(oebe);
+ } else
+ oebe = NULL;
+ }
+ smr_read_leave();
+
+ if (!new)
+ return;
+
+ nport = eb_port_take(eb, port);
+ if (nport == NULL) {
+ /* XXX should we remove the old one and flood? */
+ return;
+ }
+
+ nebe = pool_get(&eb_entry_pool, PR_NOWAIT);
+ if (nebe == NULL) {
+ /* XXX should we remove the old one and flood? */
+ eb_port_rele(eb, nport);
+ return;
+ }
+
+ smr_init(&nebe->ebe_smr_entry);
+ refcnt_init(&nebe->ebe_refs);
+ nebe->ebe_etherbridge = eb;
+
+ nebe->ebe_addr = *ea;
+ nebe->ebe_port = nport;
+ nebe->ebe_type = EBE_DYNAMIC;
+ nebe->ebe_age = getuptime();
+
+ mtx_enter(&eb->eb_lock);
+ num = eb->eb_num + (oebe == NULL);
+ if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
+ /* we won, do the update */
+ ebl_insert(ebl, nebe);
+
+ if (oebe != NULL) {
+ ebl_remove(ebl, oebe);
+ ebt_replace(eb, oebe, nebe);
+
+ /* take the table reference away */
+ if (refcnt_rele(&oebe->ebe_refs)) {
+ panic("%s: eb %p oebe %p refcnt",
+    __func__, eb, oebe);
+ }
+ }
+
+ nebe = NULL;
+ eb->eb_num = num;
+ }
+ mtx_leave(&eb->eb_lock);
+
+ if (nebe != NULL) {
+ /*
+ * the new entry didn't make it into the
+ * table, so it can be freed directly.
+ */
+ ebe_free(nebe);
+ }
+
+ if (oebe != NULL) {
+ /*
+ * the old entry could be referenced in
+ * multiple places, including an smr read
+ * section, so release it properly.
+ */
+ ebe_rele(oebe);
+ }
+}
+
+static void
+etherbridge_age(void *arg)
+{
+ struct etherbridge *eb = arg;
+ struct eb_entry *ebe, *nebe;
+ struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+ int diff;
+ unsigned int now = getuptime();
+ size_t i;
+
+ timeout_add_sec(&eb->eb_tmo_age, 100);
+
+ for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+ struct eb_list *ebl = &eb->eb_table[i];
+#if 0
+ if (SMR_TAILQ_EMPTY(ebl))
+ continue;
+#endif
+
+ mtx_enter(&eb->eb_lock); /* don't block map too much */
+ SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+ if (ebe->ebe_type != EBE_DYNAMIC)
+ continue;
+
+ diff = now - ebe->ebe_age;
+ if (diff < eb->eb_max_age)
+ continue;
+
+ ebl_remove(ebl, ebe);
+ ebt_remove(eb, ebe);
+ eb->eb_num--;
+
+ /* we own the table's ref now */
+
+ TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+ }
+ mtx_leave(&eb->eb_lock);
+ }
+
+ TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+ TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+ ebe_rele(ebe);
+ }
+}
+
+void
+etherbridge_detach_port(struct etherbridge *eb, void *port)
+{
+ struct eb_entry *ebe, *nebe;
+ struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+ size_t i;
+
+ for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+ struct eb_list *ebl = &eb->eb_table[i];
+
+ mtx_enter(&eb->eb_lock); /* don't block map too much */
+ SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+ if (!eb_port_eq(eb, ebe->ebe_port, port))
+ continue;
+
+ ebl_remove(ebl, ebe);
+ ebt_remove(eb, ebe);
+ eb->eb_num--;
+
+ /* we own the table's ref now */
+
+ TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+ }
+ mtx_leave(&eb->eb_lock);
+ }
+
+ smr_barrier(); /* try and do it once for all the entries */
+
+ TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+ TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+ if (refcnt_rele(&ebe->ebe_refs))
+ ebe_free(ebe);
+ }
+}
+
+void
+etherbridge_flush(struct etherbridge *eb, uint32_t flags)
+{
+ struct eb_entry *ebe, *nebe;
+ struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+ size_t i;
+
+ for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+ struct eb_list *ebl = &eb->eb_table[i];
+
+ mtx_enter(&eb->eb_lock); /* don't block map too much */
+ SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+ if (flags == IFBF_FLUSHDYN &&
+    ebe->ebe_type != EBE_DYNAMIC)
+ continue;
+
+ ebl_remove(ebl, ebe);
+ ebt_remove(eb, ebe);
+ eb->eb_num--;
+
+ /* we own the table's ref now */
+
+ TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+ }
+ mtx_leave(&eb->eb_lock);
+ }
+
+ smr_barrier(); /* try and do it once for all the entries */
+
+ TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+ TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+ if (refcnt_rele(&ebe->ebe_refs))
+ ebe_free(ebe);
+ }
+}
+
+int
+etherbridge_rtfind(struct etherbridge *eb, struct ifbaconf *baconf)
+{
+ struct eb_entry *ebe;
+ struct ifbareq bareq;
+ caddr_t buf;
+ size_t len, nlen;
+ time_t age, now = getuptime();
+ int error;
+
+ if (baconf->ifbac_len == 0) {
+ /* single read is atomic */
+ baconf->ifbac_len = eb->eb_num * sizeof(bareq);
+ return (0);
+ }
+
+ buf = malloc(baconf->ifbac_len, M_TEMP, M_WAITOK|M_CANFAIL);
+ if (buf == NULL)
+ return (ENOMEM);
+ len = 0;
+
+ mtx_enter(&eb->eb_lock);
+ RBT_FOREACH(ebe, eb_tree, &eb->eb_tree) {
+ nlen = len + sizeof(bareq);
+ if (nlen > baconf->ifbac_len) {
+ break;
+ }
+
+ strlcpy(bareq.ifba_name, eb->eb_name,
+    sizeof(bareq.ifba_name));
+ eb_port_ifname(eb,
+    bareq.ifba_ifsname, sizeof(bareq.ifba_ifsname),
+    ebe->ebe_port);
+ memcpy(&bareq.ifba_dst, &ebe->ebe_addr,
+    sizeof(bareq.ifba_dst));
+
+ memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
+ eb_port_sa(eb, &bareq.ifba_dstsa, ebe->ebe_port);
+
+ switch (ebe->ebe_type) {
+ case EBE_DYNAMIC:
+ age = now - ebe->ebe_age;
+ bareq.ifba_age = MIN(age, 0xff);
+ bareq.ifba_flags = IFBAF_DYNAMIC;
+ break;
+ case EBE_STATIC:
+ bareq.ifba_age = 0;
+ bareq.ifba_flags = IFBAF_STATIC;
+ break;
+ }
+
+ memcpy(buf + len, &bareq, sizeof(bareq));
+ len = nlen;
+ }
+ nlen = baconf->ifbac_len;
+ baconf->ifbac_len = eb->eb_num * sizeof(bareq);
+ mtx_leave(&eb->eb_lock);
+
+ error = copyout(buf, baconf->ifbac_buf, len);
+ free(buf, M_TEMP, nlen);
+
+ return (error);
+}
+
+int
+etherbridge_set_max(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+ if (bparam->ifbrp_csize < 1)
+ return (EINVAL);
+
+ /* commit */
+ eb->eb_max = bparam->ifbrp_csize;
+
+ return (0);
+}
+
+int
+etherbridge_get_max(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+ bparam->ifbrp_csize = eb->eb_max;
+
+ return (0);
+}
+
+int
+etherbridge_set_tmo(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+ if (bparam->ifbrp_ctime < 8 || bparam->ifbrp_ctime > 3600)
+ return (EINVAL);
+
+ /* commit */
+ eb->eb_max_age = bparam->ifbrp_ctime;
+
+ return (0);
+}
+
+int
+etherbridge_get_tmo(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+ bparam->ifbrp_ctime = eb->eb_max_age;
+
+ return (0);
+}
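
(not part of the diff) to make the learning and lookup rules in etherbridge_map() and etherbridge_resolve() easier to follow, here is a minimal userland sketch of the same semantics. it is an illustration only: the fdb_* names, the fixed-size array, and passing the uptime in as an argument are all assumptions, and the locking, SMR and reference counting that the real code is mostly about are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FDB_MAX		8	/* hypothetical table capacity */
#define FDB_DYNAMIC	0
#define FDB_STATIC	1

struct fdb_entry {
	uint8_t		addr[6];
	int		port;	/* stand-in for the opaque port pointer */
	int		type;
	long		age;	/* uptime in seconds when last seen */
};

struct fdb {
	struct fdb_entry entries[FDB_MAX];
	unsigned int	 num;
	long		 max_age;	/* seconds */
};

static struct fdb_entry *
fdb_find(struct fdb *f, const uint8_t *ea)
{
	unsigned int i;

	for (i = 0; i < f->num; i++) {
		if (memcmp(f->entries[i].addr, ea, 6) == 0)
			return (&f->entries[i]);
	}
	return (NULL);
}

/* learn a source address, loosely following etherbridge_map() */
static void
fdb_map(struct fdb *f, int port, const uint8_t *ea, long now)
{
	static const uint8_t anyaddr[6];	/* all zeroes */
	struct fdb_entry *e;

	/* multicast and all-zero sources are never learned */
	if ((ea[0] & 1) || memcmp(ea, anyaddr, 6) == 0)
		return;

	e = fdb_find(f, ea);
	if (e != NULL) {
		e->age = now;
		/* only dynamic entries follow a host to a new port */
		if (e->type == FDB_DYNAMIC)
			e->port = port;
		return;
	}

	assert(f->num <= FDB_MAX);
	if (f->num == FDB_MAX)
		return;		/* table full, traffic keeps flooding */

	e = &f->entries[f->num++];
	memcpy(e->addr, ea, 6);
	e->port = port;
	e->type = FDB_DYNAMIC;
	e->age = now;
}

/* look up a destination, loosely following etherbridge_resolve() */
static int
fdb_resolve(struct fdb *f, const uint8_t *ea, long now)
{
	struct fdb_entry *e = fdb_find(f, ea);

	if (e == NULL)
		return (-1);	/* unknown, caller floods */
	if (e->type == FDB_DYNAMIC && now - e->age > f->max_age)
		return (-1);	/* stale, treated as unknown */
	return (e->port);
}
```

a caller would run fdb_map() on the source address of every received frame and fdb_resolve() on the destination address before transmit, flooding when the result is -1. note that a stale dynamic entry is ignored by the lookup even before anything removes it, which matches the check in etherbridge_resolve() above.
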
Index: net/if_etherbridge.h
===================================================================
RCS file: net/if_etherbridge.h
diff -N net/if_etherbridge.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ net/if_etherbridge.h 10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,103 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2018, 2021 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _NET_ETHERBRIDGE_H_
+#define _NET_ETHERBRIDGE_H_
+
+#define ETHERBRIDGE_TABLE_BITS 8
+#define ETHERBRIDGE_TABLE_SIZE (1U << ETHERBRIDGE_TABLE_BITS)
+#define ETHERBRIDGE_TABLE_MASK (ETHERBRIDGE_TABLE_SIZE - 1)
+
+struct etherbridge_ops {
+ int (*eb_op_port_eq)(void *, void *, void *);
+ void *(*eb_op_port_take)(void *, void *);
+ void (*eb_op_port_rele)(void *, void *);
+ size_t (*eb_op_port_ifname)(void *, char *, size_t, void *);
+ void (*eb_op_port_sa)(void *, struct sockaddr_storage *, void *);
+};
+
+struct etherbridge;
+
+struct eb_entry {
+ SMR_TAILQ_ENTRY(eb_entry) ebe_lentry;
+ union {
+ RBT_ENTRY(eb_entry) _ebe_tentry;
+ TAILQ_ENTRY(eb_entry) _ebe_qentry;
+ } _ebe_entries;
+#define ebe_tentry _ebe_entries._ebe_tentry
+#define ebe_qentry _ebe_entries._ebe_qentry
+
+ struct ether_addr ebe_addr;
+ void *ebe_port;
+ unsigned int ebe_type;
+#define EBE_DYNAMIC 0x0
+#define EBE_STATIC 0x1
+#define EBE_DEAD 0xdead
+ time_t ebe_age;
+
+ struct etherbridge *ebe_etherbridge;
+ struct refcnt ebe_refs;
+ struct smr_entry ebe_smr_entry;
+};
+
+SMR_TAILQ_HEAD(eb_list, eb_entry);
+RBT_HEAD(eb_tree, eb_entry);
+TAILQ_HEAD(eb_queue, eb_entry);
+
+struct etherbridge {
+ const char *eb_name;
+ const struct etherbridge_ops *eb_ops;
+ void *eb_cookie;
+
+ struct mutex eb_lock;
+ unsigned int eb_num;
+ unsigned int eb_max;
+ int eb_max_age; /* seconds */
+ struct timeout eb_tmo_age;
+
+ struct eb_list *eb_table;
+ struct eb_tree eb_tree;
+
+};
+
+int etherbridge_init(struct etherbridge *, const char *,
+     const struct etherbridge_ops *, void *);
+int etherbridge_up(struct etherbridge *);
+int etherbridge_down(struct etherbridge *);
+void etherbridge_destroy(struct etherbridge *);
+
+void etherbridge_map(struct etherbridge *, void *,
+    const struct ether_addr *);
+void *etherbridge_resolve(struct etherbridge *, const struct ether_addr *);
+void etherbridge_detach_port(struct etherbridge *, void *);
+
+/* ioctl support */
+int etherbridge_set_max(struct etherbridge *, struct ifbrparam *);
+int etherbridge_get_max(struct etherbridge *, struct ifbrparam *);
+int etherbridge_set_tmo(struct etherbridge *, struct ifbrparam *);
+int etherbridge_get_tmo(struct etherbridge *, struct ifbrparam *);
+int etherbridge_rtfind(struct etherbridge *, struct ifbaconf *);
+void etherbridge_flush(struct etherbridge *, uint32_t);
+
+static inline unsigned int
+etherbridge_num(const struct etherbridge *eb)
+{
+ return (eb->eb_num);
+}
+
+#endif /* _NET_ETHERBRIDGE_H_ */
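
(not part of the diff) the ops table in the header above is how a driver tells the etherbridge code what a "port" is. as a rough sketch of the contract, assuming a driver whose ports are reference counted objects: the demo_* names and the "dying" flag are hypothetical, only three of the five callbacks are shown, and a plain counter stands in for the kernel refcnt API.

```c
#include <assert.h>
#include <stddef.h>

/* userland copy of three of the callback shapes in struct etherbridge_ops */
struct port_ops {
	int	 (*op_port_eq)(void *, void *, void *);
	void	*(*op_port_take)(void *, void *);
	void	 (*op_port_rele)(void *, void *);
};

/* hypothetical port object owned by the driver */
struct demo_port {
	unsigned int	refs;	/* not atomic, illustration only */
	int		dying;	/* set while the port is being detached */
};

/* two port handles are equal if they are the same object */
static int
demo_port_eq(void *cookie, void *a, void *b)
{
	(void)cookie;
	return (a == b);
}

/* hand out a reference for a table entry to hold, or NULL to refuse */
static void *
demo_port_take(void *cookie, void *port)
{
	struct demo_port *p = port;

	(void)cookie;
	if (p->dying)
		return (NULL);	/* the caller gives up on learning */
	p->refs++;
	return (p);
}

/* give back a reference that an earlier take returned */
static void
demo_port_rele(void *cookie, void *port)
{
	struct demo_port *p = port;

	(void)cookie;
	assert(p->refs > 0);
	p->refs--;
}

static const struct port_ops demo_ops = {
	demo_port_eq,
	demo_port_take,
	demo_port_rele,
};
```

the point of take returning a pointer, rather than just bumping a count, is that the table stores whatever take hands back. that lets a driver like nvgre return a freshly allocated copy of an endpoint address instead of a counted object, and free it again in rele.
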
Index: net/if_gre.c
===================================================================
RCS file: /cvs/src/sys/net/if_gre.c,v
retrieving revision 1.164
diff -u -p -r1.164 if_gre.c
--- net/if_gre.c 19 Jan 2021 07:31:47 -0000 1.164
+++ net/if_gre.c 10 Feb 2021 12:06:23 -0000
@@ -99,6 +99,7 @@
 /* for nvgre bridge shizz */
 #include <sys/socket.h>
 #include <net/if_bridge.h>
+#include <net/if_etherbridge.h>
 
 /*
  * packet formats
@@ -395,27 +396,6 @@ struct egre_tree egre_tree = RBT_INITIAL
  * Network Virtualisation Using Generic Routing Encapsulation (NVGRE)
  */
 
-#define NVGRE_AGE_TMO 100 /* seconds */
-
-struct nvgre_entry {
- RB_ENTRY(nvgre_entry) nv_entry;
- struct ether_addr nv_dst;
- uint8_t nv_type;
-#define NVGRE_ENTRY_DYNAMIC 0
-#define NVGRE_ENTRY_STATIC 1
- union gre_addr nv_gateway;
- struct refcnt nv_refs;
- int nv_age;
-};
-
-RBT_HEAD(nvgre_map, nvgre_entry);
-
-static inline int
- nvgre_entry_cmp(const struct nvgre_entry *,
-    const struct nvgre_entry *);
-
-RBT_PROTOTYPE(nvgre_map, nvgre_entry, nv_entry, nvgre_entry_cmp);
-
 struct nvgre_softc {
  struct gre_tunnel sc_tunnel; /* must be first */
  unsigned int sc_ifp0;
@@ -432,12 +412,7 @@ struct nvgre_softc {
  struct task sc_ltask;
  struct task sc_dtask;
 
- struct rwlock sc_ether_lock;
- struct nvgre_map sc_ether_map;
- unsigned int sc_ether_num;
- unsigned int sc_ether_max;
- int sc_ether_tmo;
- struct timeout sc_ether_age;
+ struct etherbridge sc_eb;
 };
 
 RBT_HEAD(nvgre_ucast_tree, nvgre_softc);
@@ -474,16 +449,24 @@ static int nvgre_input(const struct gre_
     uint8_t);
 static void nvgre_send(void *);
 
-static int nvgre_rtfind(struct nvgre_softc *, struct ifbaconf *);
-static void nvgre_flush_map(struct nvgre_softc *);
-static void nvgre_input_map(struct nvgre_softc *,
-    const struct gre_tunnel *, const struct ether_header *);
-static void nvgre_age(void *);
+static int nvgre_eb_port_eq(void *, void *, void *);
+static void *nvgre_eb_port_take(void *, void *);
+static void nvgre_eb_port_rele(void *, void *);
+static size_t nvgre_eb_port_ifname(void *, char *, size_t, void *);
+static void nvgre_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops nvgre_etherbridge_ops = {
+ nvgre_eb_port_eq,
+ nvgre_eb_port_take,
+ nvgre_eb_port_rele,
+ nvgre_eb_port_ifname,
+ nvgre_eb_port_sa,
+};
 
 struct if_clone nvgre_cloner =
     IF_CLONE_INITIALIZER("nvgre", nvgre_clone_create, nvgre_clone_destroy);
 
-struct pool nvgre_pool;
+struct pool nvgre_endpoint_pool;
 
 /* protected by NET_LOCK */
 struct nvgre_ucast_tree nvgre_ucast_tree = RBT_INITIALIZER();
@@ -759,10 +742,11 @@ nvgre_clone_create(struct if_clone *ifc,
  struct nvgre_softc *sc;
  struct ifnet *ifp;
  struct gre_tunnel *tunnel;
+ int error;
 
- if (nvgre_pool.pr_size == 0) {
- pool_init(&nvgre_pool, sizeof(struct nvgre_entry), 0,
-    IPL_SOFTNET, 0, "nvgren", NULL);
+ if (nvgre_endpoint_pool.pr_size == 0) {
+ pool_init(&nvgre_endpoint_pool, sizeof(union gre_addr),
+    0, IPL_SOFTNET, 0, "nvgreep", NULL);
  }
 
  sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO);
@@ -771,6 +755,13 @@ nvgre_clone_create(struct if_clone *ifc,
  snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
     ifc->ifc_name, unit);
 
+ error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+    &nvgre_etherbridge_ops, sc);
+ if (error != 0) {
+ free(sc, M_DEVBUF, sizeof(*sc));
+ return (error);
+ }
+
  ifp->if_softc = sc;
  ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
  ifp->if_ioctl = nvgre_ioctl;
@@ -793,13 +784,6 @@ nvgre_clone_create(struct if_clone *ifc,
  task_set(&sc->sc_ltask, nvgre_link_change, sc);
  task_set(&sc->sc_dtask, nvgre_detach, sc);
 
- rw_init(&sc->sc_ether_lock, "nvgrelk");
- RBT_INIT(nvgre_map, &sc->sc_ether_map);
- sc->sc_ether_num = 0;
- sc->sc_ether_max = 100;
- sc->sc_ether_tmo = 240 * hz;
- timeout_set_proc(&sc->sc_ether_age, nvgre_age, sc); /* ugh */
-
  ifmedia_init(&sc->sc_media, 0, egre_media_change, egre_media_status);
  ifmedia_add(&sc->sc_media, IFM_ETHER | IFM_AUTO, 0, NULL);
  ifmedia_set(&sc->sc_media, IFM_ETHER | IFM_AUTO);
@@ -821,6 +805,8 @@ nvgre_clone_destroy(struct ifnet *ifp)
  nvgre_down(sc);
  NET_UNLOCK();
 
+ etherbridge_destroy(&sc->sc_eb);
+
  ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
  ether_ifdetach(ifp);
  if_detach(ifp);
@@ -1344,183 +1330,6 @@ egre_input(const struct gre_tunnel *key,
  return (0);
 }
 
-static int
-nvgre_rtfind(struct nvgre_softc *sc, struct ifbaconf *baconf)
-{
- struct ifnet *ifp = &sc->sc_ac.ac_if;
- struct nvgre_entry *nv;
- struct ifbareq bareq;
- caddr_t uaddr, end;
- int error;
- int age;
-
- if (baconf->ifbac_len == 0) {
- /* single read is atomic */
- baconf->ifbac_len = sc->sc_ether_num * sizeof(bareq);
- return (0);
- }
-
- uaddr = baconf->ifbac_buf;
- end = uaddr + baconf->ifbac_len;
-
- rw_enter_read(&sc->sc_ether_lock);
- RBT_FOREACH(nv, nvgre_map, &sc->sc_ether_map) {
- if (uaddr >= end)
- break;
-
- memcpy(bareq.ifba_name, ifp->if_xname,
-    sizeof(bareq.ifba_name));
- memcpy(bareq.ifba_ifsname, ifp->if_xname,
-    sizeof(bareq.ifba_ifsname));
- memcpy(&bareq.ifba_dst, &nv->nv_dst,
-    sizeof(bareq.ifba_dst));
-
- memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
- switch (sc->sc_tunnel.t_af) {
- case AF_INET: {
- struct sockaddr_in *sin;
-
- sin = (struct sockaddr_in *)&bareq.ifba_dstsa;
- sin->sin_len = sizeof(*sin);
- sin->sin_family = AF_INET;
- sin->sin_addr = nv->nv_gateway.in4;
-
- break;
- }
-#ifdef INET6
- case AF_INET6: {
- struct sockaddr_in6 *sin6;
-
- sin6 = (struct sockaddr_in6 *)&bareq.ifba_dstsa;
- sin6->sin6_len = sizeof(*sin6);
- sin6->sin6_family = AF_INET6;
- sin6->sin6_addr = nv->nv_gateway.in6;
-
- break;
- }
-#endif /* INET6 */
- default:
- unhandled_af(sc->sc_tunnel.t_af);
- }
-
- switch (nv->nv_type) {
- case NVGRE_ENTRY_DYNAMIC:
- age = (ticks - nv->nv_age) / hz;
- bareq.ifba_age = MIN(age, 0xff);
- bareq.ifba_flags = IFBAF_DYNAMIC;
- break;
- case NVGRE_ENTRY_STATIC:
- bareq.ifba_age = 0;
- bareq.ifba_flags = IFBAF_STATIC;
- break;
- }
-
- error = copyout(&bareq, uaddr, sizeof(bareq));
- if (error != 0) {
- rw_exit_read(&sc->sc_ether_lock);
- return (error);
- }
-
- uaddr += sizeof(bareq);
- }
- baconf->ifbac_len = sc->sc_ether_num * sizeof(bareq);
- rw_exit_read(&sc->sc_ether_lock);
-
- return (0);
-}
-
-static void
-nvgre_flush_map(struct nvgre_softc *sc)
-{
- struct nvgre_map map;
- struct nvgre_entry *nv, *nnv;
-
- rw_enter_write(&sc->sc_ether_lock);
- map = sc->sc_ether_map;
- RBT_INIT(nvgre_map, &sc->sc_ether_map);
- sc->sc_ether_num = 0;
- rw_exit_write(&sc->sc_ether_lock);
-
- RBT_FOREACH_SAFE(nv, nvgre_map, &map, nnv) {
- RBT_REMOVE(nvgre_map, &map, nv);
- if (refcnt_rele(&nv->nv_refs))
- pool_put(&nvgre_pool, nv);
- }
-}
-
-static void
-nvgre_input_map(struct nvgre_softc *sc, const struct gre_tunnel *key,
-    const struct ether_header *eh)
-{
- struct nvgre_entry *nv, nkey;
- int new = 0;
-
- if (ETHER_IS_BROADCAST(eh->ether_shost) ||
-    ETHER_IS_MULTICAST(eh->ether_shost))
- return;
-
- memcpy(&nkey.nv_dst, eh->ether_shost, ETHER_ADDR_LEN);
-
- /* remember where it came from */
- rw_enter_read(&sc->sc_ether_lock);
- nv = RBT_FIND(nvgre_map, &sc->sc_ether_map, &nkey);
- if (nv == NULL)
- new = 1;
- else {
- nv->nv_age = ticks;
-
- if (nv->nv_type != NVGRE_ENTRY_DYNAMIC ||
-    gre_ip_cmp(key->t_af, &key->t_dst, &nv->nv_gateway) == 0)
- nv = NULL;
- else
- refcnt_take(&nv->nv_refs);
- }
- rw_exit_read(&sc->sc_ether_lock);
-
- if (new) {
- struct nvgre_entry *onv;
- unsigned int num;
-
- nv = pool_get(&nvgre_pool, PR_NOWAIT);
- if (nv == NULL) {
- /* oh well */
- return;
- }
-
- memcpy(&nv->nv_dst, eh->ether_shost, ETHER_ADDR_LEN);
- nv->nv_type = NVGRE_ENTRY_DYNAMIC;
- nv->nv_gateway = key->t_dst;
- refcnt_init(&nv->nv_refs);
- nv->nv_age = ticks;
-
- rw_enter_write(&sc->sc_ether_lock);
- num = sc->sc_ether_num;
- if (++num > sc->sc_ether_max)
- onv = nv;
- else {
- /* try to give the ref to the map */
- onv = RBT_INSERT(nvgre_map, &sc->sc_ether_map, nv);
- if (onv == NULL) {
- /* count the successful insert */
- sc->sc_ether_num = num;
- }
- }
- rw_exit_write(&sc->sc_ether_lock);
-
- if (onv != NULL)
- pool_put(&nvgre_pool, nv);
- } else if (nv != NULL) {
- rw_enter_write(&sc->sc_ether_lock);
- nv->nv_gateway = key->t_dst;
- rw_exit_write(&sc->sc_ether_lock);
-
- if (refcnt_rele(&nv->nv_refs)) {
- /* ioctl may have deleted the entry */
- pool_put(&nvgre_pool, nv);
- }
- }
-}
-
 static inline struct nvgre_softc *
 nvgre_mcast_find(const struct gre_tunnel *key, unsigned int if0idx)
 {
@@ -1562,6 +1371,7 @@ nvgre_input(const struct gre_tunnel *key
     uint8_t otos)
 {
  struct nvgre_softc *sc;
+ struct ether_header *eh;
 
  if (ISSET(m->m_flags, M_MCAST|M_BCAST))
  sc = nvgre_mcast_find(key, m->m_pkthdr.ph_ifidx);
@@ -1576,7 +1386,9 @@ nvgre_input(const struct gre_tunnel *key
  if (m == NULL)
  return (0);
 
- nvgre_input_map(sc, key, mtod(m, struct ether_header *));
+ eh = mtod(m, struct ether_header *);
+ etherbridge_map(&sc->sc_eb, (void *)&key->t_dst,
+    (struct ether_addr *)eh->ether_shost);
 
  SET(m->m_pkthdr.csum_flags, M_FLOWID);
  m->m_pkthdr.ph_flowid = bemtoh32(&key->t_key) & ~GRE_KEY_ENTROPY;
@@ -2768,7 +2580,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  }
  error = gre_set_tunnel(tunnel, (struct if_laddrreq *)data, 0);
  if (error == 0)
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
  case SIOCGLIFPHYADDR:
  error = gre_get_tunnel(tunnel, (struct if_laddrreq *)data);
@@ -2780,7 +2592,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  }
  error = gre_del_tunnel(tunnel);
  if (error == 0)
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
 
  case SIOCSIFPARENT:
@@ -2790,7 +2602,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  }
  error = nvgre_set_parent(sc, parent->ifp_parent);
  if (error == 0)
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
  case SIOCGIFPARENT:
  ifp0 = if_get(sc->sc_ifp0);
@@ -2809,7 +2621,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  }
  /* commit */
  sc->sc_ifp0 = 0;
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
 
  case SIOCSVNETID:
@@ -2825,7 +2637,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
 
  /* commit */
  tunnel->t_key = htonl(ifr->ifr_vnetid << GRE_KEY_ENTROPY_SHIFT);
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
  case SIOCGVNETID:
  error = gre_get_vnetid(tunnel, ifr);
@@ -2839,7 +2651,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  break;
  }
  tunnel->t_rtableid = ifr->ifr_rdomainid;
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
  break;
  case SIOCGLIFPHYRTABLE:
  ifr->ifr_rdomainid = tunnel->t_rtableid;
@@ -2890,35 +2702,26 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
  break;
 
  case SIOCBRDGSCACHE:
- if (bparam->ifbrp_csize < 1) {
- error = EINVAL;
- break;
- }
-
- /* commit */
- sc->sc_ether_max = bparam->ifbrp_csize;
+ error = etherbridge_set_max(&sc->sc_eb, bparam);
  break;
  case SIOCBRDGGCACHE:
- bparam->ifbrp_csize = sc->sc_ether_max;
+ error = etherbridge_get_max(&sc->sc_eb, bparam);
  break;
 
  case SIOCBRDGSTO:
- if (bparam->ifbrp_ctime < 0 ||
-    bparam->ifbrp_ctime > INT_MAX / hz) {
- error = EINVAL;
- break;
- }
- sc->sc_ether_tmo = bparam->ifbrp_ctime * hz;
+ error = etherbridge_set_tmo(&sc->sc_eb, bparam);
  break;
  case SIOCBRDGGTO:
- bparam->ifbrp_ctime = sc->sc_ether_tmo / hz;
+ error = etherbridge_get_tmo(&sc->sc_eb, bparam);
  break;
 
  case SIOCBRDGRTS:
- error = nvgre_rtfind(sc, (struct ifbaconf *)data);
+ error = etherbridge_rtfind(&sc->sc_eb,
+    (struct ifbaconf *)data);
  break;
  case SIOCBRDGFLUSH:
- nvgre_flush_map(sc);
+ etherbridge_flush(&sc->sc_eb,
+    ((struct ifbreq *)data)->ifbr_ifsflags);
  break;
 
  case SIOCADDMULTI:
@@ -3667,8 +3470,6 @@ nvgre_up(struct nvgre_softc *sc)
  sc->sc_inm = inm;
  SET(sc->sc_ac.ac_if.if_flags, IFF_RUNNING);
 
- timeout_add_sec(&sc->sc_ether_age, NVGRE_AGE_TMO);
-
  return (0);
 
 remove_ucast:
@@ -3693,7 +3494,6 @@ nvgre_down(struct nvgre_softc *sc)
  CLR(ifp->if_flags, IFF_RUNNING);
 
  NET_UNLOCK();
- timeout_del_barrier(&sc->sc_ether_age);
  ifq_barrier(&ifp->if_snd);
  if (!task_del(softnet, &sc->sc_send_task))
  taskq_barrier(softnet);
@@ -3770,60 +3570,11 @@ nvgre_set_parent(struct nvgre_softc *sc,
 }
 
 static void
-nvgre_age(void *arg)
-{
- struct nvgre_softc *sc = arg;
- struct nvgre_entry *nv, *nnv;
- int tmo = sc->sc_ether_tmo * 2;
- int diff;
-
- if (!ISSET(sc->sc_ac.ac_if.if_flags, IFF_RUNNING))
- return;
-
- rw_enter_write(&sc->sc_ether_lock); /* XXX */
- RBT_FOREACH_SAFE(nv, nvgre_map, &sc->sc_ether_map, nnv) {
- if (nv->nv_type != NVGRE_ENTRY_DYNAMIC)
- continue;
-
- diff = ticks - nv->nv_age;
- if (diff < tmo)
- continue;
-
- sc->sc_ether_num--;
- RBT_REMOVE(nvgre_map, &sc->sc_ether_map, nv);
- if (refcnt_rele(&nv->nv_refs))
- pool_put(&nvgre_pool, nv);
- }
- rw_exit_write(&sc->sc_ether_lock);
-
- timeout_add_sec(&sc->sc_ether_age, NVGRE_AGE_TMO);
-}
-
-static inline int
-nvgre_entry_valid(struct nvgre_softc *sc, const struct nvgre_entry *nv)
-{
- int diff;
-
- if (nv == NULL)
- return (0);
-
- if (nv->nv_type == NVGRE_ENTRY_STATIC)
- return (1);
-
- diff = ticks - nv->nv_age;
- if (diff < sc->sc_ether_tmo)
- return (1);
-
- return (0);
-}
-
-static void
 nvgre_start(struct ifnet *ifp)
 {
  struct nvgre_softc *sc = ifp->if_softc;
  const struct gre_tunnel *tunnel = &sc->sc_tunnel;
  union gre_addr gateway;
- struct nvgre_entry *nv, key;
  struct mbuf_list ml = MBUF_LIST_INITIALIZER();
  struct ether_header *eh;
  struct mbuf *m, *m0;
@@ -3847,18 +3598,17 @@ nvgre_start(struct ifnet *ifp)
  if (ETHER_IS_BROADCAST(eh->ether_dhost))
  gateway = tunnel->t_dst;
  else {
- memcpy(&key.nv_dst, eh->ether_dhost,
-    sizeof(key.nv_dst));
+ const union gre_addr *endpoint;
 
- rw_enter_read(&sc->sc_ether_lock);
- nv = RBT_FIND(nvgre_map, &sc->sc_ether_map, &key);
- if (nvgre_entry_valid(sc, nv))
- gateway = nv->nv_gateway;
- else {
+ smr_read_enter();
+ endpoint = etherbridge_resolve(&sc->sc_eb,
+    (struct ether_addr *)eh->ether_dhost);
+ if (endpoint == NULL) {
  /* "flood" to unknown hosts */
- gateway = tunnel->t_dst;
+ endpoint = &tunnel->t_dst;
  }
- rw_exit_read(&sc->sc_ether_lock);
+ gateway = *endpoint;
+ smr_read_leave();
  }
 
  /* force prepend mbuf because of alignment problems */
@@ -4346,14 +4096,6 @@ egre_cmp(const struct egre_softc *a, con
 
 RBT_GENERATE(egre_tree, egre_softc, sc_entry, egre_cmp);
 
-static inline int
-nvgre_entry_cmp(const struct nvgre_entry *a, const struct nvgre_entry *b)
-{
- return (memcmp(&a->nv_dst, &b->nv_dst, sizeof(a->nv_dst)));
-}
-
-RBT_GENERATE(nvgre_map, nvgre_entry, nv_entry, nvgre_entry_cmp);
-
 static int
 nvgre_cmp_tunnel(const struct gre_tunnel *a, const struct gre_tunnel *b)
 {
@@ -4473,3 +4215,73 @@ eoip_cmp(const struct eoip_softc *ea, co
 }
 
 RBT_GENERATE(eoip_tree, eoip_softc, sc_entry, eoip_cmp);
+
+static int
+nvgre_eb_port_eq(void *arg, void *a, void *b)
+{
+ struct nvgre_softc *sc = arg;
+
+ return (gre_ip_cmp(sc->sc_tunnel.t_af, a, b) == 0);
+}
+
+static void *
+nvgre_eb_port_take(void *arg, void *port)
+{
+ union gre_addr *ea = port;
+ union gre_addr *endpoint;
+
+ endpoint = pool_get(&nvgre_endpoint_pool, PR_NOWAIT);
+ if (endpoint == NULL)
+ return (NULL);
+
+ *endpoint = *ea;
+
+ return (endpoint);
+}
+
+static void
+nvgre_eb_port_rele(void *arg, void *port)
+{
+ union gre_addr *endpoint = port;
+
+ pool_put(&nvgre_endpoint_pool, endpoint);
+}
+
+static size_t
+nvgre_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
+{
+ struct nvgre_softc *sc = arg;
+
+ return (strlcpy(dst, sc->sc_ac.ac_if.if_xname, len));
+}
+
+static void
+nvgre_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+ struct nvgre_softc *sc = arg;
+ union gre_addr *endpoint = port;
+
+ switch (sc->sc_tunnel.t_af) {
+ case AF_INET: {
+ struct sockaddr_in *sin = (struct sockaddr_in *)ss;
+
+ sin->sin_len = sizeof(*sin);
+ sin->sin_family = AF_INET;
+ sin->sin_addr = endpoint->in4;
+ break;
+ }
+#ifdef INET6
+ case AF_INET6: {
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;
+
+ sin6->sin6_len = sizeof(*sin6);
+ sin6->sin6_family = AF_INET6;
+ sin6->sin6_addr = endpoint->in6;
+
+ break;
+ }
+#endif /* INET6 */
+ default:
+ unhandled_af(sc->sc_tunnel.t_af);
+ }
+}
Index: net/if_veb.c
===================================================================
RCS file: net/if_veb.c
diff -N net/if_veb.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ net/if_veb.c 10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,1747 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2021 David Gwynne <[hidden email]>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include "bpfilter.h"
+#include "pf.h"
+#include "vlan.h"
+
+#include <sys/param.h>
+#include <sys/kernel.h>
+#include <sys/malloc.h>
+#include <sys/mbuf.h>
+#include <sys/queue.h>
+#include <sys/socket.h>
+#include <sys/sockio.h>
+#include <sys/systm.h>
+#include <sys/syslog.h>
+#include <sys/rwlock.h>
+#include <sys/percpu.h>
+#include <sys/smr.h>
+#include <sys/task.h>
+#include <sys/pool.h>
+
+#include <net/if.h>
+#include <net/if_dl.h>
+#include <net/if_types.h>
+
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+
+#include <net/if_bridge.h>
+#include <net/if_etherbridge.h>
+
+#if NBPFILTER > 0
+#include <net/bpf.h>
+#endif
+
+#if NPF > 0
+#include <net/pfvar.h>
+#endif
+
+#if NVLAN > 0
+#include <net/if_vlan_var.h>
+#endif
+
+struct veb_rule {
+ TAILQ_ENTRY(veb_rule) vr_entry;
+ SMR_TAILQ_ENTRY(veb_rule) vr_lentry[2];
+
+ uint16_t vr_flags;
+#define VEB_R_F_IN (1U << 0)
+#define VEB_R_F_OUT (1U << 1)
+#define VEB_R_F_SRC (1U << 2)
+#define VEB_R_F_DST (1U << 3)
+
+#define VEB_R_F_ARP (1U << 4)
+#define VEB_R_F_RARP (1U << 5)
+#define VEB_R_F_SHA (1U << 6)
+#define VEB_R_F_SPA (1U << 7)
+#define VEB_R_F_THA (1U << 8)
+#define VEB_R_F_TPA (1U << 9)
+ uint16_t vr_arp_op;
+
+ struct ether_addr vr_src;
+ struct ether_addr vr_dst;
+ struct ether_addr vr_arp_sha;
+ struct ether_addr vr_arp_tha;
+ struct in_addr vr_arp_spa;
+ struct in_addr vr_arp_tpa;
+
+ unsigned int vr_action;
+#define VEB_R_MATCH 0
+#define VEB_R_PASS 1
+#define VEB_R_BLOCK 2
+
+ int vr_pftag;
+};
+
+TAILQ_HEAD(veb_rules, veb_rule);
+SMR_TAILQ_HEAD(veb_rule_list, veb_rule);
+
+struct veb_softc;
+
+struct veb_port {
+ struct ifnet *p_ifp0;
+ struct refcnt p_refs;
+
+ int (*p_ioctl)(struct ifnet *, u_long, caddr_t);
+ int (*p_output)(struct ifnet *, struct mbuf *, struct sockaddr *,
+    struct rtentry *);
+
+ struct task p_ltask;
+ struct task p_dtask;
+
+ struct veb_softc *p_veb;
+
+ struct ether_brport p_brport;
+
+ unsigned int p_link_state;
+ unsigned int p_span;
+ unsigned int p_bif_flags;
+ uint32_t p_protected;
+
+ struct veb_rules p_vrl;
+ unsigned int p_nvrl;
+ struct veb_rule_list p_vr_list[2];
+#define VEB_RULE_LIST_OUT 0
+#define VEB_RULE_LIST_IN 1
+
+ SMR_TAILQ_ENTRY(veb_port) p_entry;
+};
+
+struct veb_ports {
+ SMR_TAILQ_HEAD(, veb_port) l_list;
+ unsigned int l_count;
+};
+
+struct veb_softc {
+ struct ifnet sc_if;
+ unsigned int sc_dead;
+
+ struct etherbridge sc_eb;
+
+ struct rwlock sc_rule_lock;
+ struct veb_ports sc_ports;
+ struct veb_ports sc_spans;
+};
+
+#define DPRINTF(_sc, fmt...)    do { \
+ if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
+ printf(fmt); \
+} while (0)
+
+
+static int veb_clone_create(struct if_clone *, int);
+static int veb_clone_destroy(struct ifnet *);
+
+static int veb_ioctl(struct ifnet *, u_long, caddr_t);
+static void veb_input(struct ifnet *, struct mbuf *);
+static int veb_enqueue(struct ifnet *, struct mbuf *);
+static int veb_output(struct ifnet *, struct mbuf *, struct sockaddr *,
+    struct rtentry *);
+static void veb_start(struct ifqueue *);
+
+static int veb_up(struct veb_softc *);
+static int veb_down(struct veb_softc *);
+static int veb_iff(struct veb_softc *);
+
+static void veb_p_linkch(void *);
+static void veb_p_detach(void *);
+static int veb_p_ioctl(struct ifnet *, u_long, caddr_t);
+static int veb_p_output(struct ifnet *, struct mbuf *,
+    struct sockaddr *, struct rtentry *);
+
+static void veb_p_dtor(struct veb_softc *, struct veb_port *,
+    const char *);
+static int veb_add_port(struct veb_softc *,
+    const struct ifbreq *, unsigned int);
+static int veb_del_port(struct veb_softc *,
+    const struct ifbreq *, unsigned int);
+static int veb_port_list(struct veb_softc *, struct ifbifconf *);
+static int veb_port_set_protected(struct veb_softc *,
+    const struct ifbreq *);
+
+static int veb_rule_add(struct veb_softc *, const struct ifbrlreq *);
+static int veb_rule_list_flush(struct veb_softc *,
+    const struct ifbrlreq *);
+static void veb_rule_list_free(struct veb_rule *);
+static int veb_rule_list_get(struct veb_softc *, struct ifbrlconf *);
+
+static int veb_eb_port_cmp(void *, void *, void *);
+static void *veb_eb_port_take(void *, void *);
+static void veb_eb_port_rele(void *, void *);
+static size_t veb_eb_port_ifname(void *, char *, size_t, void *);
+static void veb_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops veb_etherbridge_ops = {
+ veb_eb_port_cmp,
+ veb_eb_port_take,
+ veb_eb_port_rele,
+ veb_eb_port_ifname,
+ veb_eb_port_sa,
+};
+
+static struct if_clone veb_cloner =
+    IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
+
+static struct pool veb_rule_pool;
+
+static int vport_clone_create(struct if_clone *, int);
+static int vport_clone_destroy(struct ifnet *);
+
+struct vport_softc {
+ struct arpcom sc_ac;
+ unsigned int sc_dead;
+};
+
+static int vport_ioctl(struct ifnet *, u_long, caddr_t);
+static int vport_enqueue(struct ifnet *, struct mbuf *);
+static void vport_start(struct ifqueue *);
+
+static int vport_up(struct vport_softc *);
+static int vport_down(struct vport_softc *);
+static int vport_iff(struct vport_softc *);
+
+static struct if_clone vport_cloner =
+    IF_CLONE_INITIALIZER("vport", vport_clone_create, vport_clone_destroy);
+
+void
+vebattach(int count)
+{
+ if_clone_attach(&veb_cloner);
+ if_clone_attach(&vport_cloner);
+}
+
+static int
+veb_clone_create(struct if_clone *ifc, int unit)
+{
+ struct veb_softc *sc;
+ struct ifnet *ifp;
+ int error;
+
+ if (veb_rule_pool.pr_size == 0) {
+ pool_init(&veb_rule_pool, sizeof(struct veb_rule),
+    0, IPL_SOFTNET, 0, "vebrpl", NULL);
+ }
+
+ sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+ if (sc == NULL)
+ return (ENOMEM);
+
+ rw_init(&sc->sc_rule_lock, "vebrlk");
+ SMR_TAILQ_INIT(&sc->sc_ports.l_list);
+ SMR_TAILQ_INIT(&sc->sc_spans.l_list);
+
+ ifp = &sc->sc_if;
+
+ snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
+    ifc->ifc_name, unit);
+
+ error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+    &veb_etherbridge_ops, sc);
+ if (error != 0) {
+ free(sc, M_DEVBUF, sizeof(*sc));
+ return (error);
+ }
+
+ ifp->if_softc = sc;
+ ifp->if_type = IFT_BRIDGE;
+ ifp->if_hdrlen = ETHER_HDR_LEN;
+ ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
+ ifp->if_ioctl = veb_ioctl;
+ ifp->if_input = veb_input;
+ //ifp->if_rtrequest = veb_rtrequest;
+ ifp->if_output = veb_output;
+ ifp->if_enqueue = veb_enqueue;
+ ifp->if_qstart = veb_start;
+ ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
+ ifp->if_xflags = IFXF_CLONED | IFXF_MPSAFE;
+
+ if_counters_alloc(ifp);
+ if_attach(ifp);
+
+ if_alloc_sadl(ifp);
+
+#if NBPFILTER > 0
+ bpfattach(&ifp->if_bpf, ifp, DLT_EN10MB, ETHER_HDR_LEN);
+#endif
+
+ return (0);
+}
+
+static int
+veb_clone_destroy(struct ifnet *ifp)
+{
+ struct veb_softc *sc = ifp->if_softc;
+ struct veb_port *p, *np;
+
+ NET_LOCK();
+ sc->sc_dead = 1;
+
+ if (ISSET(ifp->if_flags, IFF_RUNNING))
+ veb_down(sc);
+ NET_UNLOCK();
+
+ if_detach(ifp);
+
+ NET_LOCK();
+ SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_ports.l_list, p_entry, np)
+ veb_p_dtor(sc, p, "destroy");
+ SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_spans.l_list, p_entry, np)
+ veb_p_dtor(sc, p, "destroy");
+ NET_UNLOCK();
+
+ etherbridge_destroy(&sc->sc_eb);
+
+ free(sc, M_DEVBUF, sizeof(*sc));
+
+ return (0);
+}
+
+static struct mbuf *
+veb_span_input(struct ifnet *ifp0, struct mbuf *m, void *brport)
+{
+ m_freem(m);
+ return (NULL);
+}
+
+static void
+veb_span(struct veb_softc *sc, struct mbuf *m0)
+{
+ struct veb_port *p;
+ struct ifnet *ifp0;
+ struct mbuf *m;
+
+ smr_read_enter();
+ SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
+ ifp0 = p->p_ifp0;
+ if (!ISSET(ifp0->if_flags, IFF_RUNNING))
+ continue;
+
+ m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+ if (m == NULL) {
+ /* XXX count error */
+ continue;
+ }
+
+ if_enqueue(ifp0, m); /* XXX count error */
+ }
+ smr_read_leave();
+}
+
+static int
+veb_vlan_filter(const struct mbuf *m)
+{
+ const struct ether_header *eh;
+
+ eh = mtod(m, struct ether_header *);
+ switch (ntohs(eh->ether_type)) {
+ case ETHERTYPE_VLAN:
+ case ETHERTYPE_QINQ:
+ return (1);
+ default:
+ break;
+ }
+
+ return (0);
+}
+
+static int
+veb_rule_arp_match(const struct veb_rule *vr, struct mbuf *m)
+{
+ struct ether_header *eh;
+ struct ether_arp ea;
+
+ eh = mtod(m, struct ether_header *);
+
+ if (eh->ether_type != htons(ETHERTYPE_ARP))
+ return (0);
+ if (m->m_pkthdr.len < sizeof(*eh) + sizeof(ea))
+ return (0);
+
+ m_copydata(m, sizeof(*eh), sizeof(ea), (caddr_t)&ea);
+
+ if (ea.arp_hrd != htons(ARPHRD_ETHER) ||
+    ea.arp_pro != htons(ETHERTYPE_IP) ||
+    ea.arp_hln != ETHER_ADDR_LEN ||
+    ea.arp_pln != sizeof(struct in_addr))
+ return (0);
+
+ if (ISSET(vr->vr_flags, VEB_R_F_ARP)) {
+ if (ea.arp_op != htons(ARPOP_REQUEST) &&
+    ea.arp_op != htons(ARPOP_REPLY))
+ return (0);
+ }
+ if (ISSET(vr->vr_flags, VEB_R_F_RARP)) {
+ if (ea.arp_op != htons(ARPOP_REVREQUEST) &&
+    ea.arp_op != htons(ARPOP_REVREPLY))
+ return (0);
+ }
+
+ if (vr->vr_arp_op != htons(0) && vr->vr_arp_op != ea.arp_op)
+ return (0);
+
+ if (ISSET(vr->vr_flags, VEB_R_F_SHA) &&
+    !ETHER_IS_EQ(&vr->vr_arp_sha, ea.arp_sha))
+ return (0);
+ if (ISSET(vr->vr_flags, VEB_R_F_THA) &&
+    !ETHER_IS_EQ(&vr->vr_arp_tha, ea.arp_tha))
+ return (0);
+ if (ISSET(vr->vr_flags, VEB_R_F_SPA) &&
+    memcmp(&vr->vr_arp_spa, ea.arp_spa, sizeof(vr->vr_arp_spa)) != 0)
+ return (0);
+ if (ISSET(vr->vr_flags, VEB_R_F_TPA) &&
+    memcmp(&vr->vr_arp_tpa, ea.arp_tpa, sizeof(vr->vr_arp_tpa)) != 0)
+ return (0);
+
+ return (1);
+}
+
+static int
+veb_rule_list_test(struct veb_rule *vr, int dir, struct mbuf *m)
+{
+ struct ether_header *eh = mtod(m, struct ether_header *);
+
+ SMR_ASSERT_CRITICAL();
+
+ do {
+ if (ISSET(vr->vr_flags, VEB_R_F_ARP|VEB_R_F_RARP) &&
+    !veb_rule_arp_match(vr, m))
+ continue;
+
+ if (ISSET(vr->vr_flags, VEB_R_F_SRC) &&
+    !ETHER_IS_EQ(&vr->vr_src, eh->ether_shost))
+ continue;
+ if (ISSET(vr->vr_flags, VEB_R_F_DST) &&
+    !ETHER_IS_EQ(&vr->vr_dst, eh->ether_dhost))
+ continue;
+
+ if (vr->vr_action == VEB_R_BLOCK)
+ return (VEB_R_BLOCK);
+#if NPF > 0
+ pf_tag_packet(m, vr->vr_pftag, -1);
+#endif
+ if (vr->vr_action == VEB_R_PASS)
+ return (VEB_R_PASS);
+ } while ((vr = SMR_TAILQ_NEXT(vr, vr_lentry[dir])) != NULL);
+
+ return (VEB_R_PASS);
+}
+
+static inline int
+veb_rule_filter(struct veb_port *p, int dir, struct mbuf *m)
+{
+ struct veb_rule *vr;
+
+ vr = SMR_TAILQ_FIRST(&p->p_vr_list[dir]);
+ if (vr == NULL)
+ return (0);
+
+ return (veb_rule_list_test(vr, dir, m) == VEB_R_BLOCK);
+}
+
+#if NPF > 0
+static struct mbuf *
+veb_pf(struct ifnet *ifp0, int dir, struct mbuf *m)
+{
+ struct ether_header *eh, copy;
+ sa_family_t af = AF_UNSPEC;
+
+ /*
+ * pf runs on vport interfaces when they enter or leave the
+ * l3 stack, so don't confuse things (even more) by running
+ * pf again here. note that because of this exception the
+ * pf direction on vport interfaces is reversed compared to
+ * other veb ports.
+ */
+ if (ifp0->if_enqueue == vport_enqueue)
+ return (m);
+
+ eh = mtod(m, struct ether_header *);
+ switch (ntohs(eh->ether_type)) {
+ case ETHERTYPE_IP:
+ af = AF_INET;
+ break;
+ case ETHERTYPE_IPV6:
+ af = AF_INET6;
+ break;
+ default:
+ return (m);
+ }
+
+ copy = *eh;
+ m_adj(m, sizeof(*eh));
+
+ if (pf_test(af, dir, ifp0, &m) != PF_PASS) {
+ m_freem(m);
+ return (NULL);
+ }
+ if (m == NULL)
+ return (NULL);
+
+ m = m_prepend(m, sizeof(*eh), M_DONTWAIT);
+ if (m == NULL)
+ return (NULL);
+
+ /* checksum? */
+
+ eh = mtod(m, struct ether_header *);
+ *eh = copy;
+
+ return (m);
+}
+#endif /* NPF > 0 */
+
+static void
+veb_broadcast(struct veb_softc *sc, struct veb_port *rp, struct mbuf *m0)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ struct veb_port *tp;
+ struct ifnet *ifp0;
+ struct mbuf *m;
+
+#if NPF > 0
+ /*
+ * we couldn't find a specific port to send this packet to,
+ * but pf should still have a chance to apply policy to it.
+ * let pf look at it, but use the veb interface as a proxy.
+ */
+ if (ISSET(ifp->if_flags, IFF_LINK1) &&
+    (m0 = veb_pf(ifp, PF_OUT, m0)) == NULL)
+ return;
+#endif
+
+ counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+    m0->m_pkthdr.len);
+
+ smr_read_enter();
+ SMR_TAILQ_FOREACH(tp, &sc->sc_ports.l_list, p_entry) {
+ if (rp == tp || (rp->p_protected & tp->p_protected)) {
+ /*
+ * don't let Ethernet packets hairpin or
+ * move between ports in the same protected
+ * domain(s).
+ */
+ continue;
+ }
+
+ ifp0 = tp->p_ifp0;
+ if (!ISSET(ifp0->if_flags, IFF_RUNNING)) {
+ /* don't waste time */
+ continue;
+ }
+
+ if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) &&
+    !ISSET(m0->m_flags, M_BCAST | M_MCAST)) {
+ /* don't flood unknown unicast */
+ continue;
+ }
+
+ if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0))
+ continue;
+
+ m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+ if (m == NULL) {
+ /* XXX count error? */
+ continue;
+ }
+
+ if_enqueue(ifp0, m); /* XXX count error? */
+ }
+ smr_read_leave();
+
+ m_freem(m0);
+}
+
+static struct mbuf *
+veb_transmit(struct veb_softc *sc, struct veb_port *rp, struct veb_port *tp,
+    struct mbuf *m)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ struct ifnet *ifp0;
+
+ if (tp == NULL)
+ return (m);
+
+ if (rp == tp || (rp->p_protected & tp->p_protected)) {
+                /*
+ * don't let Ethernet packets hairpin or move between
+ * ports in the same protected domain(s).
+ */
+ goto drop;
+ }
+
+ if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m))
+ goto drop;
+
+ ifp0 = tp->p_ifp0;
+
+#if NPF > 0
+ if (ISSET(ifp->if_flags, IFF_LINK1) &&
+    (m = veb_pf(ifp0, PF_OUT, m)) == NULL)
+ return (NULL);
+#endif
+
+ counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+    m->m_pkthdr.len);
+
+ if_enqueue(ifp0, m); /* XXX count error? */
+
+ return (NULL);
+drop:
+ m_freem(m);
+ return (NULL);
+}
+
+static struct mbuf *
+veb_port_input(struct ifnet *ifp0, struct mbuf *m, void *brport)
+{
+ struct veb_port *p = brport;
+ struct veb_softc *sc = p->p_veb;
+ struct ifnet *ifp = &sc->sc_if;
+ struct ether_header *eh;
+#if NBPFILTER > 0
+ caddr_t if_bpf;
+#endif
+
+ if (ISSET(m->m_flags, M_PROTO1)) {
+ CLR(m->m_flags, M_PROTO1);
+ return (m);
+ }
+
+ if (!ISSET(ifp->if_flags, IFF_RUNNING))
+ return (m);
+
+#if NVLAN > 0
+ /*
+ * If the underlying interface removed the VLAN header itself,
+ * add it back.
+ */
+ if (ISSET(m->m_flags, M_VLANTAG)) {
+ m = vlan_inject(m, ETHERTYPE_VLAN, m->m_pkthdr.ether_vtag);
+ if (m == NULL) {
+ counters_inc(ifp->if_counters, ifc_ierrors);
+ goto drop;
+ }
+ }
+#endif
+
+ counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
+    m->m_pkthdr.len);
+
+ /* force packets into the one routing domain for pf */
+ m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
+
+#if NBPFILTER > 0
+ if_bpf = READ_ONCE(ifp->if_bpf);
+ if (if_bpf != NULL) {
+ if (bpf_mtap_ether(if_bpf, m, 0) != 0)
+ goto drop;
+ }
+#endif
+
+ veb_span(sc, m);
+
+ if (!ISSET(ifp->if_flags, IFF_LINK2) &&
+    veb_vlan_filter(m))
+ goto drop;
+
+ if (veb_rule_filter(p, VEB_RULE_LIST_IN, m))
+ goto drop;
+
+#if NPF > 0
+ if (ISSET(ifp->if_flags, IFF_LINK1) &&
+    (m = veb_pf(ifp0, PF_IN, m)) == NULL)
+ return (NULL);
+#endif
+
+ eh = mtod(m, struct ether_header *);
+
+ if (ISSET(p->p_bif_flags, IFBIF_LEARNING)) {
+ etherbridge_map(&sc->sc_eb, p,
+    (struct ether_addr *)eh->ether_shost);
+ }
+
+ CLR(m->m_flags, M_BCAST|M_MCAST);
+ SET(m->m_flags, M_PROTO1);
+
+ if (!ETHER_IS_MULTICAST(eh->ether_dhost)) {
+ struct veb_port *tp = NULL;
+
+ smr_read_enter();
+ tp = etherbridge_resolve(&sc->sc_eb,
+    (struct ether_addr *)eh->ether_dhost);
+ m = veb_transmit(sc, p, tp, m);
+ smr_read_leave();
+
+ if (m == NULL)
+ return (NULL);
+
+ /* unknown unicast address */
+ } else {
+ SET(m->m_flags,
+    ETHER_IS_BROADCAST(eh->ether_dhost) ? M_BCAST : M_MCAST);
+ }
+
+ veb_broadcast(sc, p, m);
+ return (NULL);
+
+drop:
+ m_freem(m);
+ return (NULL);
+}
+
+static void
+veb_input(struct ifnet *ifp, struct mbuf *m)
+{
+ m_freem(m);
+}
+
+static int
+veb_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *dst,
+    struct rtentry *rt)
+{
+ m_freem(m);
+ return (ENODEV);
+}
+
+static int
+veb_enqueue(struct ifnet *ifp, struct mbuf *m)
+{
+ m_freem(m);
+ return (ENODEV);
+}
+
+static void
+veb_start(struct ifqueue *ifq)
+{
+ ifq_purge(ifq);
+}
+
+static int
+veb_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
+{
+ struct veb_softc *sc = ifp->if_softc;
+ struct ifbrparam *bparam = (struct ifbrparam *)data;
+ int error = 0;
+
+ if (sc->sc_dead)
+ return (ENXIO);
+
+ switch (cmd) {
+ case SIOCSIFFLAGS:
+ if (ISSET(ifp->if_flags, IFF_UP)) {
+ if (!ISSET(ifp->if_flags, IFF_RUNNING))
+ error = veb_up(sc);
+ } else {
+ if (ISSET(ifp->if_flags, IFF_RUNNING))
+ error = veb_down(sc);
+ }
+ break;
+
+ case SIOCBRDGADD:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = veb_add_port(sc, (struct ifbreq *)data, 0);
+ break;
+ case SIOCBRDGADDS:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = veb_add_port(sc, (struct ifbreq *)data, 1);
+ break;
+ case SIOCBRDGDEL:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = veb_del_port(sc, (struct ifbreq *)data, 0);
+ break;
+ case SIOCBRDGDELS:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = veb_del_port(sc, (struct ifbreq *)data, 1);
+ break;
+
+ case SIOCBRDGSCACHE:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = etherbridge_set_max(&sc->sc_eb, bparam);
+ break;
+ case SIOCBRDGGCACHE:
+ error = etherbridge_get_max(&sc->sc_eb, bparam);
+ break;
+
+ case SIOCBRDGSTO:
+ error = suser(curproc);
+ if (error != 0)
+ break;
+
+ error = etherbridge_set_tmo(&sc->sc_eb, bparam);
+ break;
+ case SIOCBRDGGTO:
+ error = etherbridge_get_tmo(&sc->sc_eb, bparam);
+ break;
+
+ case SIOCBRDGRTS:
+ error = etherbridge_rtfind(&sc->sc_eb, (struct ifbaconf *)data);
+ break;
+ case SIOCBRDGIFS:
+ error = veb_port_list(sc, (struct ifbifconf *)data);
+ break;
+
+ case SIOCBRDGSIFPROT:
+ error = veb_port_set_protected(sc, (struct ifbreq *)data);
+ break;
+
+ case SIOCBRDGARL:
+ error = veb_rule_add(sc, (struct ifbrlreq *)data);
+ break;
+ case SIOCBRDGFRL:
+ error = veb_rule_list_flush(sc, (struct ifbrlreq *)data);
+ break;
+ case SIOCBRDGGRL:
+ error = veb_rule_list_get(sc, (struct ifbrlconf *)data);
+ break;
+
+ default:
+ error = ENOTTY;
+ break;
+ }
+
+ if (error == ENETRESET)
+ error = veb_iff(sc);
+
+ return (error);
+}
+
+static int
+veb_add_port(struct veb_softc *sc, const struct ifbreq *req, unsigned int span)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ struct ifnet *ifp0;
+ struct veb_ports *port_list;
+ struct veb_port *p;
+ int error;
+
+ NET_ASSERT_LOCKED();
+
+ ifp0 = if_unit(req->ifbr_ifsname);
+ if (ifp0 == NULL)
+ return (EINVAL);
+
+ if (ifp0->if_type != IFT_ETHER) {
+ error = EPROTONOSUPPORT;
+ goto put;
+ }
+
+ if (ifp0 == ifp) {
+ error = EPROTONOSUPPORT;
+ goto put;
+ }
+
+ error = ether_brport_isset(ifp0);
+ if (error != 0)
+ goto put;
+
+ /* let's try */
+
+ p = malloc(sizeof(*p), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+ if (p == NULL) {
+ error = ENOMEM;
+ goto put;
+ }
+
+ p->p_ifp0 = ifp0;
+ p->p_veb = sc;
+
+ refcnt_init(&p->p_refs);
+ TAILQ_INIT(&p->p_vrl);
+ SMR_TAILQ_INIT(&p->p_vr_list[0]);
+ SMR_TAILQ_INIT(&p->p_vr_list[1]);
+
+ p->p_ioctl = ifp0->if_ioctl;
+ p->p_output = ifp0->if_output;
+
+ if (span) {
+ port_list = &sc->sc_spans;
+
+ p->p_brport.eb_input = veb_span_input;
+ p->p_bif_flags = IFBIF_SPAN;
+ } else {
+ port_list = &sc->sc_ports;
+
+ error = ifpromisc(ifp0, 1);
+ if (error != 0)
+ goto free;
+
+ p->p_bif_flags = IFBIF_LEARNING | IFBIF_DISCOVER;
+ p->p_brport.eb_input = veb_port_input;
+ }
+
+ /* this might have changed if we slept for malloc or ifpromisc */
+ error = ether_brport_isset(ifp0);
+ if (error != 0)
+ goto unpromisc;
+
+ task_set(&p->p_ltask, veb_p_linkch, p);
+ if_linkstatehook_add(ifp0, &p->p_ltask);
+
+ task_set(&p->p_dtask, veb_p_detach, p);
+ if_detachhook_add(ifp0, &p->p_dtask);
+
+ p->p_brport.eb_port = p;
+
+ /* commit */
+ SMR_TAILQ_INSERT_TAIL_LOCKED(&port_list->l_list, p, p_entry);
+ port_list->l_count++;
+
+ ether_brport_set(ifp0, &p->p_brport);
+ if (ifp0->if_enqueue != vport_enqueue) { /* vport is special */
+ ifp0->if_ioctl = veb_p_ioctl;
+ ifp0->if_output = veb_p_output;
+ }
+
+ veb_p_linkch(p);
+
+ return (0);
+
+unpromisc:
+ if (!span)
+ ifpromisc(ifp0, 0);
+free:
+ free(p, M_DEVBUF, sizeof(*p));
+put:
+ if_put(ifp0);
+ return (error);
+}
+
+static struct veb_port *
+veb_trunkport(struct veb_softc *sc, const char *name, unsigned int span)
+{
+ struct veb_ports *port_list;
+ struct veb_port *p;
+
+ port_list = span ? &sc->sc_spans : &sc->sc_ports;
+
+ SMR_TAILQ_FOREACH_LOCKED(p, &port_list->l_list, p_entry) {
+ if (strcmp(p->p_ifp0->if_xname, name) == 0)
+ return (p);
+ }
+
+ return (NULL);
+}
+
+static int
+veb_del_port(struct veb_softc *sc, const struct ifbreq *req, unsigned int span)
+{
+ struct veb_port *p;
+
+ NET_ASSERT_LOCKED();
+ p = veb_trunkport(sc, req->ifbr_ifsname, span);
+ if (p == NULL)
+ return (EINVAL);
+
+ veb_p_dtor(sc, p, "del");
+
+ return (0);
+}
+
+static struct veb_port *
+veb_port_get(struct veb_softc *sc, const char *name)
+{
+ struct veb_port *p;
+
+ NET_ASSERT_LOCKED();
+
+ SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_ports.l_list, p_entry) {
+ struct ifnet *ifp0 = p->p_ifp0;
+ if (strncmp(ifp0->if_xname, name,
+    sizeof(ifp0->if_xname)) == 0) {
+ refcnt_take(&p->p_refs);
+ break;
+ }
+ }
+
+ return (p);
+}
+
+static void
+veb_port_put(struct veb_softc *sc, struct veb_port *p)
+{
+ refcnt_rele_wake(&p->p_refs);
+}
+
+static int
+veb_port_set_protected(struct veb_softc *sc, const struct ifbreq *ifbr)
+{
+ struct veb_port *p;
+
+ p = veb_port_get(sc, ifbr->ifbr_ifsname);
+ if (p == NULL)
+ return (ESRCH);
+
+ p->p_protected = ifbr->ifbr_protected;
+ veb_port_put(sc, p);
+
+ return (0);
+}
+
+static int
+veb_rule_add(struct veb_softc *sc, const struct ifbrlreq *ifbr)
+{
+ const struct ifbrarpf *brla = &ifbr->ifbr_arpf;
+ struct veb_rule vr, *vrp;
+ struct veb_port *p;
+ int error;
+
+ memset(&vr, 0, sizeof(vr));
+
+ switch (ifbr->ifbr_action) {
+ case BRL_ACTION_BLOCK:
+ vr.vr_action = VEB_R_BLOCK;
+ break;
+ case BRL_ACTION_PASS:
+ vr.vr_action = VEB_R_PASS;
+ break;
+ /* XXX VEB_R_MATCH */
+ default:
+ return (EINVAL);
+ }
+
+ if (!ISSET(ifbr->ifbr_flags, BRL_FLAG_IN|BRL_FLAG_OUT))
+ return (EINVAL);
+ if (ISSET(ifbr->ifbr_flags, BRL_FLAG_IN))
+ SET(vr.vr_flags, VEB_R_F_IN);
+ if (ISSET(ifbr->ifbr_flags, BRL_FLAG_OUT))
+ SET(vr.vr_flags, VEB_R_F_OUT);
+
+ if (ISSET(ifbr->ifbr_flags, BRL_FLAG_SRCVALID)) {
+ SET(vr.vr_flags, VEB_R_F_SRC);
+ vr.vr_src = ifbr->ifbr_src;
+ }
+ if (ISSET(ifbr->ifbr_flags, BRL_FLAG_DSTVALID)) {
+ SET(vr.vr_flags, VEB_R_F_DST);
+ vr.vr_dst = ifbr->ifbr_dst;
+ }
+
+ /* ARP rule */
+ if (ISSET(brla->brla_flags, BRLA_ARP|BRLA_RARP)) {
+ if (ISSET(brla->brla_flags, BRLA_ARP))
+ SET(vr.vr_flags, VEB_R_F_ARP);
+ if (ISSET(brla->brla_flags, BRLA_RARP))
+ SET(vr.vr_flags, VEB_R_F_RARP);
+
+ if (ISSET(brla->brla_flags, BRLA_SHA)) {
+ SET(vr.vr_flags, VEB_R_F_SHA);
+ vr.vr_arp_sha = brla->brla_sha;
+ }
+ if (ISSET(brla->brla_flags, BRLA_THA)) {
+ SET(vr.vr_flags, VEB_R_F_THA);
+ vr.vr_arp_tha = brla->brla_tha;
+ }
+ if (ISSET(brla->brla_flags, BRLA_SPA)) {
+ SET(vr.vr_flags, VEB_R_F_SPA);
+ vr.vr_arp_spa = brla->brla_spa;
+ }
+ if (ISSET(brla->brla_flags, BRLA_TPA)) {
+ SET(vr.vr_flags, VEB_R_F_TPA);
+ vr.vr_arp_tpa = brla->brla_tpa;
+ }
+ vr.vr_arp_op = htons(brla->brla_op);
+ }
+
+ if (ifbr->ifbr_tagname[0] != '\0') {
+#if NPF > 0
+ vr.vr_pftag = pf_tagname2tag((char *)ifbr->ifbr_tagname, 1);
+ if (vr.vr_pftag == 0)
+ return (ENOMEM);
+#else
+ return (EINVAL);
+#endif
+ }
+
+ p = veb_port_get(sc, ifbr->ifbr_ifsname);
+ if (p == NULL) {
+ error = ESRCH;
+ goto error;
+ }
+
+ vrp = pool_get(&veb_rule_pool, PR_WAITOK|PR_LIMITFAIL|PR_ZERO);
+ if (vrp == NULL) {
+ error = ENOMEM;
+ goto port_put;
+ }
+
+ *vrp = vr;
+
+ /* there's one big lock on a veb for all ports */
+ error = rw_enter(&sc->sc_rule_lock, RW_WRITE|RW_INTR);
+ if (error != 0)
+ goto rule_put;
+
+ TAILQ_INSERT_TAIL(&p->p_vrl, vrp, vr_entry);
+ p->p_nvrl++;
+ if (ISSET(vr.vr_flags, VEB_R_F_OUT)) {
+ SMR_TAILQ_INSERT_TAIL_LOCKED(&p->p_vr_list[0],
+    vrp, vr_lentry[0]);
+ }
+ if (ISSET(vr.vr_flags, VEB_R_F_IN)) {
+ SMR_TAILQ_INSERT_TAIL_LOCKED(&p->p_vr_list[1],
+    vrp, vr_lentry[1]);
+ }
+
+ rw_exit(&sc->sc_rule_lock);
+ veb_port_put(sc, p);
+
+ return (0);
+
+rule_put:
+ pool_put(&veb_rule_pool, vrp);
+port_put:
+ veb_port_put(sc, p);
+error:
+#if NPF > 0
+ pf_tag_unref(vr.vr_pftag);
+#endif
+ return (error);
+}
+
+static void
+veb_rule_list_free(struct veb_rule *nvr)
+{
+ struct veb_rule *vr;
+
+ while ((vr = nvr) != NULL) {
+ nvr = TAILQ_NEXT(vr, vr_entry);
+ pool_put(&veb_rule_pool, vr);
+ }
+}
+
+static int
+veb_rule_list_flush(struct veb_softc *sc, const struct ifbrlreq *ifbr)
+{
+ struct veb_port *p;
+ struct veb_rule *vr;
+ int error;
+
+ p = veb_port_get(sc, ifbr->ifbr_ifsname);
+ if (p == NULL)
+ return (ESRCH);
+
+ error = rw_enter(&sc->sc_rule_lock, RW_WRITE|RW_INTR);
+ if (error != 0) {
+ veb_port_put(sc, p);
+ return (error);
+ }
+
+ /* take all the rules away */
+ vr = TAILQ_FIRST(&p->p_vrl);
+
+ /* reset the lists and counts of rules */
+ TAILQ_INIT(&p->p_vrl);
+ p->p_nvrl = 0;
+ SMR_TAILQ_INIT(&p->p_vr_list[0]);
+ SMR_TAILQ_INIT(&p->p_vr_list[1]);
+
+ rw_exit(&sc->sc_rule_lock);
+ veb_port_put(sc, p);
+
+ smr_barrier();
+ veb_rule_list_free(vr);
+
+ return (0);
+}
+
+static void
+veb_rule2ifbr(struct ifbrlreq *ifbr, const struct veb_rule *vr)
+{
+ switch (vr->vr_action) {
+ case VEB_R_PASS:
+ ifbr->ifbr_action = BRL_ACTION_PASS;
+ break;
+ case VEB_R_BLOCK:
+ ifbr->ifbr_action = BRL_ACTION_BLOCK;
+ break;
+ }
+
+ if (ISSET(vr->vr_flags, VEB_R_F_IN))
+ SET(ifbr->ifbr_flags, BRL_FLAG_IN);
+ if (ISSET(vr->vr_flags, VEB_R_F_OUT))
+ SET(ifbr->ifbr_flags, BRL_FLAG_OUT);
+
+ if (ISSET(vr->vr_flags, VEB_R_F_SRC)) {
+ SET(ifbr->ifbr_flags, BRL_FLAG_SRCVALID);
+ ifbr->ifbr_src = vr->vr_src;
+ }
+ if (ISSET(vr->vr_flags, VEB_R_F_DST)) {
+ SET(ifbr->ifbr_flags, BRL_FLAG_DSTVALID);
+ ifbr->ifbr_dst = vr->vr_dst;
+ }
+
+ /* ARP rule */
+ if (ISSET(vr->vr_flags, VEB_R_F_ARP|VEB_R_F_RARP)) {
+ struct ifbrarpf *brla = &ifbr->ifbr_arpf;
+
+ if (ISSET(vr->vr_flags, VEB_R_F_ARP))
+ SET(brla->brla_flags, BRLA_ARP);
+ if (ISSET(vr->vr_flags, VEB_R_F_RARP))
+ SET(brla->brla_flags, BRLA_RARP);
+
+ if (ISSET(vr->vr_flags, VEB_R_F_SHA)) {
+ SET(brla->brla_flags, BRLA_SHA);
+ brla->brla_sha = vr->vr_arp_sha;
+ }
+ if (ISSET(vr->vr_flags, VEB_R_F_THA)) {
+ SET(brla->brla_flags, BRLA_THA);
+ brla->brla_tha = vr->vr_arp_tha;
+ }
+
+ if (ISSET(vr->vr_flags, VEB_R_F_SPA)) {
+ SET(brla->brla_flags, BRLA_SPA);
+ brla->brla_spa = vr->vr_arp_spa;
+ }
+ if (ISSET(vr->vr_flags, VEB_R_F_TPA)) {
+ SET(brla->brla_flags, BRLA_TPA);
+ brla->brla_tpa = vr->vr_arp_tpa;
+ }
+
+ brla->brla_op = ntohs(vr->vr_arp_op);
+ }
+
+#if NPF > 0
+ if (vr->vr_pftag != 0)
+ pf_tag2tagname(vr->vr_pftag, ifbr->ifbr_tagname);
+#endif
+}
+
+static int
+veb_rule_list_get(struct veb_softc *sc, struct ifbrlconf *ifbrl)
+{
+ struct veb_port *p;
+ struct veb_rule *vr;
+ struct ifbrlreq *ifbr, *ifbrs;
+ int error = 0;
+ size_t len;
+
+ p = veb_port_get(sc, ifbrl->ifbrl_ifsname);
+ if (p == NULL)
+ return (ESRCH);
+
+ len = p->p_nvrl; /* estimate */
+ if (ifbrl->ifbrl_len == 0 || len == 0) {
+ ifbrl->ifbrl_len = len * sizeof(*ifbrs);
+ goto port_put;
+ }
+
+ error = rw_enter(&sc->sc_rule_lock, RW_READ|RW_INTR);
+ if (error != 0)
+ goto port_put;
+
+ ifbrs = mallocarray(p->p_nvrl, sizeof(*ifbrs), M_TEMP,
+    M_WAITOK|M_CANFAIL|M_ZERO);
+ if (ifbrs == NULL) {
+ error = ENOMEM;
+ rw_exit(&sc->sc_rule_lock);
+ goto port_put;
+ }
+ len = p->p_nvrl * sizeof(*ifbrs);
+
+ ifbr = ifbrs;
+ TAILQ_FOREACH(vr, &p->p_vrl, vr_entry) {
+ strlcpy(ifbr->ifbr_name, sc->sc_if.if_xname,
+    sizeof(ifbr->ifbr_name));
+ strlcpy(ifbr->ifbr_ifsname, p->p_ifp0->if_xname,
+    sizeof(ifbr->ifbr_ifsname));
+ veb_rule2ifbr(ifbr, vr);
+
+ ifbr++;
+ }
+
+ rw_exit(&sc->sc_rule_lock);
+
+ error = copyout(ifbrs, ifbrl->ifbrl_buf, min(len, ifbrl->ifbrl_len));
+ if (error == 0)
+ ifbrl->ifbrl_len = len;
+ free(ifbrs, M_TEMP, len);
+
+port_put:
+ veb_port_put(sc, p);
+ return (error);
+}
+
+static int
+veb_port_list(struct veb_softc *sc, struct ifbifconf *bifc)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ struct veb_port *p;
+ struct ifnet *ifp0;
+ struct ifbreq breq;
+ int n = 0, error = 0;
+
+ NET_ASSERT_LOCKED();
+
+ if (bifc->ifbic_len == 0) {
+ n = sc->sc_ports.l_count + sc->sc_spans.l_count;
+ goto done;
+ }
+
+ SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_ports.l_list, p_entry) {
+ if (bifc->ifbic_len < sizeof(breq))
+ break;
+
+ memset(&breq, 0, sizeof(breq));
+
+ ifp0 = p->p_ifp0;
+
+ strlcpy(breq.ifbr_name, ifp->if_xname, IFNAMSIZ);
+ strlcpy(breq.ifbr_ifsname, ifp0->if_xname, IFNAMSIZ);
+
+ /* flag as span port so ifconfig(8)'s brconfig.c:bridge_list()
+ * stays quiet wrt. STP */
+ breq.ifbr_ifsflags = p->p_bif_flags;
+ breq.ifbr_portno = ifp0->if_index;
+ breq.ifbr_protected = p->p_protected;
+ if ((error = copyout(&breq, bifc->ifbic_req + n,
+    sizeof(breq))) != 0)
+ goto done;
+
+ bifc->ifbic_len -= sizeof(breq);
+ n++;
+ }
+
+ SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_spans.l_list, p_entry) {
+ if (bifc->ifbic_len < sizeof(breq))
+ break;
+
+ memset(&breq, 0, sizeof(breq));
+
+ strlcpy(breq.ifbr_name, ifp->if_xname, IFNAMSIZ);
+ strlcpy(breq.ifbr_ifsname, p->p_ifp0->if_xname, IFNAMSIZ);
+
+ /* flag as span port so ifconfig(8)'s brconfig.c:bridge_list()
+ * stays quiet wrt. STP */
+ breq.ifbr_ifsflags = p->p_bif_flags;
+ if ((error = copyout(&breq, bifc->ifbic_req + n,
+    sizeof(breq))) != 0)
+ goto done;
+
+ bifc->ifbic_len -= sizeof(breq);
+ n++;
+ }
+
+done:
+ bifc->ifbic_len = n * sizeof(breq);
+ return (error);
+}
+
+static int
+veb_p_ioctl(struct ifnet *ifp0, u_long cmd, caddr_t data)
+{
+ const struct ether_brport *eb = ether_brport_get_locked(ifp0);
+ struct veb_port *p;
+ int error = 0;
+
+ KASSERTMSG(eb != NULL,
+    "%s: %s called without an ether_brport set",
+    ifp0->if_xname, __func__);
+ KASSERTMSG(eb->eb_input == veb_port_input,
+    "%s: %s called, but eb_input seems wrong (%p != veb_port_input())",
+    ifp0->if_xname, __func__, eb->eb_input);
+
+ p = eb->eb_port;
+
+ switch (cmd) {
+ case SIOCSIFADDR:
+ error = EBUSY;
+ break;
+
+ default:
+ error = (*p->p_ioctl)(ifp0, cmd, data);
+ break;
+ }
+
+ return (error);
+}
+
+static int
+veb_p_output(struct ifnet *ifp0, struct mbuf *m, struct sockaddr *dst,
+    struct rtentry *rt)
+{
+ int (*p_output)(struct ifnet *, struct mbuf *, struct sockaddr *,
+    struct rtentry *) = NULL;
+ const struct ether_brport *eb;
+
+ /* restrict transmission to bpf only */
+ if ((m_tag_find(m, PACKET_TAG_DLT, NULL) == NULL)) {
+ m_freem(m);
+ return (EBUSY);
+ }
+
+ smr_read_enter();
+ eb = ether_brport_get(ifp0);
+ if (eb != NULL && eb->eb_input == veb_port_input) {
+ struct veb_port *p = eb->eb_port;
+ p_output = p->p_output; /* code doesn't go away */
+ }
+ smr_read_leave();
+
+ if (p_output == NULL) {
+ m_freem(m);
+ return (ENXIO);
+ }
+
+ return ((*p_output)(ifp0, m, dst, rt));
+}
+
+static void
+veb_p_dtor(struct veb_softc *sc, struct veb_port *p, const char *op)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ struct ifnet *ifp0 = p->p_ifp0;
+ struct veb_ports *port_list;
+
+ DPRINTF(sc, "%s %s: destroying port (%s)\n",
+    ifp->if_xname, ifp0->if_xname, op);
+
+ ifp0->if_ioctl = p->p_ioctl;
+ ifp0->if_output = p->p_output;
+
+ ether_brport_clr(ifp0);
+
+ if_detachhook_del(ifp0, &p->p_dtask);
+ if_linkstatehook_del(ifp0, &p->p_ltask);
+
+ if (ISSET(p->p_bif_flags, IFBIF_SPAN)) {
+ port_list = &sc->sc_spans;
+ } else {
+ if (ifpromisc(ifp0, 0) != 0) {
+ log(LOG_WARNING, "%s %s: unable to disable promisc\n",
+    ifp->if_xname, ifp0->if_xname);
+ }
+
+ etherbridge_detach_port(&sc->sc_eb, p);
+
+ port_list = &sc->sc_ports;
+ }
+ SMR_TAILQ_REMOVE_LOCKED(&port_list->l_list, p, p_entry);
+ port_list->l_count--;
+
+ smr_barrier();
+ refcnt_finalize(&p->p_refs, "vebpdtor");
+
+ veb_rule_list_free(TAILQ_FIRST(&p->p_vrl));
+
+ if_put(ifp0);
+ free(p, M_DEVBUF, sizeof(*p));
+}
+
+static void
+veb_p_detach(void *arg)
+{
+ struct veb_port *p = arg;
+ struct veb_softc *sc = p->p_veb;
+
+ NET_ASSERT_LOCKED();
+
+ veb_p_dtor(sc, p, "detach");
+}
+
+static int
+veb_p_active(struct veb_port *p)
+{
+ struct ifnet *ifp0 = p->p_ifp0;
+
+ return (ISSET(ifp0->if_flags, IFF_RUNNING) &&
+    LINK_STATE_IS_UP(ifp0->if_link_state));
+}
+
+static void
+veb_p_linkch(void *arg)
+{
+ struct veb_port *p = arg;
+ u_char link_state = LINK_STATE_FULL_DUPLEX;
+
+ NET_ASSERT_LOCKED();
+
+ if (!veb_p_active(p))
+ link_state = LINK_STATE_DOWN;
+
+ p->p_link_state = link_state;
+}
+
+static int
+veb_up(struct veb_softc *sc)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ int error;
+
+ error = etherbridge_up(&sc->sc_eb);
+ if (error != 0)
+ return (error);
+
+ NET_ASSERT_LOCKED();
+ SET(ifp->if_flags, IFF_RUNNING);
+
+ return (0);
+}
+
+static int
+veb_iff(struct veb_softc *sc)
+{
+ return (0);
+}
+
+static int
+veb_down(struct veb_softc *sc)
+{
+ struct ifnet *ifp = &sc->sc_if;
+ int error;
+
+ error = etherbridge_down(&sc->sc_eb);
+ if (error != 0)
+ return (error);
+
+ NET_ASSERT_LOCKED();
+ CLR(ifp->if_flags, IFF_RUNNING);
+
+ return (0);
+}
+
+static int
+veb_eb_port_cmp(void *arg, void *a, void *b)
+{
+ struct veb_port *pa = a, *pb = b;
+ return (pa == pb);
+}
+
+static void *
+veb_eb_port_take(void *arg, void *port)
+{
+ struct veb_port *p = port;
+
+ refcnt_take(&p->p_refs);
+
+ return (p);
+}
+
+static void
+veb_eb_port_rele(void *arg, void *port)
+{
+ struct veb_port *p = port;
+
+ refcnt_rele_wake(&p->p_refs);
+}
+
+static size_t
+veb_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
+{
+ struct veb_port *p = port;
+
+ return (strlcpy(dst, p->p_ifp0->if_xname, len));
+}
+
+static void
+veb_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+ ss->ss_family = AF_UNSPEC;
+}
+
+/*
+ * virtual ethernet bridge port
+ */
+
+static int
+vport_clone_create(struct if_clone *ifc, int unit)
+{
+ struct vport_softc *sc;
+ struct ifnet *ifp;
+
+ sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+ if (sc == NULL)
+ return (ENOMEM);
+
+ ifp = &sc->sc_ac.ac_if;
+
+ snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
+    ifc->ifc_name, unit);
+
+ ifp->if_softc = sc;
+ ifp->if_type = IFT_ETHER;
+ ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
+ ifp->if_ioctl = vport_ioctl;
+ ifp->if_enqueue = vport_enqueue;
+ ifp->if_qstart = vport_start;
+ ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
+ ifp->if_xflags = IFXF_CLONED | IFXF_MPSAFE;
+ ether_fakeaddr(ifp);
+
+ if_counters_alloc(ifp);
+ if_attach(ifp);
+ ether_ifattach(ifp);
+
+ return (0);
+}
+
+static int
+vport_clone_destroy(struct ifnet *ifp)
+{
+ struct vport_softc *sc = ifp->if_softc;
+
+ NET_LOCK();
+ sc->sc_dead = 1;
+
+ if (ISSET(ifp->if_flags, IFF_RUNNING))
+ vport_down(sc);
+ NET_UNLOCK();
+
+ ether_ifdetach(ifp);
+ if_detach(ifp);
+
+ free(sc, M_DEVBUF, sizeof(*sc));
+
+ return (0);
+}
+
+static int
+vport_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
+{
+ struct vport_softc *sc = ifp->if_softc;
+ int error = 0;
+
+ if (sc->sc_dead)
+ return (ENXIO);
+
+ switch (cmd) {
+ case SIOCSIFFLAGS:
+ if (ISSET(ifp->if_flags, IFF_UP)) {
+ if (!ISSET(ifp->if_flags, IFF_RUNNING))
+ error = vport_up(sc);
+ } else {
+ if (ISSET(ifp->if_flags, IFF_RUNNING))
+ error = vport_down(sc);
+ }
+ break;
+
+ case SIOCADDMULTI:
+ case SIOCDELMULTI:
+ break;
+
+ default:
+ error = ether_ioctl(ifp, &sc->sc_ac, cmd, data);
+ break;
+ }
+
+ if (error == ENETRESET)
+ error = vport_iff(sc);
+
+ return (error);
+}
+
+static int
+vport_up(struct vport_softc *sc)
+{
+ struct ifnet *ifp = &sc->sc_ac.ac_if;
+
+ NET_ASSERT_LOCKED();
+ SET(ifp->if_flags, IFF_RUNNING);
+
+ return (0);
+}
+
+static int
+vport_iff(struct vport_softc *sc)
+{
+ return (0);
+}
+
+static int
+vport_down(struct vport_softc *sc)
+{
+ struct ifnet *ifp = &sc->sc_ac.ac_if;
+
+ NET_ASSERT_LOCKED();
+ CLR(ifp->if_flags, IFF_RUNNING);
+
+ return (0);
+}
+
+static int
+vport_enqueue(struct ifnet *ifp, struct mbuf *m)
+{
+ struct arpcom *ac;
+ const struct ether_brport *eb;
+ int error = ENETDOWN;
+#if NBPFILTER > 0
+ caddr_t if_bpf;
+#endif
+
+#if NPF > 0
+ /*
+ * the packet is about to leave the l3 stack and go into
+ * the l2 switching space, or it's coming from a switch space
+ * into the network stack. either way, there's no relationship
+ * between pf states in those different places.
+ */
+ pf_pkt_addr_changed(m);
+#endif
+
+ if (ISSET(m->m_flags, M_PROTO1)) {
+ /* packet is coming from a bridge */
+ if_vinput(ifp, m);
+ return (0);
+ }
+
+ /* packet is going to the bridge */
+
+ ac = (struct arpcom *)ifp;
+
+ smr_read_enter();
+ eb = SMR_PTR_GET(&ac->ac_brport);
+ if (eb != NULL) {
+ counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+    m->m_pkthdr.len);
+
+#if NBPFILTER > 0
+ if_bpf = READ_ONCE(ifp->if_bpf);
+ if (if_bpf != NULL)
+ bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_OUT);
+#endif
+
+ m = (*eb->eb_input)(ifp, m, eb->eb_port);
+
+ error = 0;
+ }
+ smr_read_leave();
+
+ m_freem(m);
+
+ return (error);
+}
+
+static void
+vport_start(struct ifqueue *ifq)
+{
+ ifq_purge(ifq);
+}
Index: net/toeplitz.c
===================================================================
RCS file: /cvs/src/sys/net/toeplitz.c,v
retrieving revision 1.9
diff -u -p -r1.9 toeplitz.c
--- net/toeplitz.c 1 Sep 2020 19:18:26 -0000 1.9
+++ net/toeplitz.c 10 Feb 2021 12:06:23 -0000
@@ -187,6 +187,15 @@ stoeplitz_hash_ip6port(const struct stoe
 }
 #endif /* INET6 */
 
+uint16_t
+stoeplitz_hash_eaddr(const struct stoeplitz_cache *scache,
+    const uint8_t ea[static 6])
+{
+ const uint16_t *ea16 = (const uint16_t *)ea;
+
+ return (stoeplitz_hash_n16(scache, ea16[0] ^ ea16[1] ^ ea16[2]));
+}
+
 void
 stoeplitz_to_key(void *key, size_t klen)
 {
Index: net/toeplitz.h
===================================================================
RCS file: /cvs/src/sys/net/toeplitz.h,v
retrieving revision 1.3
diff -u -p -r1.3 toeplitz.h
--- net/toeplitz.h 19 Jun 2020 08:48:15 -0000 1.3
+++ net/toeplitz.h 10 Feb 2021 12:06:23 -0000
@@ -53,6 +53,9 @@ uint16_t stoeplitz_hash_ip6port(const st
     uint16_t, uint16_t);
 #endif
 
+uint16_t stoeplitz_hash_eaddr(const struct stoeplitz_cache *,
+    const uint8_t [static 6]);
+
 /* hash a uint16_t in network byte order */
 static __unused inline uint16_t
 stoeplitz_hash_n16(const struct stoeplitz_cache *scache, uint16_t n16)
@@ -116,5 +119,7 @@ extern const struct stoeplitz_cache *con
 #define stoeplitz_ip6port(_sa6, _da6, _sp, _dp) \
  stoeplitz_hash_ip6port(stoeplitz_cache, (_sa6), (_da6), (_sp), (_dp))
 #endif
+#define stoeplitz_eaddr(_ea) \
+ stoeplitz_hash_eaddr(stoeplitz_cache, (_ea))
 
 #endif /* _SYS_NET_TOEPLITZ_H_ */
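
for readers following the toeplitz change: below is a minimal user-space sketch of the fold that stoeplitz_hash_eaddr() performs before it hands the word to stoeplitz_hash_n16(). it is not part of the diff. eaddr_fold16 is a hypothetical name, and the sketch uses memcpy instead of the pointer cast in the patch so it does not assume anything about the alignment of the address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * fold a 6 byte ethernet address into a single 16 bit word by
 * xoring its three 16 bit pieces together. the word keeps the
 * byte order the address has in memory, which is what a hash
 * over a value "in network byte order" expects.
 *
 * eaddr_fold16 is a made-up name for this sketch only.
 */
static uint16_t
eaddr_fold16(const uint8_t ea[static 6])
{
	uint16_t w[3];

	/* the three words must cover the address exactly */
	static_assert(sizeof(w) == 6, "ethernet address is 6 bytes");

	/* copy rather than cast, so ea may have any alignment */
	memcpy(w, ea, sizeof(w));

	return (w[0] ^ w[1] ^ w[2]);
}
```

the result would then be passed to the real hash, which this sketch leaves out. note that any two addresses whose three words xor to the same value land in the same bucket, which is probably fine for spreading entries over a table but is an observation about the sketch, not a claim about the driver.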


Re: veb(4), a virtual ethernet bridge (that could replace bridge(4)?)

Vitaliy Makkoveev-3
Hello.


> + ifp->if_ioctl = veb_ioctl;
> + ifp->if_input = veb_input;
> + //ifp->if_rtrequest = veb_rtrequest;
> + ifp->if_output = veb_output;
> + ifp->if_enqueue = veb_enqueue;

Could you replace the C++ style comment in veb_clone_create()?

> +veb_clone_destroy(struct ifnet *ifp)
> +{
> + struct veb_softc *sc = ifp->if_softc;
> + struct veb_port *p, *np;
> +
> + NET_LOCK();
> + sc->sc_dead = 1;
> +
> + if (ISSET(ifp->if_flags, IFF_RUNNING))
> + veb_down(sc);
> + NET_UNLOCK();
> +
> + if_detach(ifp);


Also veb_down() looks strange here. I guess there is no reason to
play with `if_flags' here, and smr_barrier() could be called after
if_detach(). This makes `sc_dead' unnecessary.

Re: veb(4), a virtual ethernet bridge (that could replace bridge(4)?)

David Gwynne-5


> On 22 Feb 2021, at 12:46 am, Vitaliy Makkoveev <[hidden email]> wrote:
>
> Hello.
>
>
>> + ifp->if_ioctl = veb_ioctl;
>> + ifp->if_input = veb_input;
>> + //ifp->if_rtrequest = veb_rtrequest;
>> + ifp->if_output = veb_output;
>> + ifp->if_enqueue = veb_enqueue;
>
> Could you replace the C++ style comment in veb_clone_create()?

yep.

>
>> +veb_clone_destroy(struct ifnet *ifp)
>> +{
>> + struct veb_softc *sc = ifp->if_softc;
>> + struct veb_port *p, *np;
>> +
>> + NET_LOCK();
>> + sc->sc_dead = 1;
>> +
>> + if (ISSET(ifp->if_flags, IFF_RUNNING))
>> + veb_down(sc);
>> + NET_UNLOCK();
>> +
>> + if_detach(ifp);
>
>
> Also veb_down() looks strange here. I guess there is no reason to
> play with `if_flags' here, and smr_barrier() could be called after
> if_detach(). This makes `sc_dead' unnecessary.

i need to think about sc_dead again. i do it in a bunch of different drivers and you're pretty confident it's not needed anymore.

technically the flags don't need to be cleared, but i like having the flow right in case i make veb_down do more in the future.
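
for anyone following along without the diff open, the sc_dead pattern under discussion can be modelled in user space roughly like this. it is only a sketch: toy_softc, toy_ioctl and toy_destroy are hypothetical names, there is no locking, and sc_running merely stands in for IFF_RUNNING.

```c
#include <assert.h>
#include <errno.h>

/* toy stand-in for a driver softc, not the real struct vport_softc */
struct toy_softc {
	int sc_dead;	/* set once destroy has started */
	int sc_running;	/* stands in for IFF_RUNNING */
};

/* mirrors the check at the top of vport_ioctl() */
static int
toy_ioctl(struct toy_softc *sc)
{
	if (sc->sc_dead)
		return (ENXIO);

	return (0);
}

/* mirrors the first half of vport_clone_destroy(), minus NET_LOCK */
static void
toy_destroy(struct toy_softc *sc)
{
	assert(!sc->sc_dead);	/* destroy is only expected to run once */

	sc->sc_dead = 1;
	if (sc->sc_running)
		sc->sc_running = 0;	/* what the down routine does */
}
```

the flag only buys anything if an ioctl can still reach the softc after destroy has begun. if the detach path already guarantees that cannot happen, the check in toy_ioctl is dead code, which is the point being argued above.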