pipex(4): kill pipexintr()

pipex(4): kill pipexintr()

Vitaliy Makkoveev
Now pipex(4) is fully covered by NET_LOCK() and this is documented. But
we still have an issue with the pipex(4) session itself, and I guess it's
time to fix it.

We have the `pipexinq' and `pipexoutq' mbuf(9) queues to store mbufs. Each
mbuf(9) passed to these queues stores a pointer to the corresponding
session in `m_pkthdr.ph_cookie'. We enqueue incoming mbufs for pppx(4),
and both incoming and outgoing mbufs for pppac(4), but we don't enqueue
pppoe-related mbufs. After a packet has been enqueued to the corresponding
queue we call schednetisr(), which merely schedules netisr() to run:

---- cut begin ----

780 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
781     struct mbuf_queue *mq)
782 {
783         m0->m_pkthdr.ph_cookie = session;
784         /* XXX need to support other protocols */
785         m0->m_pkthdr.ph_ppp_proto = PPP_IP;
786
787         if (mq_enqueue(mq, m0) != 0)
788                 return (1);
789
790         schednetisr(NETISR_PIPEX);
791
792         return (0);
793 }

---- cut end ----

We also have pipex_timer(), which is supposed to destroy sessions in a
safe way, but it does this only for pppac(4) and only for sessions closed
by the `PIPEXDSESSION' command:

---- cut begin ----

812 pipex_timer(void *ignored_arg)
813 {
        /* skip */
846                 case PIPEX_STATE_CLOSED:
847                         /*
848                          * mbuf queued in pipexinq or pipexoutq may have a
849                          * refererce to this session.
850                          */
851                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
852                                 continue;
853
854                         pipex_destroy_session(session);
855                         break;

---- cut end ----

But when we destroy sessions through pipex_rele_session(), through
pipex_iface_fini() or through the `PIPEXSMODE' command, we don't check the
state of `pipexinq' and `pipexoutq'. This means we can break them.

It's not guaranteed that netisr() will run right after the schednetisr()
call. This means we can destroy a session while a corresponding mbuf(9)
is still stored within `pipexinq' or `pipexoutq'. Its `m_pkthdr.ph_cookie'
still points to the destroyed session, so we have a use-after-free issue.
I wonder why we haven't caught a panic yet.

I propose to kill `pipexinq', `pipexoutq' and pipexintr(). There is
absolutely no reason for them to exist. This not only fixes the issue
described above but also simplifies the code.

The alternatives are to implement reference counting for sessions, or to
walk through the mbuf(9) queues and kill the corresponding mbufs. It
doesn't make sense to go either of these ways.

Index: lib/libc/sys/sysctl.2
===================================================================
RCS file: /cvs/src/lib/libc/sys/sysctl.2,v
retrieving revision 1.40
diff -u -p -r1.40 sysctl.2
--- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 -0000 1.40
+++ lib/libc/sys/sysctl.2 29 Jul 2020 13:47:40 -0000
@@ -2033,35 +2033,11 @@ The currently defined variable names are
 .Bl -column "Third level name" "integer" "Changeable" -offset indent
 .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable"
 .It Dv PIPEXCTL_ENABLE Ta integer Ta yes
-.It Dv PIPEXCTL_INQ Ta node Ta not applicable
-.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable
 .El
 .Bl -tag -width "123456"
 .It Dv PIPEXCTL_ENABLE
 If set to 1, enable PIPEX processing.
 The default is 0.
-.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about the PIPEX packet input queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
-.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about PIPEX packet output queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
 .El
 .El
 .Ss CTL_VFS
Index: sys/net/if.c
===================================================================
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.616
diff -u -p -r1.616 if.c
--- sys/net/if.c 24 Jul 2020 18:17:14 -0000 1.616
+++ sys/net/if.c 29 Jul 2020 13:47:44 -0000
@@ -909,13 +909,6 @@ if_netisr(void *unused)
  KERNEL_UNLOCK();
  }
 #endif
-#ifdef PIPEX
- if (n & (1 << NETISR_PIPEX)) {
- KERNEL_LOCK();
- pipexintr();
- KERNEL_UNLOCK();
- }
-#endif
  t |= n;
  }
 
Index: sys/net/netisr.h
===================================================================
RCS file: /cvs/src/sys/net/netisr.h,v
retrieving revision 1.51
diff -u -p -r1.51 netisr.h
--- sys/net/netisr.h 6 Aug 2019 22:57:54 -0000 1.51
+++ sys/net/netisr.h 29 Jul 2020 13:47:44 -0000
@@ -48,7 +48,6 @@
 #define NETISR_IPV6 24 /* same as AF_INET6 */
 #define NETISR_ISDN 26 /* same as AF_E164 */
 #define NETISR_PPP 28 /* for PPP processing */
-#define NETISR_PIPEX 27 /* for pipex processing */
 #define NETISR_BRIDGE 29 /* for bridge processing */
 #define NETISR_PPPOE 30 /* for pppoe processing */
 #define NETISR_SWITCH 31 /* for switch dataplane */
@@ -68,7 +67,6 @@ void bridgeintr(void);
 void pppoeintr(void);
 void switchintr(void);
 void pfsyncintr(void);
-void pipexintr(void);
 
 #define schednetisr(anisr) \
 do { \
Index: sys/net/pipex.c
===================================================================
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.122
diff -u -p -r1.122 pipex.c
--- sys/net/pipex.c 29 Jul 2020 12:09:31 -0000 1.122
+++ sys/net/pipex.c 29 Jul 2020 13:47:44 -0000
@@ -102,10 +102,6 @@ struct radix_node_head *pipex_rd_head6 =
 struct timeout pipex_timer_ch; /* callout timer context */
 int pipex_prune = 1; /* [I] walk list every seconds */
 
-/* pipex traffic queue */
-struct mbuf_queue pipexinq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
-struct mbuf_queue pipexoutq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
-
 /* borrow an mbuf pkthdr field */
 #define ph_ppp_proto ether_vtag
 
@@ -725,82 +721,6 @@ pipex_lookup_by_session_id(int protocol,
 }
 
 /***********************************************************************
- * Queue and Software Interrupt Handler
- ***********************************************************************/
-void
-pipexintr(void)
-{
- struct pipex_session *pkt_session;
- u_int16_t proto;
- struct mbuf *m;
- struct mbuf_list ml;
-
- /* ppp output */
- mq_delist(&pipexoutq, &ml);
- while ((m = ml_dequeue(&ml)) != NULL) {
- pkt_session = m->m_pkthdr.ph_cookie;
- if (pkt_session == NULL) {
- m_freem(m);
- continue;
- }
- proto = m->m_pkthdr.ph_ppp_proto;
-
- m->m_pkthdr.ph_cookie = NULL;
- m->m_pkthdr.ph_ppp_proto = 0;
-
- if (pkt_session->is_multicast != 0) {
- struct pipex_session *session;
- struct mbuf *m0;
-
- LIST_FOREACH(session, &pipex_session_list,
-    session_list) {
- if (session->pipex_iface !=
-    pkt_session->pipex_iface)
- continue;
- if (session->ip_forward == 0 &&
-    session->ip6_forward == 0)
- continue;
- m0 = m_copym(m, 0, M_COPYALL, M_NOWAIT);
- if (m0 == NULL) {
- session->stat.oerrors++;
- continue;
- }
- pipex_ppp_output(m0, session, proto);
- }
- m_freem(m);
- } else
- pipex_ppp_output(m, pkt_session, proto);
- }
-
- /* ppp input */
- mq_delist(&pipexinq, &ml);
- while ((m = ml_dequeue(&ml)) != NULL) {
- pkt_session = m->m_pkthdr.ph_cookie;
- if (pkt_session == NULL) {
- m_freem(m);
- continue;
- }
- pipex_ppp_input(m, pkt_session, 0);
- }
-}
-
-Static int
-pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
-    struct mbuf_queue *mq)
-{
- m0->m_pkthdr.ph_cookie = session;
- /* XXX need to support other protocols */
- m0->m_pkthdr.ph_ppp_proto = PPP_IP;
-
- if (mq_enqueue(mq, m0) != 0)
- return (1);
-
- schednetisr(NETISR_PIPEX);
-
- return (0);
-}
-
-/***********************************************************************
  * Timer functions
  ***********************************************************************/
 Static void
@@ -852,13 +772,6 @@ pipex_timer(void *ignored_arg)
  /* FALLTHROUGH */
 
  case PIPEX_STATE_CLOSED:
- /*
- * mbuf queued in pipexinq or pipexoutq may have a
- * refererce to this session.
- */
- if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
- continue;
-
  pipex_destroy_session(session);
  break;
 
@@ -958,12 +871,29 @@ pipex_ip_output(struct mbuf *m0, struct
  if (m0 == NULL)
  goto dropped;
  }
- } else
+
+ pipex_ppp_output(m0, session, PPP_IP);
+ } else {
+ struct pipex_session *session_tmp;
+ struct mbuf *m;
+
  m0->m_flags &= ~(M_BCAST|M_MCAST);
 
- /* output ip packets to the session tunnel */
- if (pipex_ppp_enqueue(m0, session, &pipexoutq))
- goto dropped;
+ LIST_FOREACH(session_tmp, &pipex_session_list, session_list) {
+ if (session_tmp->pipex_iface != session->pipex_iface)
+ continue;
+ if (session_tmp->ip_forward == 0 &&
+    session_tmp->ip6_forward == 0)
+ continue;
+ m = m_copym(m0, 0, M_COPYALL, M_NOWAIT);
+ if (m == NULL) {
+ session_tmp->stat.oerrors++;
+ continue;
+ }
+ pipex_ppp_output(m, session_tmp, PPP_IP);
+ }
+ m_freem(m0);
+ }
 
  return;
 drop:
@@ -1241,7 +1171,7 @@ drop:
 
 Static struct mbuf *
 pipex_common_input(struct pipex_session *session, struct mbuf *m0, int hlen,
-    int plen, int useq)
+    int plen)
 {
  int proto, ppphlen;
  u_char code;
@@ -1289,19 +1219,12 @@ pipex_common_input(struct pipex_session
  m_adj(m0, plen - m0->m_pkthdr.len);
  }
 
- if (!useq) {
- pipex_ppp_input(m0, session, 0);
- return (NULL);
- }
+ pipex_ppp_input(m0, session, 0);
+
+ return (NULL);
 
- /* input ppp packets to kernel session */
- if (pipex_ppp_enqueue(m0, session, &pipexinq) != 0)
- goto dropped;
- else
- return (NULL);
 drop:
  m_freem(m0);
-dropped:
  session->stat.ierrors++;
  return (NULL);
 
@@ -1398,7 +1321,7 @@ pipex_pppoe_input(struct mbuf *m0, struc
     sizeof(struct pipex_pppoe_header), (caddr_t)&pppoe);
 
  hlen = sizeof(struct ether_header) + sizeof(struct pipex_pppoe_header);
- if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length), 0))
+ if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length)))
     == NULL)
  return (NULL);
  m_freem(m0);
@@ -1694,7 +1617,7 @@ pipex_pptp_input(struct mbuf *m0, struct
  pipex_pptp_output(NULL, session, 0, 1);
  }
 
- if ((m0 = pipex_common_input(session, m0, hlen, ntohs(gre->len), 1))
+ if ((m0 = pipex_common_input(session, m0, hlen, ntohs(gre->len)))
     == NULL) {
  /* ok,  The packet is for PIPEX */
  if (!rewind)
@@ -2129,7 +2052,7 @@ pipex_l2tp_input(struct mbuf *m0, int of
 
  length -= hlen + offset;
  hlen += off0 + offset;
- if ((m0 = pipex_common_input(session, m0, hlen, length, 1)) == NULL) {
+ if ((m0 = pipex_common_input(session, m0, hlen, length)) == NULL) {
  /* ok,  The packet is for PIPEX */
  if (!rewind)
  session->proto.l2tp.nr_gap += nseq;
@@ -3052,12 +2975,6 @@ pipex_sysctl(int *name, u_int namelen, v
  return (ENOTDIR);
  return (sysctl_int(oldp, oldlenp, newp, newlen,
     &pipex_enable));
- case PIPEXCTL_INQ:
-        return (sysctl_mq(name + 1, namelen - 1,
-    oldp, oldlenp, newp, newlen, &pipexinq));
- case PIPEXCTL_OUTQ:
-        return (sysctl_mq(name + 1, namelen - 1,
-    oldp, oldlenp, newp, newlen, &pipexoutq));
  default:
  return (ENOPROTOOPT);
  }
Index: sys/net/pipex.h
===================================================================
RCS file: /cvs/src/sys/net/pipex.h,v
retrieving revision 1.26
diff -u -p -r1.26 pipex.h
--- sys/net/pipex.h 29 Jul 2020 12:09:31 -0000 1.26
+++ sys/net/pipex.h 29 Jul 2020 13:47:44 -0000
@@ -33,15 +33,11 @@
  * Names for pipex sysctl objects
  */
 #define PIPEXCTL_ENABLE 1
-#define PIPEXCTL_INQ 2
-#define PIPEXCTL_OUTQ 3
-#define PIPEXCTL_MAXID 4
+#define PIPEXCTL_MAXID 2
 
 #define PIPEXCTL_NAMES { \
         { 0, 0 }, \
         { "enable", CTLTYPE_INT }, \
-        { "inq", CTLTYPE_NODE }, \
-        { "outq", CTLTYPE_NODE }, \
 }
 
 #define PIPEX_PROTO_L2TP 1 /* protocol L2TP */
Index: sys/net/pipex_local.h
===================================================================
RCS file: /cvs/src/sys/net/pipex_local.h,v
retrieving revision 1.38
diff -u -p -r1.38 pipex_local.h
--- sys/net/pipex_local.h 29 Jul 2020 12:09:31 -0000 1.38
+++ sys/net/pipex_local.h 29 Jul 2020 13:47:44 -0000
@@ -403,7 +403,8 @@ void                  pipex_ip_input (st
 #ifdef INET6
 void                  pipex_ip6_input (struct mbuf *, struct pipex_session *);
 #endif
-struct mbuf           *pipex_common_input(struct pipex_session *, struct mbuf *, int, int, int);
+struct mbuf           *pipex_common_input(struct pipex_session *,
+                          struct mbuf *, int, int);
 
 #ifdef PIPEX_PPPOE
 void                  pipex_pppoe_output (struct mbuf *, struct pipex_session *);


Re: pipex(4): kill pipexintr()

Martin Pieuchot
On 29/07/20(Wed) 23:04, Vitaliy Makkoveev wrote:

> Now pipex(4) is fully covered by NET_LOCK() and this is documented. But
> we still have an issue with pipex(4) session itself and I guess it's
> time to fix it.
>
> We have `pipexinq' and `pipexoutq' mbuf(9) queues to store mbufs. Each
> mbuf(9) passed to these queues stores the pointer to corresponding
> session referenced as `m_pkthdr.ph_cookie'. We enqueue incoming mbufs for
> pppx(4) and incoming and outgoing mbufs for pppac(4). But we don't
> enqueue pppoe related mbufs. After packet was enqueued to corresponding
> queue we call schednetisr() which just schedules netisr() to run:
>
> ---- cut begin ----
>
> 780 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
> 781     struct mbuf_queue *mq)
> 782 {
> 783         m0->m_pkthdr.ph_cookie = session;
> 784         /* XXX need to support other protocols */
> 785         m0->m_pkthdr.ph_ppp_proto = PPP_IP;
> 786
> 787         if (mq_enqueue(mq, m0) != 0)
> 788                 return (1);
> 789
> 790         schednetisr(NETISR_PIPEX);
> 791
> 792         return (0);
> 793 }
>
> ---- cut end ----
>
> Also we have pipex_timer() which should destroy session in safe way, but
> it does this only for pppac(4) and only for sessions closed by
> `PIPEXDSESSION' command:
>
> ---- cut begin ----
>
> 812 pipex_timer(void *ignored_arg)
> 813 {
> /* skip */
> 846                 case PIPEX_STATE_CLOSED:
> 847                         /*
> 848                          * mbuf queued in pipexinq or pipexoutq may have a
> 849     * refererce to this session.
> 850                          */
> 851                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
> 852                                 continue;
> 853
> 854                         pipex_destroy_session(session);
> 855                         break;
>
> ---- cut end ----
>
> While we destroy sessions through pipex_rele_session() or through
> pipex_iface_fini() or through `PIPEXSMODE' command we don't check
> `pipexinq' and `pipexoutq' state. This means we can break them.
>
> It's not guaranteed that netisr() will start just after schednetisr()
> call. This means we can destroy session, but corresponding mbuf(9) is
> stored within `pipexinq' or `pipexoutq'. It's `m_pkthdr.ph_cookie' still
> stores pointer to destroyed session and we have use after free issue. I
> wonder why we didn't caught panic yet.
>
> I propose to kill `pipexinq', `pipexoutq' and pipexintr(). There is
> absolutely no reason them to exist. This should not only fix issue
> described above but simplifies code too.

The timing is fine.  Make sure you watch for possible fallout.  If you're
curious you can generate flame graphs or profile the kernel with and
without this diff to get an idea of what gets executed.  This change
should improve latency by reducing contention on the KERNEL_LOCK().

ok mpi@

Note that as a next step it could be beneficial to pass an `ifp' pointer
to pipex_ip{,6}_input() and maybe even pipex_ppp_input() to reduce the
number of if_get(9) calls.  This will make an interesting pattern appear ;)



Re: pipex(4): kill pipexintr()

YASUOKA Masahiko
In reply to this post by Vitaliy Makkoveev-2
Hi,

sys/net/if_ethersubr.c:
372 void
373 ether_input(struct ifnet *ifp, struct mbuf *m)
(snip)
519 #if NPPPOE > 0 || defined(PIPEX)
520         case ETHERTYPE_PPPOEDISC:
521         case ETHERTYPE_PPPOE:
522                 if (m->m_flags & (M_MCAST | M_BCAST))
523                         goto dropanyway;
524 #ifdef PIPEX
525                 if (pipex_enable) {
526                         struct pipex_session *session;
527
528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
529                                 pipex_pppoe_input(m, session);
530                                 return;
531                         }
532                 }
533 #endif

Previously, a packet which branches to line 529 was enqueued.

If the diff removes the queue, then the pipex input routine is executed
by the NIC's interrupt handler.

The queues were introduced to avoid that kind of situation.

Also, I don't see the relation between the use-after-free problem and
killing the queues.  Can't we fix the problem without killing the queues?

>  int pipex_prune = 1; /* [I] walk list every seconds */
>  
> -/* pipex traffic queue */
> -struct mbuf_queue pipexinq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
> -struct mbuf_queue pipexoutq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
> -
>  /* borrow an mbuf pkthdr field */
>  #define ph_ppp_proto ether_vtag
>  
> @@ -725,82 +721,6 @@ pipex_lookup_by_session_id(int protocol,
>  }
>  
>  /***********************************************************************
> - * Queue and Software Interrupt Handler
> - ***********************************************************************/
> -void
> -pipexintr(void)
> -{
> - struct pipex_session *pkt_session;
> - u_int16_t proto;
> - struct mbuf *m;
> - struct mbuf_list ml;
> -
> - /* ppp output */
> - mq_delist(&pipexoutq, &ml);
> - while ((m = ml_dequeue(&ml)) != NULL) {
> - pkt_session = m->m_pkthdr.ph_cookie;
> - if (pkt_session == NULL) {
> - m_freem(m);
> - continue;
> - }
> - proto = m->m_pkthdr.ph_ppp_proto;
> -
> - m->m_pkthdr.ph_cookie = NULL;
> - m->m_pkthdr.ph_ppp_proto = 0;
> -
> - if (pkt_session->is_multicast != 0) {
> - struct pipex_session *session;
> - struct mbuf *m0;
> -
> - LIST_FOREACH(session, &pipex_session_list,
> -    session_list) {
> - if (session->pipex_iface !=
> -    pkt_session->pipex_iface)
> - continue;
> - if (session->ip_forward == 0 &&
> -    session->ip6_forward == 0)
> - continue;
> - m0 = m_copym(m, 0, M_COPYALL, M_NOWAIT);
> - if (m0 == NULL) {
> - session->stat.oerrors++;
> - continue;
> - }
> - pipex_ppp_output(m0, session, proto);
> - }
> - m_freem(m);
> - } else
> - pipex_ppp_output(m, pkt_session, proto);
> - }
> -
> - /* ppp input */
> - mq_delist(&pipexinq, &ml);
> - while ((m = ml_dequeue(&ml)) != NULL) {
> - pkt_session = m->m_pkthdr.ph_cookie;
> - if (pkt_session == NULL) {
> - m_freem(m);
> - continue;
> - }
> - pipex_ppp_input(m, pkt_session, 0);
> - }
> -}
> -
> -Static int
> -pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
> -    struct mbuf_queue *mq)
> -{
> - m0->m_pkthdr.ph_cookie = session;
> - /* XXX need to support other protocols */
> - m0->m_pkthdr.ph_ppp_proto = PPP_IP;
> -
> - if (mq_enqueue(mq, m0) != 0)
> - return (1);
> -
> - schednetisr(NETISR_PIPEX);
> -
> - return (0);
> -}
> -
> -/***********************************************************************
>   * Timer functions
>   ***********************************************************************/
>  Static void
> @@ -852,13 +772,6 @@ pipex_timer(void *ignored_arg)
>   /* FALLTHROUGH */
>  
>   case PIPEX_STATE_CLOSED:
> - /*
> - * mbuf queued in pipexinq or pipexoutq may have a
> - * refererce to this session.
> - */
> - if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
> - continue;
> -
>   pipex_destroy_session(session);
>   break;
>  
> @@ -958,12 +871,29 @@ pipex_ip_output(struct mbuf *m0, struct
>   if (m0 == NULL)
>   goto dropped;
>   }
> - } else
> +
> + pipex_ppp_output(m0, session, PPP_IP);
> + } else {
> + struct pipex_session *session_tmp;
> + struct mbuf *m;
> +
>   m0->m_flags &= ~(M_BCAST|M_MCAST);
>  
> - /* output ip packets to the session tunnel */
> - if (pipex_ppp_enqueue(m0, session, &pipexoutq))
> - goto dropped;
> + LIST_FOREACH(session_tmp, &pipex_session_list, session_list) {
> + if (session_tmp->pipex_iface != session->pipex_iface)
> + continue;
> + if (session_tmp->ip_forward == 0 &&
> +    session_tmp->ip6_forward == 0)
> + continue;
> + m = m_copym(m0, 0, M_COPYALL, M_NOWAIT);
> + if (m == NULL) {
> + session_tmp->stat.oerrors++;
> + continue;
> + }
> + pipex_ppp_output(m, session_tmp, PPP_IP);
> + }
> + m_freem(m0);
> + }
>  
>   return;
>  drop:
> @@ -1241,7 +1171,7 @@ drop:
>  
>  Static struct mbuf *
>  pipex_common_input(struct pipex_session *session, struct mbuf *m0, int hlen,
> -    int plen, int useq)
> +    int plen)
>  {
>   int proto, ppphlen;
>   u_char code;
> @@ -1289,19 +1219,12 @@ pipex_common_input(struct pipex_session
>   m_adj(m0, plen - m0->m_pkthdr.len);
>   }
>  
> - if (!useq) {
> - pipex_ppp_input(m0, session, 0);
> - return (NULL);
> - }
> + pipex_ppp_input(m0, session, 0);
> +
> + return (NULL);
>  
> - /* input ppp packets to kernel session */
> - if (pipex_ppp_enqueue(m0, session, &pipexinq) != 0)
> - goto dropped;
> - else
> - return (NULL);
>  drop:
>   m_freem(m0);
> -dropped:
>   session->stat.ierrors++;
>   return (NULL);
>  
> @@ -1398,7 +1321,7 @@ pipex_pppoe_input(struct mbuf *m0, struc
>      sizeof(struct pipex_pppoe_header), (caddr_t)&pppoe);
>  
>   hlen = sizeof(struct ether_header) + sizeof(struct pipex_pppoe_header);
> - if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length), 0))
> + if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length)))
>      == NULL)
>   return (NULL);
>   m_freem(m0);
> @@ -1694,7 +1617,7 @@ pipex_pptp_input(struct mbuf *m0, struct
>   pipex_pptp_output(NULL, session, 0, 1);
>   }
>  
> - if ((m0 = pipex_common_input(session, m0, hlen, ntohs(gre->len), 1))
> + if ((m0 = pipex_common_input(session, m0, hlen, ntohs(gre->len)))
>      == NULL) {
>   /* ok,  The packet is for PIPEX */
>   if (!rewind)
> @@ -2129,7 +2052,7 @@ pipex_l2tp_input(struct mbuf *m0, int of
>  
>   length -= hlen + offset;
>   hlen += off0 + offset;
> - if ((m0 = pipex_common_input(session, m0, hlen, length, 1)) == NULL) {
> + if ((m0 = pipex_common_input(session, m0, hlen, length)) == NULL) {
>   /* ok,  The packet is for PIPEX */
>   if (!rewind)
>   session->proto.l2tp.nr_gap += nseq;
> @@ -3052,12 +2975,6 @@ pipex_sysctl(int *name, u_int namelen, v
>   return (ENOTDIR);
>   return (sysctl_int(oldp, oldlenp, newp, newlen,
>      &pipex_enable));
> - case PIPEXCTL_INQ:
> -        return (sysctl_mq(name + 1, namelen - 1,
> -    oldp, oldlenp, newp, newlen, &pipexinq));
> - case PIPEXCTL_OUTQ:
> -        return (sysctl_mq(name + 1, namelen - 1,
> -    oldp, oldlenp, newp, newlen, &pipexoutq));
>   default:
>   return (ENOPROTOOPT);
>   }
> Index: sys/net/pipex.h
> ===================================================================
> RCS file: /cvs/src/sys/net/pipex.h,v
> retrieving revision 1.26
> diff -u -p -r1.26 pipex.h
> --- sys/net/pipex.h 29 Jul 2020 12:09:31 -0000 1.26
> +++ sys/net/pipex.h 29 Jul 2020 13:47:44 -0000
> @@ -33,15 +33,11 @@
>   * Names for pipex sysctl objects
>   */
>  #define PIPEXCTL_ENABLE 1
> -#define PIPEXCTL_INQ 2
> -#define PIPEXCTL_OUTQ 3
> -#define PIPEXCTL_MAXID 4
> +#define PIPEXCTL_MAXID 2
>  
>  #define PIPEXCTL_NAMES { \
>          { 0, 0 }, \
>          { "enable", CTLTYPE_INT }, \
> -        { "inq", CTLTYPE_NODE }, \
> -        { "outq", CTLTYPE_NODE }, \
>  }
>  
>  #define PIPEX_PROTO_L2TP 1 /* protocol L2TP */
> Index: sys/net/pipex_local.h
> ===================================================================
> RCS file: /cvs/src/sys/net/pipex_local.h,v
> retrieving revision 1.38
> diff -u -p -r1.38 pipex_local.h
> --- sys/net/pipex_local.h 29 Jul 2020 12:09:31 -0000 1.38
> +++ sys/net/pipex_local.h 29 Jul 2020 13:47:44 -0000
> @@ -403,7 +403,8 @@ void                  pipex_ip_input (st
>  #ifdef INET6
>  void                  pipex_ip6_input (struct mbuf *, struct pipex_session *);
>  #endif
> -struct mbuf           *pipex_common_input(struct pipex_session *, struct mbuf *, int, int, int);
> +struct mbuf           *pipex_common_input(struct pipex_session *,
> +                          struct mbuf *, int, int);
>  
>  #ifdef PIPEX_PPPOE
>  void                  pipex_pppoe_output (struct mbuf *, struct pipex_session *);
>


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:

> Hi,
>
> sys/net/if_ethersubr.c:
> 372 void
> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
> (snip)
> 519 #if NPPPOE > 0 || defined(PIPEX)
> 520         case ETHERTYPE_PPPOEDISC:
> 521         case ETHERTYPE_PPPOE:
> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
> 523                         goto dropanyway;
> 524 #ifdef PIPEX
> 525                 if (pipex_enable) {
> 526                         struct pipex_session *session;
> 527
> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
> 529                                 pipex_pppoe_input(m, session);
> 530                                 return;
> 531                         }
> 532                 }
> 533 #endif
>
> previously a packet which branchces to #529 is enqueued.
>
> If the diff removes the queue, then the pipex input routine is
> executed by the NIC's interrupt handler.
>
> The queues had been made to avoid that kind of situations.

It's not enqueued in the pppoe case. According to the pipex_pppoe_input()
code, we call pipex_common_input() with the `useq' argument set to '0', so
we don't enqueue the mbuf(9) but pass it to pipex_ppp_input(), which will
pass it to ipv{4,6}_input().

---- cut begin ----

pipex_pppoe_input(struct mbuf *m0, struct pipex_session *session)
{
        int hlen;
        struct pipex_pppoe_header pppoe;

        NET_ASSERT_LOCKED();
        /* already checked at pipex_pppoe_lookup_session */
        KASSERT(m0->m_pkthdr.len >= (sizeof(struct ether_header) +
            sizeof(pppoe)));

        m_copydata(m0, sizeof(struct ether_header),
            sizeof(struct pipex_pppoe_header), (caddr_t)&pppoe);

        hlen = sizeof(struct ether_header) + sizeof(struct pipex_pppoe_header);
        if ((m0 = pipex_common_input(session, m0, hlen, ntohs(pppoe.length), 0))
            == NULL)
                return (NULL);
        m_freem(m0);
        session->stat.ierrors++;
        return (NULL);
}

pipex_common_input(struct pipex_session *session, struct mbuf *m0, int hlen,
    int plen, int useq)
{
        /* skip */

        if (!useq) {
                pipex_ppp_input(m0, session, 0);
                return (NULL);
        }

        /* input ppp packets to kernel session */
        if (pipex_ppp_enqueue(m0, session, &pipexinq) != 0)
                goto dropped;
        else
                return (NULL);


---- cut end ----

We enqueue pppac(4)-related mbufs and incoming pppx(4)-related mbufs,
except for pppoe. We don't enqueue outgoing pppx(4) mbufs, and we don't
enqueue incoming pppoe mbufs.

>
> Also I don't see a relation of the use-after-free problem and killing
> queues.  Can't we fix the problem unless we kill the queues?
>

Yes, we can. Reference counters would allow us to keep orphan sessions
referenced from these queues without a use-after-free issue.

I will wait for your comments on the current enqueuing before doing
anything.
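The reference-counting alternative mentioned above can be illustrated with a
minimal userland sketch. All names here (pipex_session_ref/unref, the
sess_refcnt field) are hypothetical, not the actual pipex(4) code: the idea
is that the session list holds one reference and every queued mbuf holds
another, so the session is only freed once the queues have drained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: a refcounted session that survives while queued. */
struct pipex_session {
	int	sess_refcnt;
};

static struct pipex_session *
pipex_session_create(void)
{
	struct pipex_session *s = calloc(1, sizeof(*s));

	if (s != NULL)
		s->sess_refcnt = 1;	/* the session list holds one ref */
	return s;
}

static void
pipex_session_ref(struct pipex_session *s)
{
	s->sess_refcnt++;	/* taken when an mbuf is enqueued */
}

/* Returns 1 when the last reference is dropped and the session is freed. */
static int
pipex_session_unref(struct pipex_session *s)
{
	if (--s->sess_refcnt > 0)
		return (0);
	free(s);
	return (1);
}
```

With this scheme pipex_destroy_session() would only drop the list's
reference; the final free happens when the last queued mbuf is dequeued.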


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
On Thu, 30 Jul 2020 15:34:09 +0300
Vitaliy Makkoveev <[hidden email]> wrote:

> On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:
>> Hi,
>>
>> sys/net/if_ethersubr.c:
>> 372 void
>> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
>> (snip)
>> 519 #if NPPPOE > 0 || defined(PIPEX)
>> 520         case ETHERTYPE_PPPOEDISC:
>> 521         case ETHERTYPE_PPPOE:
>> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
>> 523                         goto dropanyway;
>> 524 #ifdef PIPEX
>> 525                 if (pipex_enable) {
>> 526                         struct pipex_session *session;
>> 527
>> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
>> 529                                 pipex_pppoe_input(m, session);
>> 530                                 return;
>> 531                         }
>> 532                 }
>> 533 #endif
>>
>> previously a packet which branchces to #529 is enqueued.
>>
>> If the diff removes the queue, then the pipex input routine is
>> executed by the NIC's interrupt handler.
>>
>> The queues had been made to avoid that kind of situations.
>
> It's not enqueued in pppoe case. According pipex_pppoe_input() code we
> call pipex_common_input() with `useq' argument set to '0', so we don't
> enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to
> ipv{4,6}_input().

You are right.  Sorry, I forgot about this, even though I did it myself.

>> Also I don't see a relation of the use-after-free problem and killing
>> queues.  Can't we fix the problem unless we kill the queues?
>
> Yes we can. Reference counters allow us to keep orphan sessions in these
> queues without use after free issue.
>
> I will wait your commentaries current enqueuing before to do something.

I have another concern.

As you might know, when L2TP/IPsec is used heavily, the crypto thread
uses 100% of one CPU core.  In that case, that thread runs a path like
the one below:

  crypto thread -> udp_userreq -> pipex_l2tp_input

Some clients are using MPPE (RC4 encryption) on CCP.  It's not so
light.

How do we offload this across CPUs?  I am thinking that "pipex" could have
a dedicated thread.  Do we have another scenario?

--yasuoka


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote:

> On Thu, 30 Jul 2020 15:34:09 +0300
> Vitaliy Makkoveev <[hidden email]> wrote:
> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:
> >> Hi,
> >>
> >> sys/net/if_ethersubr.c:
> >> 372 void
> >> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
> >> (snip)
> >> 519 #if NPPPOE > 0 || defined(PIPEX)
> >> 520         case ETHERTYPE_PPPOEDISC:
> >> 521         case ETHERTYPE_PPPOE:
> >> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
> >> 523                         goto dropanyway;
> >> 524 #ifdef PIPEX
> >> 525                 if (pipex_enable) {
> >> 526                         struct pipex_session *session;
> >> 527
> >> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
> >> 529                                 pipex_pppoe_input(m, session);
> >> 530                                 return;
> >> 531                         }
> >> 532                 }
> >> 533 #endif
> >>
> >> previously a packet which branchces to #529 is enqueued.
> >>
> >> If the diff removes the queue, then the pipex input routine is
> >> executed by the NIC's interrupt handler.
> >>
> >> The queues had been made to avoid that kind of situations.
> >
> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we
> > call pipex_common_input() with `useq' argument set to '0', so we don't
> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to
> > ipv{4,6}_input().
>
> You are right.  Sorry, I forgot about this which I did that by myself.
>

I'm interested in the reason why you did that.

> >> Also I don't see a relation of the use-after-free problem and killing
> >> queues.  Can't we fix the problem unless we kill the queues?
> >
> > Yes we can. Reference counters allow us to keep orphan sessions in these
> > queues without use after free issue.
> >
> > I will wait your commentaries current enqueuing before to do something.
>
> I have another concern.
>
> You might know, when L2TP/IPsec is used heavily, the crypto thread
> uses 100% of 1 CPU core.  In that case, that thread becomes like
> below:
>
>   crypto thread -> udp_userreq -> pipex_l2tp_input
>
> some clients are using MPPE(RC4 encryption) on CCP.  It's not so
> light.
>
> How do we offload this for CPUs?  I am thinking that "pipex" can have
> a dedicated thread.  Do we have another scenario?
>

I suppose you mean udp_input(). What do you call the "crypto thread"? I did
a little backtrace but I didn't find this thread.

ether_resolve
  if_input_local
    ipv4_input
      ip_input_if
        ip_ours
          ip_deliver
            udp_input (through pr_input)
              pipex_l2tp_input

ipi{,6}_mloopback
  if_input_local
    ipv4_input
      ...
            udp_input (through pr_input)
              pipex_l2tp_input

loinput
  if_input_local
    ipv4_input
      ...
            udp_input (through pr_input)
              pipex_l2tp_input

Also various pseudo drivers call ipv{4,6}_input() and the underlying
udp_input() too.

Except for nfs, we call udp_usrreq() through the socket layer only. Do you
mean userland by "crypto thread"?

But udp_input(), udp_usrreq() and pipexintr() are serialized by
NET_LOCK(). We should move pipex(4) under its own rwlock to allow their
simultaneous execution. Also, should we move outgoing pppx(4) mbufs to a
queue too?


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
On Thu, 30 Jul 2020 22:43:10 +0300
Vitaliy Makkoveev <[hidden email]> wrote:

> On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote:
>> On Thu, 30 Jul 2020 15:34:09 +0300
>> Vitaliy Makkoveev <[hidden email]> wrote:
>> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:
>> >> Hi,
>> >>
>> >> sys/net/if_ethersubr.c:
>> >> 372 void
>> >> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
>> >> (snip)
>> >> 519 #if NPPPOE > 0 || defined(PIPEX)
>> >> 520         case ETHERTYPE_PPPOEDISC:
>> >> 521         case ETHERTYPE_PPPOE:
>> >> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
>> >> 523                         goto dropanyway;
>> >> 524 #ifdef PIPEX
>> >> 525                 if (pipex_enable) {
>> >> 526                         struct pipex_session *session;
>> >> 527
>> >> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
>> >> 529                                 pipex_pppoe_input(m, session);
>> >> 530                                 return;
>> >> 531                         }
>> >> 532                 }
>> >> 533 #endif
>> >>
>> >> previously a packet which branchces to #529 is enqueued.
>> >>
>> >> If the diff removes the queue, then the pipex input routine is
>> >> executed by the NIC's interrupt handler.
>> >>
>> >> The queues had been made to avoid that kind of situations.
>> >
>> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we
>> > call pipex_common_input() with `useq' argument set to '0', so we don't
>> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to
>> > ipv{4,6}_input().
>>
>> You are right.  Sorry, I forgot about this which I did that by myself.
>>
>
> I'm interesting the reason why you did that.
>
>> >> Also I don't see a relation of the use-after-free problem and killing
>> >> queues.  Can't we fix the problem unless we kill the queues?
>> >
>> > Yes we can. Reference counters allow us to keep orphan sessions in these
>> > queues without use after free issue.
>> >
>> > I will wait your commentaries current enqueuing before to do something.
>>
>> I have another concern.
>>
>> You might know, when L2TP/IPsec is used heavily, the crypto thread
>> uses 100% of 1 CPU core.  In that case, that thread becomes like
>> below:
>>
>>   crypto thread -> udp_userreq -> pipex_l2tp_input
>>
>> some clients are using MPPE(RC4 encryption) on CCP.  It's not so
>> light.
>>
>> How do we offload this for CPUs?  I am thinking that "pipex" can have
>> a dedicated thread.  Do we have another scenario?
>>
>
> I suppose you mean udp_input(). What is you call "crypto thread"? I did
> a little backtrace but I didn't find this thread.
>
> ether_resolve
>   if_input_local
>     ipv4_input
>       ip_input_if
>         ip_ours
>           ip_deliver
>             udp_input (through pr_input)
>               pipex_l2tp_input
>
> ipi{,6}_mloopback
>   if_input_local
>     ipv4_input
>       ...
>             udp_input (through pr_input)
>               pipex_l2tp_input
>
> loinput
>   if_input_local
>     ipv4_input
>       ...
>             udp_input (through pr_input)
>               pipex_l2tp_input
>
> Also various pseudo drivers call ipv{4,6}_input() and underlay
> udp_unput() too.
>
> Except nfs, we call udp_usrreq() through socket layer only. Do you mean
> userland as "crypto thread"?

Sorry, udp_usrreq() should have been usr_input(), and by "crypto thread" I
meant the kthread for crypto_taskq_mp_safe, whose name is "crynlk" (see
crypto_init()).

A packet of L2TP/IPsec (encapsulated IP/PPP/L2TP/UDP/ESP/UDP/IP) is
processed like:

   ipv4_input
     ...
       udp_input
         ipsec_common_input
           esp_input
             crypto_dispatch
               => crypto_taskq_mp_safe

   kthread "crynlk"
     crypto_invoke
       ... (*1)
         crypto_done
           esp_input_cb
             ipsec_common_input_cb
               ip_deliver
                 udp_input
                   pipex_l2tp_input
                     pipex_common_input
                       (*2)
                        pipex_ppp_input
                         pipex_mppe_input (*3)
                           pipex_ppp_input
                             pipex_ip_input
                               ipv4_input
                                 ...

At *2 there was a queue.  "crynlk" is a busy thread, since it is doing
decryption at *1.  I think it's better that pipex input be done by a
thread other than crypto, since pipex also does decryption at *3.
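The dedicated-thread idea can be sketched in userland: the busy "crypto"
side only enqueues packets, while a separate worker thread drains the queue
and does the pipex processing. This is an illustrative pthread sketch under
stated assumptions, not the kernel implementation; all names (pipexq,
pipex_worker, pipex_offload_demo) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* A tiny counting "pipexinq" drained by a dedicated worker thread. */
struct pipexq {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		queued;		/* packets waiting in the queue */
	int		processed;	/* stands in for pipex_ppp_input() calls */
	int		done;		/* producer has finished */
};

static void *
pipex_worker(void *arg)
{
	struct pipexq *q = arg;

	pthread_mutex_lock(&q->mtx);
	for (;;) {
		while (q->queued == 0 && !q->done)
			pthread_cond_wait(&q->cv, &q->mtx);
		if (q->queued == 0 && q->done)
			break;		/* drained and producer is gone */
		q->queued--;
		q->processed++;		/* "pipex input" happens here */
	}
	pthread_mutex_unlock(&q->mtx);
	return (NULL);
}

/* Enqueue npkts packets from the "crypto" side; return the number drained. */
static int
pipex_offload_demo(int npkts)
{
	struct pipexq q;
	pthread_t t;
	int i;

	pthread_mutex_init(&q.mtx, NULL);
	pthread_cond_init(&q.cv, NULL);
	q.queued = q.processed = q.done = 0;

	pthread_create(&t, NULL, pipex_worker, &q);
	for (i = 0; i < npkts; i++) {
		pthread_mutex_lock(&q.mtx);
		q.queued++;		/* cheap enqueue; "crynlk" returns fast */
		pthread_cond_signal(&q.cv);
		pthread_mutex_unlock(&q.mtx);
	}
	pthread_mutex_lock(&q.mtx);
	q.done = 1;
	pthread_cond_signal(&q.cv);
	pthread_mutex_unlock(&q.mtx);
	pthread_join(t, NULL);
	return (q.processed);
}
```

The decryption at *3 would then run on the worker's CPU instead of
prolonging the crypto thread's time at *1.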

> But upd_input(), udp_usrreq() and pipexintr() are serialized by
> NET_LOCK(). We should move pipex(4) under it's own rwlock to allow them
> simultaneous execution. Also, should we move outgoing pppx(4) mbufs to
> queue too?

pppx has a dedicated queue for each pppx interface.  I think that is the
reason why it skips pipexoutq.

Also I noticed there is a generic queue for IP.

 pipex_ppp_output
   pipex_l2tp_output
     ipsend()
       => net_tq

For PPPoE,

 pipex_ppp_output
   pipex_pppoe_output
     ipsend()
       => ifp->if_output

Currently it seems there are enough queues on the output side.

--yasuoka


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
In reply to this post by Vitaliy Makkoveev-2
On Thu, 30 Jul 2020 22:43:10 +0300
Vitaliy Makkoveev <[hidden email]> wrote:

> On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote:
>> On Thu, 30 Jul 2020 15:34:09 +0300
>> Vitaliy Makkoveev <[hidden email]> wrote:
>> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:
>> >> If the diff removes the queue, then the pipex input routine is
>> >> executed by the NIC's interrupt handler.
>> >>
>> >> The queues had been made to avoid that kind of situations.
>> >
>> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we
>> > call pipex_common_input() with `useq' argument set to '0', so we don't
>> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to
>> > ipv{4,6}_input().
>>
>> You are right.  Sorry, I forgot about this which I did that by myself.
>
> I'm interesting the reason why you did that.

I remembered: it was the first of the MP steps for pipex.

At that time, I discussed it with mpi, and he suggested the following:

 1. stop enqueueing packets for PPPoE
 2. try not to take the kernel lock before calling gre_input(), then we can
    also stop enqueueing packets for PPTP (GRE)
 3. for L2TP, keep the queue and change the netisr to an unlocked task


Re: pipex(4): kill pipexintr()

Martin Pieuchot
In reply to this post by YASUOKA Masahiko-3
On 30/07/20(Thu) 21:13, YASUOKA Masahiko wrote:

> sys/net/if_ethersubr.c:
> 372 void
> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
> (snip)
> 519 #if NPPPOE > 0 || defined(PIPEX)
> 520         case ETHERTYPE_PPPOEDISC:
> 521         case ETHERTYPE_PPPOE:
> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
> 523                         goto dropanyway;
> 524 #ifdef PIPEX
> 525                 if (pipex_enable) {
> 526                         struct pipex_session *session;
> 527
> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
> 529                                 pipex_pppoe_input(m, session);
> 530                                 return;
> 531                         }
> 532                 }
> 533 #endif
>
> previously a packet which branchces to #529 is enqueued.
>
> If the diff removes the queue, then the pipex input routine is
> executed by the NIC's interrupt handler.
>
> The queues had been made to avoid that kind of situations.

For many releases now there has been a generic queue between every NIC
interrupt handler and the softnet thread.

With this diff the pipex routine is still executed in a task.  Previously
it was `if_input_task_locked', which corresponds to all netisrs; now it is
`ifiq_task', which calls if_input_process().


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
In reply to this post by YASUOKA Masahiko-3
On Fri, Jul 31, 2020 at 09:36:32AM +0900, YASUOKA Masahiko wrote:

> On Thu, 30 Jul 2020 22:43:10 +0300
> Vitaliy Makkoveev <[hidden email]> wrote:
> > On Thu, Jul 30, 2020 at 10:05:13PM +0900, YASUOKA Masahiko wrote:
> >> On Thu, 30 Jul 2020 15:34:09 +0300
> >> Vitaliy Makkoveev <[hidden email]> wrote:
> >> > On Thu, Jul 30, 2020 at 09:13:46PM +0900, YASUOKA Masahiko wrote:
> >> >> Hi,
> >> >>
> >> >> sys/net/if_ethersubr.c:
> >> >> 372 void
> >> >> 373 ether_input(struct ifnet *ifp, struct mbuf *m)
> >> >> (snip)
> >> >> 519 #if NPPPOE > 0 || defined(PIPEX)
> >> >> 520         case ETHERTYPE_PPPOEDISC:
> >> >> 521         case ETHERTYPE_PPPOE:
> >> >> 522                 if (m->m_flags & (M_MCAST | M_BCAST))
> >> >> 523                         goto dropanyway;
> >> >> 524 #ifdef PIPEX
> >> >> 525                 if (pipex_enable) {
> >> >> 526                         struct pipex_session *session;
> >> >> 527
> >> >> 528                         if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
> >> >> 529                                 pipex_pppoe_input(m, session);
> >> >> 530                                 return;
> >> >> 531                         }
> >> >> 532                 }
> >> >> 533 #endif
> >> >>
> >> >> previously a packet which branchces to #529 is enqueued.
> >> >>
> >> >> If the diff removes the queue, then the pipex input routine is
> >> >> executed by the NIC's interrupt handler.
> >> >>
> >> >> The queues had been made to avoid that kind of situations.
> >> >
> >> > It's not enqueued in pppoe case. According pipex_pppoe_input() code we
> >> > call pipex_common_input() with `useq' argument set to '0', so we don't
> >> > enqueue mbuf(9) but pass it to pipex_ppp_input() which will pass it to
> >> > ipv{4,6}_input().
> >>
> >> You are right.  Sorry, I forgot about this which I did that by myself.
> >>
> >
> > I'm interesting the reason why you did that.
> >
> >> >> Also I don't see a relation of the use-after-free problem and killing
> >> >> queues.  Can't we fix the problem unless we kill the queues?
> >> >
> >> > Yes we can. Reference counters allow us to keep orphan sessions in these
> >> > queues without use after free issue.
> >> >
> >> > I will wait your commentaries current enqueuing before to do something.
> >>
> >> I have another concern.
> >>
> >> You might know, when L2TP/IPsec is used heavily, the crypto thread
> >> uses 100% of 1 CPU core.  In that case, that thread becomes like
> >> below:
> >>
> >>   crypto thread -> udp_userreq -> pipex_l2tp_input
> >>
> >> some clients are using MPPE(RC4 encryption) on CCP.  It's not so
> >> light.
> >>
> >> How do we offload this for CPUs?  I am thinking that "pipex" can have
> >> a dedicated thread.  Do we have another scenario?
> >>
> >
> > I suppose you mean udp_input(). What is you call "crypto thread"? I did
> > a little backtrace but I didn't find this thread.
> >
> > ether_resolve
> >   if_input_local
> >     ipv4_input
> >       ip_input_if
> >         ip_ours
> >           ip_deliver
> >             udp_input (through pr_input)
> >               pipex_l2tp_input
> >
> > ipi{,6}_mloopback
> >   if_input_local
> >     ipv4_input
> >       ...
> >             udp_input (through pr_input)
> >               pipex_l2tp_input
> >
> > loinput
> >   if_input_local
> >     ipv4_input
> >       ...
> >             udp_input (through pr_input)
> >               pipex_l2tp_input
> >
> > Also various pseudo drivers call ipv{4,6}_input() and underlay
> > udp_unput() too.
> >
> > Except nfs, we call udp_usrreq() through socket layer only. Do you mean
> > userland as "crypto thread"?
>
> Sorry, udp_usrreq() should be usr_input() and crypto thread meant a
> kthread for crypto_taskq_mp_safe, whose name is "crynlk" (see
> crypto_init()).
>
> A packet of L2TP/IPsec (encapsulated IP/PPP/L2TP/UDP/ESP/UDP/IP) is
> processed like:
>
>    ipv4_input
>      ...
>        udp_input
>          ipsec_common_input
>            esp_input
>              crypto_dispatch
>                => crypto_taskq_mp_safe
>
>    kthread "crynlk"
>      crypto_invoke
>        ... (*1)
>          crypto_done
>            esp_input_cb
>              ipsec_common_input_cb
>                ip_deliver
>                  udp_input
>                    pipex_l2tp_input
>                      pipex_common_input
>                        (*2)
>                        pipex_ppp_input
>                          pipex_mppe_input (*3)
>                            pipex_ppp_input
>                              pipex_ip_input
>                                ipv4_input
>                                  ...
>
> At *2 there was a queue.  "crynlk" is a busy thread, since it is doing
> decryption at *1.  I think it's better pipex input is be done by
> another thread than crypto since it also has decryption at *3.

Thanks for the explanation. esp_input_cb() and the underlying routines are
called under NET_LOCK(), so we have no simultaneous execution in this path
either. With the current locking scheme pipexintr() only introduces a delay
caused by the required context switch, and there is no offload to another
CPU.

To achieve offload we should introduce a new rwlock `pipexlock' and move
pipex(4) under it. I have no objections to that, but since we have not yet
triggered the use-after-free issue caused by unsafe session destruction, I
wish to fix that first. Also, we could use fine-grained locks for pipex(4)
instead of introducing `pipexlock'.

Do you agree with this direction? Martin, do you have objections?


Re: pipex(4): kill pipexintr()

Martin Pieuchot
On 31/07/20(Fri) 12:15, Vitaliy Makkoveev wrote:

> On Fri, Jul 31, 2020 at 09:36:32AM +0900, YASUOKA Masahiko wrote:
> > [...]
>
> Thanks for the explanation. esp_input_cb() and the underlying routines are
> called under NET_LOCK(), so we have no simultaneous execution in this path
> either. With the current locking scheme pipexintr() only introduces a delay
> caused by the required context switch, and there is no offload to another CPU.
>
> To get real offload we should introduce a new rwlock `pipexlock' and move
> pipex(4) under it. I have no objections to that, but since we have not yet
> triggered the use-after-free issue caused by unsafe session destruction, I
> wish to fix it first. Alternatively, we can do fine-grained locking for
> pipex(4) instead of introducing `pipexlock'.
>
> Do you agree with this direction? Martin, do you have objections?

This isn't a realistic direction.  The pipex part isn't the difficult
part of this task.  What needs to be addressed first is being able to run
the IPsec/crypto stack in parallel with the rest of the network stack.


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Fri, Jul 31, 2020 at 08:26:22PM +0200, Martin Pieuchot wrote:

> On 31/07/20(Fri) 12:15, Vitaliy Makkoveev wrote:
> > [...]
>
> This isn't a realistic direction.  The pipex part isn't the difficult
> part of this task.  What needs to be addressed first is being able to run
> the IPsec/crypto stack in parallel with the rest of the network stack.
>

What prevents us from moving pipex(4) under its own lock?


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
In reply to this post by Vitaliy Makkoveev-2
Well, since killing pipexintr() was rejected, I propose implementing
reference counters to protect the pipex(4) session itself. The diff below
does this.

Index: sys/net/if_ethersubr.c
===================================================================
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.266
diff -u -p -r1.266 if_ethersubr.c
--- sys/net/if_ethersubr.c 22 Jul 2020 02:16:01 -0000 1.266
+++ sys/net/if_ethersubr.c 31 Jul 2020 13:56:31 -0000
@@ -527,6 +527,7 @@ ether_input(struct ifnet *ifp, struct mb
 
  if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
  pipex_pppoe_input(m, session);
+ pipex_rele_session(session);
  return;
  }
  }
Index: sys/net/if_gre.c
===================================================================
RCS file: /cvs/src/sys/net/if_gre.c,v
retrieving revision 1.158
diff -u -p -r1.158 if_gre.c
--- sys/net/if_gre.c 10 Jul 2020 13:26:41 -0000 1.158
+++ sys/net/if_gre.c 31 Jul 2020 13:56:31 -0000
@@ -984,9 +984,15 @@ gre_input_1(struct gre_tunnel *key, stru
  struct pipex_session *session;
 
  session = pipex_pptp_lookup_session(m);
- if (session != NULL &&
-    pipex_pptp_input(m, session) == NULL)
- return (NULL);
+ if (session != NULL) {
+ struct mbuf *m0;
+
+ m0 = pipex_pptp_input(m, session);
+ pipex_rele_session(session);
+
+ if (m0 == NULL)
+ return NULL;
+ }
  }
 #endif
  break;
Index: sys/net/pipex.c
===================================================================
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.122
diff -u -p -r1.122 pipex.c
--- sys/net/pipex.c 29 Jul 2020 12:09:31 -0000 1.122
+++ sys/net/pipex.c 31 Jul 2020 13:56:32 -0000
@@ -165,6 +165,7 @@ pipex_iface_init(struct pipex_iface_cont
 
  /* virtual pipex_session entry for multicast */
  session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO);
+ session->refs = 1;
  session->is_multicast = 1;
  session->pipex_iface = pipex_iface;
  session->ifindex = ifindex;
@@ -197,9 +198,9 @@ pipex_iface_stop(struct pipex_iface_cont
 void
 pipex_iface_fini(struct pipex_iface_context *pipex_iface)
 {
- pool_put(&pipex_session_pool, pipex_iface->multicast_session);
  NET_LOCK();
  pipex_iface_stop(pipex_iface);
+ pipex_rele_session(pipex_iface->multicast_session);
  NET_UNLOCK();
 }
 
@@ -329,6 +330,7 @@ pipex_init_session(struct pipex_session
 
  /* prepare a new session */
  session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO);
+ session->refs = 1;
  session->state = PIPEX_STATE_INITIAL;
  session->protocol = req->pr_protocol;
  session->session_id = req->pr_session_id;
@@ -421,9 +423,20 @@ pipex_init_session(struct pipex_session
  return 0;
 }
 
+void pipex_ref_session(struct pipex_session *session)
+{
+ atomic_inc_int_nv(&session->refs);
+ KASSERT(session->refs != 0);
+}
+
 void
 pipex_rele_session(struct pipex_session *session)
 {
+ KASSERT(session->refs > 0);
+
+ if (atomic_dec_int_nv(&session->refs) > 0)
+ return;
+
  if (session->mppe_recv.old_session_keys)
  pool_put(&mppe_key_pool, session->mppe_recv.old_session_keys);
  pool_put(&pipex_session_pool, session);
@@ -439,10 +452,12 @@ pipex_link_session(struct pipex_session
 
  if (!iface->pipexmode)
  return (ENXIO);
- if (pipex_lookup_by_session_id(session->protocol,
+ if (pipex_lookup_by_session_id_nonref(session->protocol,
     session->session_id))
  return (EEXIST);
 
+ pipex_ref_session(session);
+
  session->pipex_iface = iface;
  session->ifindex = iface->ifindex;
 
@@ -490,6 +505,8 @@ pipex_unlink_session(struct pipex_sessio
  /* if final session is destroyed, stop timer */
  if (LIST_EMPTY(&pipex_session_list))
  pipex_timer_stop();
+
+ pipex_rele_session(session);
 }
 
 Static int
@@ -505,8 +522,8 @@ pipex_add_session(struct pipex_session_r
 
  /* commit the session */
  if (!in_nullhost(session->ip_address.sin_addr)) {
- if (pipex_lookup_by_ip_address(session->ip_address.sin_addr)
-    != NULL) {
+ if (pipex_lookup_by_ip_address_nonref(
+    session->ip_address.sin_addr) != NULL) {
  error = EADDRINUSE;
  goto free;
  }
@@ -568,6 +585,7 @@ pipex_close_session(struct pipex_session
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->pcr_protocol,
@@ -575,17 +593,20 @@ pipex_close_session(struct pipex_session
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
-
- /* remove from close_wait list */
- if (session->state == PIPEX_STATE_CLOSE_WAIT)
- LIST_REMOVE(session, state_list);
+ error = EINVAL;
+ else {
+ /* remove from close_wait list */
+ if (session->state == PIPEX_STATE_CLOSE_WAIT)
+ LIST_REMOVE(session, state_list);
+
+ /* get statistics before destroy the session */
+ req->pcr_stat = session->stat;
+ session->state = PIPEX_STATE_CLOSED;
+ }
 
- /* get statistics before destroy the session */
- req->pcr_stat = session->stat;
- session->state = PIPEX_STATE_CLOSED;
+ pipex_rele_session(session);
 
- return (0);
+ return (error);
 }
 
 Static int
@@ -593,6 +614,7 @@ pipex_config_session(struct pipex_sessio
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->pcr_protocol,
@@ -600,10 +622,13 @@ pipex_config_session(struct pipex_sessio
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
- session->ip_forward = req->pcr_ip_forward;
+ error = EINVAL;
+ else
+ session->ip_forward = req->pcr_ip_forward;
 
- return (0);
+ pipex_rele_session(session);
+
+ return (error);
 }
 
 Static int
@@ -611,6 +636,7 @@ pipex_get_stat(struct pipex_session_stat
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->psr_protocol,
@@ -618,10 +644,13 @@ pipex_get_stat(struct pipex_session_stat
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
- req->psr_stat = session->stat;
+ error = EINVAL;
+ else
+ req->psr_stat = session->stat;
 
- return (0);
+ pipex_rele_session(session);
+
+ return (error);
 }
 
 Static int
@@ -670,7 +699,7 @@ pipex_destroy_session(struct pipex_sessi
 }
 
 Static struct pipex_session *
-pipex_lookup_by_ip_address(struct in_addr addr)
+pipex_lookup_by_ip_address_nonref(struct in_addr addr)
 {
  struct pipex_session *session;
  struct sockaddr_in pipex_in4, pipex_in4mask;
@@ -700,8 +729,20 @@ pipex_lookup_by_ip_address(struct in_add
  return (session);
 }
 
+struct pipex_session *
+pipex_lookup_by_ip_address(struct in_addr addr)
+{
+ struct pipex_session *session;
+
+ session = pipex_lookup_by_ip_address_nonref(addr);
+ if (session != NULL)
+ pipex_ref_session(session);
+
+ return (session);
+}
+
 Static struct pipex_session *
-pipex_lookup_by_session_id(int protocol, int session_id)
+pipex_lookup_by_session_id_nonref(int protocol, int session_id)
 {
  struct pipex_hash_head *list;
  struct pipex_session *session;
@@ -724,6 +765,18 @@ pipex_lookup_by_session_id(int protocol,
  return (session);
 }
 
+struct pipex_session *
+pipex_lookup_by_session_id(int protocol, int session_id)
+{
+ struct pipex_session *session;
+
+ session = pipex_lookup_by_session_id_nonref(protocol, session_id);
+ if (session != NULL)
+ pipex_ref_session(session);
+
+ return (session);
+}
+
 /***********************************************************************
  * Queue and Software Interrupt Handler
  ***********************************************************************/
@@ -743,6 +796,7 @@ pipexintr(void)
  m_freem(m);
  continue;
  }
+
  proto = m->m_pkthdr.ph_ppp_proto;
 
  m->m_pkthdr.ph_cookie = NULL;
@@ -752,6 +806,12 @@ pipexintr(void)
  struct pipex_session *session;
  struct mbuf *m0;
 
+ if (pkt_session->state != PIPEX_STATE_OPENED) {
+ m_freem(m);
+ pipex_rele_session(pkt_session);
+ continue;
+ }
+
  LIST_FOREACH(session, &pipex_session_list,
     session_list) {
  if (session->pipex_iface !=
@@ -770,6 +830,8 @@ pipexintr(void)
  m_freem(m);
  } else
  pipex_ppp_output(m, pkt_session, proto);
+
+ pipex_rele_session(pkt_session);
  }
 
  /* ppp input */
@@ -780,7 +842,9 @@ pipexintr(void)
  m_freem(m);
  continue;
  }
+
  pipex_ppp_input(m, pkt_session, 0);
+ pipex_rele_session(pkt_session);
  }
 }
 
@@ -788,6 +852,7 @@ Static int
 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
     struct mbuf_queue *mq)
 {
+ pipex_ref_session(session);
  m0->m_pkthdr.ph_cookie = session;
  /* XXX need to support other protocols */
  m0->m_pkthdr.ph_ppp_proto = PPP_IP;
@@ -852,13 +917,6 @@ pipex_timer(void *ignored_arg)
  /* FALLTHROUGH */
 
  case PIPEX_STATE_CLOSED:
- /*
- * mbuf queued in pipexinq or pipexoutq may have a
- * refererce to this session.
- */
- if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
- continue;
-
  pipex_destroy_session(session);
  break;
 
@@ -888,9 +946,10 @@ pipex_output(struct mbuf *m0, int af, in
  case AF_INET:
  if (m0->m_pkthdr.len >= sizeof(struct ip) + off) {
  m_copydata(m0, off, sizeof(struct ip), (caddr_t)&ip);
- if (IN_MULTICAST(ip.ip_dst.s_addr))
+ if (IN_MULTICAST(ip.ip_dst.s_addr)) {
  session = pipex_iface->multicast_session;
- else
+ pipex_ref_session(session);
+ } else
  session = pipex_lookup_by_ip_address(ip.ip_dst);
  }
  if (session != NULL) {
@@ -908,6 +967,7 @@ pipex_output(struct mbuf *m0, int af, in
  m_adj(m0, off);
 
  pipex_ip_output(m0, session);
+ pipex_rele_session(session);
  return (mret);
  }
  break;
@@ -917,8 +977,11 @@ pipex_output(struct mbuf *m0, int af, in
 
 drop:
  m_freem(m0);
- if (session != NULL)
+ if (session != NULL) {
  session->stat.oerrors++;
+ pipex_rele_session(session);
+ }
+
  return(NULL);
 }
 
@@ -1377,8 +1440,11 @@ pipex_pppoe_lookup_session(struct mbuf *
  PIPEX_DBG((NULL, LOG_DEBUG, "<%s> session not found (id=%d)",
     __func__, pppoe.session_id));
 #endif
- if (session && session->proto.pppoe.over_ifidx != m0->m_pkthdr.ph_ifidx)
+ if (session && session->proto.pppoe.over_ifidx !=
+    m0->m_pkthdr.ph_ifidx) {
+ pipex_rele_session(session);
  session = NULL;
+ }
 
  return (session);
 }
@@ -1817,8 +1883,10 @@ pipex_pptp_userland_lookup_session(struc
  LIST_FOREACH(session, list, peer_addr_chain) {
  if (pipex_sockaddr_compar_addr(&session->peer.sa, sa) != 0)
  continue;
- if (session->peer_session_id == id)
+ if (session->peer_session_id == id) {
+ pipex_ref_session(session);
  break;
+ }
  }
 #ifdef PIPEX_DEBUG
  if (session == NULL) {
@@ -2250,8 +2318,10 @@ pipex_l2tp_userland_lookup_session(struc
  continue;
  if (session->proto.l2tp.peer_tunnel_id != tunnel_id)
  continue;
- if (session->peer_session_id == session_id)
+ if (session->peer_session_id == session_id) {
+ pipex_ref_session(session);
  break;
+ }
  }
 #ifdef PIPEX_DEBUG
  if (session == NULL) {
Index: sys/net/pipex.h
===================================================================
RCS file: /cvs/src/sys/net/pipex.h,v
retrieving revision 1.26
diff -u -p -r1.26 pipex.h
--- sys/net/pipex.h 29 Jul 2020 12:09:31 -0000 1.26
+++ sys/net/pipex.h 31 Jul 2020 13:56:32 -0000
@@ -201,6 +201,7 @@ void                  pipex_init (void);
 void                  pipex_iface_init (struct pipex_iface_context *, u_int);
 void                  pipex_iface_fini (struct pipex_iface_context *);
 
+void                  pipex_rele_session(struct pipex_session *);
 int                   pipex_notify_close_session(struct pipex_session *session);
 int                   pipex_notify_close_session_all(void);
 
Index: sys/net/pipex_local.h
===================================================================
RCS file: /cvs/src/sys/net/pipex_local.h,v
retrieving revision 1.38
diff -u -p -r1.38 pipex_local.h
--- sys/net/pipex_local.h 29 Jul 2020 12:09:31 -0000 1.38
+++ sys/net/pipex_local.h 31 Jul 2020 13:56:32 -0000
@@ -159,6 +159,8 @@ struct pipex_session {
  LIST_ENTRY(pipex_session) id_chain; /* [N] id hash chain */
  LIST_ENTRY(pipex_session) peer_addr_chain;
  /* [N] peer's address hash chain */
+ u_int refs; /* reference counter for this
+ session */
  uint16_t state; /* [N] pipex session state */
 #define PIPEX_STATE_INITIAL 0x0000
 #define PIPEX_STATE_OPENED 0x0001
@@ -379,7 +381,7 @@ void                  pipex_iface_start
 void                  pipex_iface_stop (struct pipex_iface_context *);
 int                   pipex_init_session(struct pipex_session **,
                                              struct pipex_session_req *);
-void                  pipex_rele_session(struct pipex_session *);
+void                  pipex_ref_session(struct pipex_session *);
 int                   pipex_link_session(struct pipex_session *,
                                              struct pipex_iface_context *);
 void                  pipex_unlink_session(struct pipex_session *);
@@ -393,7 +395,9 @@ int                   pipex_get_stat (st
 int                   pipex_get_closed (struct pipex_session_list_req *,
                           struct pipex_iface_context *);
 int                   pipex_destroy_session (struct pipex_session *);
+struct pipex_session  *pipex_lookup_by_ip_address_nonref (struct in_addr);
 struct pipex_session  *pipex_lookup_by_ip_address (struct in_addr);
+struct pipex_session  *pipex_lookup_by_session_id_nonref (int, int);
 struct pipex_session  *pipex_lookup_by_session_id (int, int);
 void                  pipex_ip_output (struct mbuf *, struct pipex_session *);
 void                  pipex_ppp_output (struct mbuf *, struct pipex_session *, int);
Index: sys/netinet/ip_gre.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_gre.c,v
retrieving revision 1.71
diff -u -p -r1.71 ip_gre.c
--- sys/netinet/ip_gre.c 7 Feb 2018 22:30:59 -0000 1.71
+++ sys/netinet/ip_gre.c 31 Jul 2020 13:56:32 -0000
@@ -81,10 +81,15 @@ gre_usrreq(struct socket *so, int req, s
  if (in_nam2sin(nam, &sin4) == 0)
  ina_dst = &sin4->sin_addr;
  }
- if (ina_dst != NULL &&
-    (session = pipex_pptp_userland_lookup_session_ipv4(m,
-    *ina_dst)))
- m = pipex_pptp_userland_output(m, session);
+ if (ina_dst != NULL) {
+ session = pipex_pptp_userland_lookup_session_ipv4(m,
+    *ina_dst);
+
+ if (session != NULL) {
+ m = pipex_pptp_userland_output(m, session);
+ pipex_rele_session(session);
+ }
+ }
 
  if (m == NULL)
  return (ENOMEM);
Index: sys/netinet/udp_usrreq.c
===================================================================
RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.259
diff -u -p -r1.259 udp_usrreq.c
--- sys/netinet/udp_usrreq.c 21 Jun 2020 05:19:27 -0000 1.259
+++ sys/netinet/udp_usrreq.c 31 Jul 2020 13:56:32 -0000
@@ -565,8 +565,11 @@ udp_input(struct mbuf **mp, int *offp, i
  struct pipex_session *session;
  int off = iphlen + sizeof(struct udphdr);
  if ((session = pipex_l2tp_lookup_session(m, off)) != NULL) {
- if ((m = *mp = pipex_l2tp_input(m, off, session,
-    ipsecflowinfo)) == NULL) {
+ m = *mp = pipex_l2tp_input(m, off, session,
+    ipsecflowinfo);
+ pipex_rele_session(session);
+
+ if (m == NULL) {
  /* the packet is handled by PIPEX */
  return IPPROTO_DONE;
  }
@@ -1149,12 +1152,15 @@ udp_usrreq(struct socket *so, int req, s
  session =
     pipex_l2tp_userland_lookup_session_ipv4(
  m, inp->inp_faddr);
- if (session != NULL)
- if ((m = pipex_l2tp_userland_output(
-    m, session)) == NULL) {
+ if (session != NULL) {
+ m = pipex_l2tp_userland_output(m, session);
+ pipex_rele_session(session);
+
+ if(m == NULL) {
  error = ENOMEM;
  goto release;
  }
+ }
  }
 #endif
 


Re: pipex(4): kill pipexintr()

Martin Pieuchot
In reply to this post by Vitaliy Makkoveev-2
On 31/07/20(Fri) 21:58, Vitaliy Makkoveev wrote:
> [...]
> What prevents us from moving pipex(4) under its own lock?

Such a question won't lead us anywhere.  It assumes it makes sense to move
pipex under its own lock.  This assumption has many drawbacks which clearly
haven't been studied, and more importantly it doesn't explain what for.

What is your goal?  What are you trying to achieve?  Improve latency?
Improve performance?  Of which subsystem?  Where is the bottleneck?
What is the architecture of the system?

IMHO the KERNEL_LOCK() should be removed and anything else postponed at
least until one has a clear understanding of the whole subsystem under
the NET_LOCK().


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Fri, Jul 31, 2020 at 10:25:48PM +0200, Martin Pieuchot wrote:

> On 31/07/20(Fri) 21:58, Vitaliy Makkoveev wrote:
> > [...]
> > What prevents us from moving pipex(4) under its own lock?
>
> Such a question won't lead us anywhere.  It assumes it makes sense to move
> pipex under its own lock.  This assumption has many drawbacks which clearly
> haven't been studied, and more importantly it doesn't explain what for.
>
> What is your goal?  What are you trying to achieve?  Improve latency?
> Improve performance?  Of which subsystem?  Where is the bottleneck?
> What is the architecture of the system?
>

If I understood Yasuoka correctly, the kthread "crynlk" is the bottleneck
and he wishes MPPE to be offloaded to another CPU. Since there is no
simultaneous execution of the crypto thread and pipexintr(), which does the
MPPE work, the easiest way to achieve that is to move pipex(4) under
another lock. I don't mean "stop everything and implement this now"; I mean
we can do it in the future.

If there is no bottleneck in the crypto thread, I see no reason not to
remove pipexintr().

If the bottleneck is the crypto thread, I propose implementing refcounters
to fix the issue.

And finish this thread.

> IMHO the KERNEL_LOCK() should be removed and anything else postponed at
> least until one has a clear understanding of the whole subsystem under
> the NET_LOCK().
>

The whole subsystem is under NET_LOCK() now. Once we fix the use-after-free
issue we can start to move pipex(4) out of the KERNEL_LOCK().


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
Hi,

I'm not sure when it got broken; in old versions, it was assumed that the
pipex queues are empty when pipex_iface_stop() is called.  The problem
mvs@ found is that this assumption is no longer true.

pipex has a mechanism that deletes a session once the queues are empty.

    819 Static void
    820 pipex_timer(void *ignored_arg)
    821 {
    (snip)
    854                 case PIPEX_STATE_CLOSED:
    855                         /*
    856                          * mbuf queued in pipexinq or pipexoutq may have a
    857                          * refererce to this session.
    858                          */
    859                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
    860                                 continue;
    861
    862                         pipex_destroy_session(session);
    863                         break;

I think using this is better.

How about this?

diff --git a/sys/net/pipex.c b/sys/net/pipex.c
index 2ad7757fee9..6fe14c400bf 100644
--- a/sys/net/pipex.c
+++ b/sys/net/pipex.c
@@ -190,7 +190,7 @@ pipex_iface_stop(struct pipex_iface_context *pipex_iface)
  LIST_FOREACH_SAFE(session, &pipex_session_list, session_list,
     session_tmp) {
  if (session->pipex_iface == pipex_iface)
- pipex_destroy_session(session);
+ pipex_unlink_session(session);
  }
 }
 
@@ -470,9 +470,16 @@ pipex_link_session(struct pipex_session *session,
 void
 pipex_unlink_session(struct pipex_session *session)
 {
+ struct radix_node *rn;
+
  session->ifindex = 0;
 
  NET_ASSERT_LOCKED();
+ if (!in_nullhost(session->ip_address.sin_addr)) {
+ rn = rn_delete(&session->ip_address, &session->ip_netmask,
+    pipex_rd_head4, (struct radix_node *)session);
+ KASSERT(rn != NULL);
+ }
  LIST_REMOVE(session, id_chain);
 #if defined(PIPEX_PPTP) || defined(PIPEX_L2TP)
  switch (session->protocol) {
@@ -486,10 +493,6 @@ pipex_unlink_session(struct pipex_session *session)
  LIST_REMOVE(session, state_list);
  LIST_REMOVE(session, session_list);
  session->state = PIPEX_STATE_CLOSED;
-
- /* if final session is destroyed, stop timer */
- if (LIST_EMPTY(&pipex_session_list))
- pipex_timer_stop();
 }
 
 Static int
@@ -652,20 +655,16 @@ pipex_get_closed(struct pipex_session_list_req *req,
 Static int
 pipex_destroy_session(struct pipex_session *session)
 {
- struct radix_node *rn;
-
  /* remove from radix tree and hash chain */
  NET_ASSERT_LOCKED();
 
- if (!in_nullhost(session->ip_address.sin_addr)) {
- rn = rn_delete(&session->ip_address, &session->ip_netmask,
-    pipex_rd_head4, (struct radix_node *)session);
- KASSERT(rn != NULL);
- }
-
  pipex_unlink_session(session);
  pipex_rele_session(session);
 
+ /* if final session is destroyed, stop timer */
+ if (LIST_EMPTY(&pipex_session_list))
+ pipex_timer_stop();
+
  return (0);
 }
 
@@ -739,7 +738,8 @@ pipexintr(void)
  mq_delist(&pipexoutq, &ml);
  while ((m = ml_dequeue(&ml)) != NULL) {
  pkt_session = m->m_pkthdr.ph_cookie;
- if (pkt_session == NULL) {
+ if (pkt_session == NULL ||
+    pkt_session->state == PIPEX_STATE_CLOSED) {
  m_freem(m);
  continue;
  }
@@ -776,7 +776,8 @@ pipexintr(void)
  mq_delist(&pipexinq, &ml);
  while ((m = ml_dequeue(&ml)) != NULL) {
  pkt_session = m->m_pkthdr.ph_cookie;
- if (pkt_session == NULL) {
+ if (pkt_session == NULL ||
+    pkt_session->state == PIPEX_STATE_CLOSED) {
  m_freem(m);
  continue;
  }


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Sat, Aug 01, 2020 at 07:44:17PM +0900, YASUOKA Masahiko wrote:

> Hi,
>
> I'm not sure when it is broken, in old versions, it was assumed the
> pipex queues are empty when pipex_iface_stop() is called.  The problem
> mvs@ found is the assumption is not true any more.
>
> pipex has a mechanism that delete a session when the queues are empty.
>
>     819 Static void
>     820 pipex_timer(void *ignored_arg)
>     821 {
>     (snip)
>     854                 case PIPEX_STATE_CLOSED:
>     855                         /*
>     856                          * mbuf queued in pipexinq or pipexoutq may have a
>     857                          * refererce to this session.
>     858                          */
>     859                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
>     860                                 continue;
>     861
>     862                         pipex_destroy_session(session);
>     863                         break;
>
> I think using this is better.
>
> How about this?
>

Unfortunately your diff is incorrect: it introduces memory leaks and
breaks pppx(4). It is also incomplete.

We have multiple ways to kill pipex(4) sessions:

1. pppx(4)

We have `struct pppx_if' which has a pointer to the corresponding session,
and this session is accessed directly within the pppx(4) layer. Since we
can't destroy `ppp_if' in the pipex(4) layer we can't destroy these sessions
from pipex_timer(). The only way to destroy them is pppx_if_destroy(), which:

1. unlinks the session via pipex_unlink_session()
2. detaches the corresponding `ifnet' via if_detach()
3. releases the session via pipex_rele_session()

It's unsafe because the mbuf queues can still hold references to this session.

2. pppac(4)

We have no direct access to the corresponding sessions within the pppac(4)
layer. There are multiple ways to destroy them:

1. pipex_ioctl() with the `PIPEXSMODE' command. The underlying
pipex_iface_stop() walks through `pipex_session_list' and destroys sessions
via pipex_destroy_session(). It's unsafe because we don't check the queues.

2. pipex_ioctl() with `PIPEXDSESSION'. pipex_close_session() changes the
session's state and pipex_timer() kills the session later. This is the
only safe way.

3. pipex_iface_fini(). The same as `PIPEXSMODE': pipex_iface_stop() kills
sessions, which is also unsafe. We also have another use-after-free
issue:

---- cut begin ----

pipex_iface_fini(struct pipex_iface_context *pipex_iface)
{
        pool_put(&pipex_session_pool, pipex_iface->multicast_session);
        NET_LOCK();
        pipex_iface_stop(pipex_iface);
        NET_UNLOCK();
}

---- cut end ----

`multicast_session' should be protected too: it can also be pushed to
`pipexoutq'. For now, pipexintr() and pipex_iface_fini() are both serialized
by KERNEL_LOCK(), so we can't destroy a `multicast_session' that is in use
by pipexintr(). But once we drop KERNEL_LOCK() around pipexintr() we can
hit a use-after-free here. I already made a diff to move this pool_put()
under NET_LOCK(), but it was rejected by mpi@ because:

---- cut begin ----
pipex_iface_fini() should be called on the last reference of the                
descriptor.  So this shouldn't be necessary.  If there's an issue              
with the current order of the operations, we should certainly fix              
it differently.  
---- cut end ----

So I repeat it again: npppd(8) can be killed at any moment by SIGKILL
or SIGSEGV; pppacclose() will then be called, and it will call
pipex_iface_fini(). `multicast_session' can be in use by pipexintr()
at that very moment.

And no locks protect `multicast_session' itself.
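
The deferred-destruction pattern described above, the only safe path so far
(mark the session `PIPEX_STATE_CLOSED' and let pipex_timer() free it once no
queued mbuf can reference it), can be sketched in plain C. This is a
hypothetical userspace toy, not kernel code; the names and the `queued_pkts'
counter stand in for the real `pipexinq'/`pipexoutq' mq_empty() checks:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the pipex_timer() approach: closing only flips state,
 * the free happens later, from a timer, once the queues have drained. */
enum toy_state { TOY_OPENED, TOY_CLOSED };

struct toy_session {
	enum toy_state state;
};

/* stand-in for pipexinq/pipexoutq: number of mbufs still queued */
static int queued_pkts;

/* PIPEXDSESSION path: just mark the session closed */
static void
toy_close_session(struct toy_session *s)
{
	s->state = TOY_CLOSED;
}

/* Timer tick: destroy CLOSED sessions only when the queues are empty,
 * mirroring the mq_empty() checks quoted earlier.  Returns 1 if the
 * session was freed, 0 if destruction was deferred. */
static int
toy_timer_tick(struct toy_session *s)
{
	if (s->state != TOY_CLOSED)
		return 0;
	if (queued_pkts > 0)	/* a queued mbuf may still reference s */
		return 0;
	free(s);
	return 1;
}
```

The point of the sketch is the ordering: no free can happen while the queue
counter is non-zero, which is exactly the invariant pipex_iface_stop()
currently violates.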

The two diffs I proposed in this thread solve the problems caused by
pipexintr().

I'm not sure about the bottleneck in the crypto thread, because we have no
simultaneous execution with pipexintr(). But the second diff, which
introduces reference counters, handles session destruction properly,
including `multicast_session'. I found a small issue with the second diff
which stopped multicast packet processing, so I have included an updated
version below.
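
The refcounting scheme in the diff below can be sketched in a few lines of
userspace C. This is an illustrative toy, not the kernel code: the diff uses
atomic_inc_int_nv()/atomic_dec_int_nv() on `session->refs', while the sketch
uses plain increments since it is single-threaded:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the refcounted session: every holder of a reference
 * (creator, queued mbuf, lookup result) bumps the count, and the
 * memory is released only when the last holder calls rele. */
struct toy_session {
	unsigned int refs;
};

static struct toy_session *
toy_session_create(void)
{
	struct toy_session *s = calloc(1, sizeof(*s));

	s->refs = 1;		/* the creator's reference */
	return s;
}

static void
toy_session_ref(struct toy_session *s)
{
	s->refs++;		/* e.g. taken on enqueue or lookup */
}

/* Returns 1 when the final reference was dropped and s was freed. */
static int
toy_session_rele(struct toy_session *s)
{
	assert(s->refs > 0);
	if (--s->refs > 0)
		return 0;
	free(s);
	return 1;
}
```

With this shape, pipex_ppp_enqueue() takes a reference before storing the
pointer in `ph_cookie', and pipexintr() drops it after processing, so a
session can never be freed out from under a queued mbuf.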

> diff --git a/sys/net/pipex.c b/sys/net/pipex.c
> index 2ad7757fee9..6fe14c400bf 100644
> --- a/sys/net/pipex.c
> +++ b/sys/net/pipex.c
> @@ -190,7 +190,7 @@ pipex_iface_stop(struct pipex_iface_context *pipex_iface)
>   LIST_FOREACH_SAFE(session, &pipex_session_list, session_list,
>      session_tmp) {
>   if (session->pipex_iface == pipex_iface)
> - pipex_destroy_session(session);
> + pipex_unlink_session(session);

Did you assume pipex_timer() is the only place where sessions can be
released? pipex_unlink_session() detaches the session from all lists,
including `pipex_session_list'. This session is leaked.

Also, what should be done with multicast sessions?

>   }
>  }
>  
> @@ -470,9 +470,16 @@ pipex_link_session(struct pipex_session *session,
>  void
>  pipex_unlink_session(struct pipex_session *session)
>  {
> + struct radix_node *rn;
> +
>   session->ifindex = 0;
>  
>   NET_ASSERT_LOCKED();
> + if (!in_nullhost(session->ip_address.sin_addr)) {
> + rn = rn_delete(&session->ip_address, &session->ip_netmask,
> +    pipex_rd_head4, (struct radix_node *)session);
> + KASSERT(rn != NULL);
> + }

pppx(4) has no sessions in the radix tree. We will panic here.

>   LIST_REMOVE(session, id_chain);
>  #if defined(PIPEX_PPTP) || defined(PIPEX_L2TP)
>   switch (session->protocol) {
> @@ -486,10 +493,6 @@ pipex_unlink_session(struct pipex_session *session)
>   LIST_REMOVE(session, state_list);
>   LIST_REMOVE(session, session_list);
>   session->state = PIPEX_STATE_CLOSED;
> -
> - /* if final session is destroyed, stop timer */
> - if (LIST_EMPTY(&pipex_session_list))
> - pipex_timer_stop();
>  }
>  
>  Static int
> @@ -652,20 +655,16 @@ pipex_get_closed(struct pipex_session_list_req *req,
>  Static int
>  pipex_destroy_session(struct pipex_session *session)
>  {
> - struct radix_node *rn;
> -
>   /* remove from radix tree and hash chain */
>   NET_ASSERT_LOCKED();
>  
> - if (!in_nullhost(session->ip_address.sin_addr)) {
> - rn = rn_delete(&session->ip_address, &session->ip_netmask,
> -    pipex_rd_head4, (struct radix_node *)session);
> - KASSERT(rn != NULL);
> - }
> -
>   pipex_unlink_session(session);
>   pipex_rele_session(session);
>  
> + /* if final session is destroyed, stop timer */
> + if (LIST_EMPTY(&pipex_session_list))
> + pipex_timer_stop();
> +
>   return (0);
>  }
>  
> @@ -739,7 +738,8 @@ pipexintr(void)
>   mq_delist(&pipexoutq, &ml);
>   while ((m = ml_dequeue(&ml)) != NULL) {
>   pkt_session = m->m_pkthdr.ph_cookie;
> - if (pkt_session == NULL) {
> + if (pkt_session == NULL ||
> +    pkt_session->state == PIPEX_STATE_CLOSED) {
>   m_freem(m);
>   continue;
>   }

A multicast session's state never changes, so this check is wrong for
them.

> @@ -776,7 +776,8 @@ pipexintr(void)
>   mq_delist(&pipexinq, &ml);
>   while ((m = ml_dequeue(&ml)) != NULL) {
>   pkt_session = m->m_pkthdr.ph_cookie;
> - if (pkt_session == NULL) {
> + if (pkt_session == NULL ||
> +    pkt_session->state == PIPEX_STATE_CLOSED) {
>   m_freem(m);
>   continue;
>   }
>

Index: sys/net/if_ethersubr.c
===================================================================
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.266
diff -u -p -r1.266 if_ethersubr.c
--- sys/net/if_ethersubr.c 22 Jul 2020 02:16:01 -0000 1.266
+++ sys/net/if_ethersubr.c 1 Aug 2020 15:49:36 -0000
@@ -527,6 +527,7 @@ ether_input(struct ifnet *ifp, struct mb
 
  if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
  pipex_pppoe_input(m, session);
+ pipex_rele_session(session);
  return;
  }
  }
Index: sys/net/if_gre.c
===================================================================
RCS file: /cvs/src/sys/net/if_gre.c,v
retrieving revision 1.158
diff -u -p -r1.158 if_gre.c
--- sys/net/if_gre.c 10 Jul 2020 13:26:41 -0000 1.158
+++ sys/net/if_gre.c 1 Aug 2020 15:49:36 -0000
@@ -984,9 +984,15 @@ gre_input_1(struct gre_tunnel *key, stru
  struct pipex_session *session;
 
  session = pipex_pptp_lookup_session(m);
- if (session != NULL &&
-    pipex_pptp_input(m, session) == NULL)
- return (NULL);
+ if (session != NULL) {
+ struct mbuf *m0;
+
+ m0 = pipex_pptp_input(m, session);
+ pipex_rele_session(session);
+
+ if (m0 == NULL)
+ return NULL;
+ }
  }
 #endif
  break;
Index: sys/net/pipex.c
===================================================================
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.122
diff -u -p -r1.122 pipex.c
--- sys/net/pipex.c 29 Jul 2020 12:09:31 -0000 1.122
+++ sys/net/pipex.c 1 Aug 2020 15:49:36 -0000
@@ -165,6 +165,8 @@ pipex_iface_init(struct pipex_iface_cont
 
  /* virtual pipex_session entry for multicast */
  session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO);
+ session->refs = 1;
+ session->state = PIPEX_STATE_INITIAL;
  session->is_multicast = 1;
  session->pipex_iface = pipex_iface;
  session->ifindex = ifindex;
@@ -175,6 +177,7 @@ Static void
 pipex_iface_start(struct pipex_iface_context *pipex_iface)
 {
  pipex_iface->pipexmode = 1;
+ pipex_iface->multicast_session->state = PIPEX_STATE_OPENED;
 }
 
 Static void
@@ -183,6 +186,7 @@ pipex_iface_stop(struct pipex_iface_cont
  struct pipex_session *session, *session_tmp;
 
  pipex_iface->pipexmode = 0;
+ pipex_iface->multicast_session->state = PIPEX_STATE_CLOSED;
  /*
  * traversal all pipex sessions.
  * it will become heavy if the number of pppac devices bocomes large.
@@ -197,9 +201,9 @@ pipex_iface_stop(struct pipex_iface_cont
 void
 pipex_iface_fini(struct pipex_iface_context *pipex_iface)
 {
- pool_put(&pipex_session_pool, pipex_iface->multicast_session);
  NET_LOCK();
  pipex_iface_stop(pipex_iface);
+ pipex_rele_session(pipex_iface->multicast_session);
  NET_UNLOCK();
 }
 
@@ -329,6 +333,7 @@ pipex_init_session(struct pipex_session
 
  /* prepare a new session */
  session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO);
+ session->refs = 1;
  session->state = PIPEX_STATE_INITIAL;
  session->protocol = req->pr_protocol;
  session->session_id = req->pr_session_id;
@@ -421,9 +426,20 @@ pipex_init_session(struct pipex_session
  return 0;
 }
 
+void pipex_ref_session(struct pipex_session *session)
+{
+ atomic_inc_int_nv(&session->refs);
+ KASSERT(session->refs != 0);
+}
+
 void
 pipex_rele_session(struct pipex_session *session)
 {
+ KASSERT(session->refs > 0);
+
+ if (atomic_dec_int_nv(&session->refs) > 0)
+ return;
+
  if (session->mppe_recv.old_session_keys)
  pool_put(&mppe_key_pool, session->mppe_recv.old_session_keys);
  pool_put(&pipex_session_pool, session);
@@ -439,10 +455,12 @@ pipex_link_session(struct pipex_session
 
  if (!iface->pipexmode)
  return (ENXIO);
- if (pipex_lookup_by_session_id(session->protocol,
+ if (pipex_lookup_by_session_id_nonref(session->protocol,
     session->session_id))
  return (EEXIST);
 
+ pipex_ref_session(session);
+
  session->pipex_iface = iface;
  session->ifindex = iface->ifindex;
 
@@ -490,6 +508,8 @@ pipex_unlink_session(struct pipex_sessio
  /* if final session is destroyed, stop timer */
  if (LIST_EMPTY(&pipex_session_list))
  pipex_timer_stop();
+
+ pipex_rele_session(session);
 }
 
 Static int
@@ -505,8 +525,8 @@ pipex_add_session(struct pipex_session_r
 
  /* commit the session */
  if (!in_nullhost(session->ip_address.sin_addr)) {
- if (pipex_lookup_by_ip_address(session->ip_address.sin_addr)
-    != NULL) {
+ if (pipex_lookup_by_ip_address_nonref(
+    session->ip_address.sin_addr) != NULL) {
  error = EADDRINUSE;
  goto free;
  }
@@ -568,6 +588,7 @@ pipex_close_session(struct pipex_session
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->pcr_protocol,
@@ -575,17 +596,20 @@ pipex_close_session(struct pipex_session
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
-
- /* remove from close_wait list */
- if (session->state == PIPEX_STATE_CLOSE_WAIT)
- LIST_REMOVE(session, state_list);
+ error = EINVAL;
+ else {
+ /* remove from close_wait list */
+ if (session->state == PIPEX_STATE_CLOSE_WAIT)
+ LIST_REMOVE(session, state_list);
+
+ /* get statistics before destroy the session */
+ req->pcr_stat = session->stat;
+ session->state = PIPEX_STATE_CLOSED;
+ }
 
- /* get statistics before destroy the session */
- req->pcr_stat = session->stat;
- session->state = PIPEX_STATE_CLOSED;
+ pipex_rele_session(session);
 
- return (0);
+ return (error);
 }
 
 Static int
@@ -593,6 +617,7 @@ pipex_config_session(struct pipex_sessio
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->pcr_protocol,
@@ -600,10 +625,13 @@ pipex_config_session(struct pipex_sessio
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
- session->ip_forward = req->pcr_ip_forward;
+ error = EINVAL;
+ else
+ session->ip_forward = req->pcr_ip_forward;
 
- return (0);
+ pipex_rele_session(session);
+
+ return (error);
 }
 
 Static int
@@ -611,6 +639,7 @@ pipex_get_stat(struct pipex_session_stat
     struct pipex_iface_context *iface)
 {
  struct pipex_session *session;
+ int error = 0;
 
  NET_ASSERT_LOCKED();
  session = pipex_lookup_by_session_id(req->psr_protocol,
@@ -618,10 +647,13 @@ pipex_get_stat(struct pipex_session_stat
  if (session == NULL)
  return (EINVAL);
  if (session->pipex_iface != iface)
- return (EINVAL);
- req->psr_stat = session->stat;
+ error = EINVAL;
+ else
+ req->psr_stat = session->stat;
 
- return (0);
+ pipex_rele_session(session);
+
+ return (error);
 }
 
 Static int
@@ -670,7 +702,7 @@ pipex_destroy_session(struct pipex_sessi
 }
 
 Static struct pipex_session *
-pipex_lookup_by_ip_address(struct in_addr addr)
+pipex_lookup_by_ip_address_nonref(struct in_addr addr)
 {
  struct pipex_session *session;
  struct sockaddr_in pipex_in4, pipex_in4mask;
@@ -700,8 +732,20 @@ pipex_lookup_by_ip_address(struct in_add
  return (session);
 }
 
+struct pipex_session *
+pipex_lookup_by_ip_address(struct in_addr addr)
+{
+ struct pipex_session *session;
+
+ session = pipex_lookup_by_ip_address_nonref(addr);
+ if (session != NULL)
+ pipex_ref_session(session);
+
+ return (session);
+}
+
 Static struct pipex_session *
-pipex_lookup_by_session_id(int protocol, int session_id)
+pipex_lookup_by_session_id_nonref(int protocol, int session_id)
 {
  struct pipex_hash_head *list;
  struct pipex_session *session;
@@ -724,6 +768,18 @@ pipex_lookup_by_session_id(int protocol,
  return (session);
 }
 
+struct pipex_session *
+pipex_lookup_by_session_id(int protocol, int session_id)
+{
+ struct pipex_session *session;
+
+ session = pipex_lookup_by_session_id_nonref(protocol, session_id);
+ if (session != NULL)
+ pipex_ref_session(session);
+
+ return (session);
+}
+
 /***********************************************************************
  * Queue and Software Interrupt Handler
  ***********************************************************************/
@@ -743,6 +799,7 @@ pipexintr(void)
  m_freem(m);
  continue;
  }
+
  proto = m->m_pkthdr.ph_ppp_proto;
 
  m->m_pkthdr.ph_cookie = NULL;
@@ -752,6 +809,12 @@ pipexintr(void)
  struct pipex_session *session;
  struct mbuf *m0;
 
+ if (pkt_session->state != PIPEX_STATE_OPENED) {
+ m_freem(m);
+ pipex_rele_session(pkt_session);
+ continue;
+ }
+
  LIST_FOREACH(session, &pipex_session_list,
     session_list) {
  if (session->pipex_iface !=
@@ -770,6 +833,8 @@ pipexintr(void)
  m_freem(m);
  } else
  pipex_ppp_output(m, pkt_session, proto);
+
+ pipex_rele_session(pkt_session);
  }
 
  /* ppp input */
@@ -780,7 +845,9 @@ pipexintr(void)
  m_freem(m);
  continue;
  }
+
  pipex_ppp_input(m, pkt_session, 0);
+ pipex_rele_session(pkt_session);
  }
 }
 
@@ -788,6 +855,7 @@ Static int
 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
     struct mbuf_queue *mq)
 {
+ pipex_ref_session(session);
  m0->m_pkthdr.ph_cookie = session;
  /* XXX need to support other protocols */
  m0->m_pkthdr.ph_ppp_proto = PPP_IP;
@@ -852,13 +920,6 @@ pipex_timer(void *ignored_arg)
  /* FALLTHROUGH */
 
  case PIPEX_STATE_CLOSED:
- /*
- * mbuf queued in pipexinq or pipexoutq may have a
- * refererce to this session.
- */
- if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
- continue;
-
  pipex_destroy_session(session);
  break;
 
@@ -888,9 +949,10 @@ pipex_output(struct mbuf *m0, int af, in
  case AF_INET:
  if (m0->m_pkthdr.len >= sizeof(struct ip) + off) {
  m_copydata(m0, off, sizeof(struct ip), (caddr_t)&ip);
- if (IN_MULTICAST(ip.ip_dst.s_addr))
+ if (IN_MULTICAST(ip.ip_dst.s_addr)) {
  session = pipex_iface->multicast_session;
- else
+ pipex_ref_session(session);
+ } else
  session = pipex_lookup_by_ip_address(ip.ip_dst);
  }
  if (session != NULL) {
@@ -908,6 +970,7 @@ pipex_output(struct mbuf *m0, int af, in
  m_adj(m0, off);
 
  pipex_ip_output(m0, session);
+ pipex_rele_session(session);
  return (mret);
  }
  break;
@@ -917,8 +980,11 @@ pipex_output(struct mbuf *m0, int af, in
 
 drop:
  m_freem(m0);
- if (session != NULL)
+ if (session != NULL) {
  session->stat.oerrors++;
+ pipex_rele_session(session);
+ }
+
  return(NULL);
 }
 
@@ -1377,8 +1443,11 @@ pipex_pppoe_lookup_session(struct mbuf *
  PIPEX_DBG((NULL, LOG_DEBUG, "<%s> session not found (id=%d)",
     __func__, pppoe.session_id));
 #endif
- if (session && session->proto.pppoe.over_ifidx != m0->m_pkthdr.ph_ifidx)
+ if (session && session->proto.pppoe.over_ifidx !=
+    m0->m_pkthdr.ph_ifidx) {
+ pipex_rele_session(session);
  session = NULL;
+ }
 
  return (session);
 }
@@ -1817,8 +1886,10 @@ pipex_pptp_userland_lookup_session(struc
  LIST_FOREACH(session, list, peer_addr_chain) {
  if (pipex_sockaddr_compar_addr(&session->peer.sa, sa) != 0)
  continue;
- if (session->peer_session_id == id)
+ if (session->peer_session_id == id) {
+ pipex_ref_session(session);
  break;
+ }
  }
 #ifdef PIPEX_DEBUG
  if (session == NULL) {
@@ -2250,8 +2321,10 @@ pipex_l2tp_userland_lookup_session(struc
  continue;
  if (session->proto.l2tp.peer_tunnel_id != tunnel_id)
  continue;
- if (session->peer_session_id == session_id)
+ if (session->peer_session_id == session_id) {
+ pipex_ref_session(session);
  break;
+ }
  }
 #ifdef PIPEX_DEBUG
  if (session == NULL) {
Index: sys/net/pipex.h
===================================================================
RCS file: /cvs/src/sys/net/pipex.h,v
retrieving revision 1.26
diff -u -p -r1.26 pipex.h
--- sys/net/pipex.h 29 Jul 2020 12:09:31 -0000 1.26
+++ sys/net/pipex.h 1 Aug 2020 15:49:36 -0000
@@ -201,6 +201,7 @@ void                  pipex_init (void);
 void                  pipex_iface_init (struct pipex_iface_context *, u_int);
 void                  pipex_iface_fini (struct pipex_iface_context *);
 
+void                  pipex_rele_session(struct pipex_session *);
 int                   pipex_notify_close_session(struct pipex_session *session);
 int                   pipex_notify_close_session_all(void);
 
Index: sys/net/pipex_local.h
===================================================================
RCS file: /cvs/src/sys/net/pipex_local.h,v
retrieving revision 1.38
diff -u -p -r1.38 pipex_local.h
--- sys/net/pipex_local.h 29 Jul 2020 12:09:31 -0000 1.38
+++ sys/net/pipex_local.h 1 Aug 2020 15:49:36 -0000
@@ -159,6 +159,8 @@ struct pipex_session {
  LIST_ENTRY(pipex_session) id_chain; /* [N] id hash chain */
  LIST_ENTRY(pipex_session) peer_addr_chain;
  /* [N] peer's address hash chain */
+ u_int refs; /* reference counter for this
+ session */
  uint16_t state; /* [N] pipex session state */
 #define PIPEX_STATE_INITIAL 0x0000
 #define PIPEX_STATE_OPENED 0x0001
@@ -379,7 +381,7 @@ void                  pipex_iface_start
 void                  pipex_iface_stop (struct pipex_iface_context *);
 int                   pipex_init_session(struct pipex_session **,
                                              struct pipex_session_req *);
-void                  pipex_rele_session(struct pipex_session *);
+void                  pipex_ref_session(struct pipex_session *);
 int                   pipex_link_session(struct pipex_session *,
                                              struct pipex_iface_context *);
 void                  pipex_unlink_session(struct pipex_session *);
@@ -393,7 +395,9 @@ int                   pipex_get_stat (st
 int                   pipex_get_closed (struct pipex_session_list_req *,
                           struct pipex_iface_context *);
 int                   pipex_destroy_session (struct pipex_session *);
+struct pipex_session  *pipex_lookup_by_ip_address_nonref (struct in_addr);
 struct pipex_session  *pipex_lookup_by_ip_address (struct in_addr);
+struct pipex_session  *pipex_lookup_by_session_id_nonref (int, int);
 struct pipex_session  *pipex_lookup_by_session_id (int, int);
 void                  pipex_ip_output (struct mbuf *, struct pipex_session *);
 void                  pipex_ppp_output (struct mbuf *, struct pipex_session *, int);
Index: sys/netinet/ip_gre.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_gre.c,v
retrieving revision 1.71
diff -u -p -r1.71 ip_gre.c
--- sys/netinet/ip_gre.c 7 Feb 2018 22:30:59 -0000 1.71
+++ sys/netinet/ip_gre.c 1 Aug 2020 15:49:36 -0000
@@ -81,10 +81,15 @@ gre_usrreq(struct socket *so, int req, s
  if (in_nam2sin(nam, &sin4) == 0)
  ina_dst = &sin4->sin_addr;
  }
- if (ina_dst != NULL &&
-    (session = pipex_pptp_userland_lookup_session_ipv4(m,
-    *ina_dst)))
- m = pipex_pptp_userland_output(m, session);
+ if (ina_dst != NULL) {
+ session = pipex_pptp_userland_lookup_session_ipv4(m,
+    *ina_dst);
+
+ if (session != NULL) {
+ m = pipex_pptp_userland_output(m, session);
+ pipex_rele_session(session);
+ }
+ }
 
  if (m == NULL)
  return (ENOMEM);
Index: sys/netinet/udp_usrreq.c
===================================================================
RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.259
diff -u -p -r1.259 udp_usrreq.c
--- sys/netinet/udp_usrreq.c 21 Jun 2020 05:19:27 -0000 1.259
+++ sys/netinet/udp_usrreq.c 1 Aug 2020 15:49:36 -0000
@@ -565,8 +565,11 @@ udp_input(struct mbuf **mp, int *offp, i
  struct pipex_session *session;
  int off = iphlen + sizeof(struct udphdr);
  if ((session = pipex_l2tp_lookup_session(m, off)) != NULL) {
- if ((m = *mp = pipex_l2tp_input(m, off, session,
-    ipsecflowinfo)) == NULL) {
+ m = *mp = pipex_l2tp_input(m, off, session,
+    ipsecflowinfo);
+ pipex_rele_session(session);
+
+ if (m == NULL) {
  /* the packet is handled by PIPEX */
  return IPPROTO_DONE;
  }
@@ -1149,12 +1152,15 @@ udp_usrreq(struct socket *so, int req, s
  session =
     pipex_l2tp_userland_lookup_session_ipv4(
  m, inp->inp_faddr);
- if (session != NULL)
- if ((m = pipex_l2tp_userland_output(
-    m, session)) == NULL) {
+ if (session != NULL) {
+ m = pipex_l2tp_userland_output(m, session);
+ pipex_rele_session(session);
+
+ if(m == NULL) {
  error = ENOMEM;
  goto release;
  }
+ }
  }
 #endif
 


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
On Sat, 1 Aug 2020 18:52:27 +0300
Vitaliy Makkoveev <[hidden email]> wrote:

> On Sat, Aug 01, 2020 at 07:44:17PM +0900, YASUOKA Masahiko wrote:
>> I'm not sure when it is broken, in old versions, it was assumed the
>> pipex queues are empty when pipex_iface_stop() is called.  The problem
>> mvs@ found is the assumption is not true any more.
>>
>> pipex has a mechanism that delete a session when the queues are empty.
>>
>>     819 Static void
>>     820 pipex_timer(void *ignored_arg)
>>     821 {
>>     (snip)
>>     854                 case PIPEX_STATE_CLOSED:
>>     855                         /*
>>     856                          * mbuf queued in pipexinq or pipexoutq may have a
>>     857                          * refererce to this session.
>>     858                          */
>>     859                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
>>     860                                 continue;
>>     861
>>     862                         pipex_destroy_session(session);
>>     863                         break;
>>
>> I think using this is better.
>>
>> How about this?
>
> Unfortunately your diff is incorrect. It introduces memory leaks and
> breaks pppx(4). Also it is incomplete.

Thank you for your feedbacks.

> We have multiple ways to kill pipex(sessions):
>
> 1. pppx(4)
>
> We have `struct pppx_if' which has pointer to corresponding session and
> this session is accessed directly within pppx(4) layer. Since we can't
> destroy `ppp_if' in pipex(4) layer we can't destroy these sessions by
> pipex_timer(). The only way to destroy them is pppx_if_destroy() which:
>
> 1. unlink session by pipex_unlink_session()
> 2. detach corresponding `ifnet' by if_detach()
> 3. release session by pipex_rele_session()
>
> It's unsafe because mbuf queues can have references to this session.

Yes.

> 2. pppac(4)
>
> We have no direct access to corresponding sessions within pppac(4)
> layer. Also there are multiple ways to do this:
>
> 1. pipex_ioctl() with `PIPEXSMODE' command. Underlay pipex_iface_stop()
> walks through `pipex_session_list' and destroy sessions by
> pipex_destroy_session() call. It's unsafe because we don't check queues.
>
> 2. pipex_ioctl() with `PIPEXDSESSION'. pipex_close_session() will change
> session's  state and pipex_timer() will kill this sessions later. This
> is the only safe way.
>
> 3. pipex_iface_fini(). The same as `PIPEXSMODE', pipex_iface_stop()
> kills sessions, Which is also unsafe. Also we have another use after
> free issue:
>
> ---- cut begin ----
>
> pipex_iface_fini(struct pipex_iface_context *pipex_iface)
> {
>         pool_put(&pipex_session_pool, pipex_iface->multicast_session);
>         NET_LOCK();
>         pipex_iface_stop(pipex_iface);
>         NET_UNLOCK();
> }
>
> ---- cut end ----
>
> `multicast_session' should be protected too. It also can be pushed to
> `pipexoutq'.

Yes, I missed this point.

> Also since this time pipexintr() and pipex_iface_fini() are
> both serialized by KERNEL_LOCK() too we can't destroy `multicast_session'
> which is in use by pipexintr(). But when we will drop KERNEL_LOCK()
> around pipexintr() we can catch use after free issue here. I already did
> diff for move this pool_put() under NET_LOCK(), but it was rejectedi by
> mpi@ because:
>
> ---- cut begin ----
> pipex_iface_fini() should be called on the last reference of the                
> descriptor.  So this shouldn't be necessary.  If there's an issue              
> with the current order of the operations, we should certainly fix              
> it differently.  
> ---- cut end ----

Yes, I understand what mpi@ is saying.  But this is a separate story.

> So I repeat it again: npppd(8) can be killed in every moment by SIGKILL
> or by SIGSEGV and pppacclose() will be called and it will call
> pipex_iface_fini(). `multicast_session' can be used in this moment by
> pipexintr().
>
> And no locks protect `multicast_session' itself.
>
> The two diffs I proposed in this thread solve problems caused by
> pipexintr().

There are a lot of ways to solve these problems.

The diff I sent a few days ago destroys the pipex sessions in the
pipex timer.  As you pointed out, it has some problems.  Those problems
can be fixed, but I'd suggest another way, attached at the end.

The problem exposed is a use-after-free.  Since I think this is not a
parallel-processing problem, having a reference counter seems too
much for me.


The diff makes the queues refer to the session not by a pointer but by
its id.  The idea comes from IPsec's tdb.

Comments?


diff --git a/sys/net/pipex.c b/sys/net/pipex.c
index 2ad7757fee9..5f24381878c 100644
--- a/sys/net/pipex.c
+++ b/sys/net/pipex.c
@@ -148,6 +148,7 @@ void
 pipex_iface_init(struct pipex_iface_context *pipex_iface, u_int ifindex)
 {
  struct pipex_session *session;
+ struct pipex_hash_head *chain;
 
  pipex_iface->pipexmode = 0;
  pipex_iface->ifindex = ifindex;
@@ -168,7 +169,12 @@ pipex_iface_init(struct pipex_iface_context *pipex_iface, u_int ifindex)
  session->is_multicast = 1;
  session->pipex_iface = pipex_iface;
  session->ifindex = ifindex;
+ /* register it to the pipex id hash */
+ session->protocol = PIPEX_PROTO_NONE;
+ session->session_id = ifindex;
  pipex_iface->multicast_session = session;
+ chain = PIPEX_ID_HASHTABLE(pipex_iface->multicast_session->session_id);
+ LIST_INSERT_HEAD(chain, pipex_iface->multicast_session, id_chain);
 }
 
 Static void
@@ -197,10 +203,11 @@ pipex_iface_stop(struct pipex_iface_context *pipex_iface)
 void
 pipex_iface_fini(struct pipex_iface_context *pipex_iface)
 {
- pool_put(&pipex_session_pool, pipex_iface->multicast_session);
  NET_LOCK();
+ LIST_REMOVE(pipex_iface->multicast_session, id_chain);
  pipex_iface_stop(pipex_iface);
  NET_UNLOCK();
+ pool_put(&pipex_session_pool, pipex_iface->multicast_session);
 }
 
 int
@@ -734,11 +741,14 @@ pipexintr(void)
  u_int16_t proto;
  struct mbuf *m;
  struct mbuf_list ml;
+ uintptr_t cookie;
 
  /* ppp output */
  mq_delist(&pipexoutq, &ml);
  while ((m = ml_dequeue(&ml)) != NULL) {
- pkt_session = m->m_pkthdr.ph_cookie;
+ cookie = (uintptr_t)m->m_pkthdr.ph_cookie;
+ pkt_session = pipex_lookup_by_session_id(
+    cookie >> 16, cookie & 0xffff);
  if (pkt_session == NULL) {
  m_freem(m);
  continue;
@@ -775,7 +785,9 @@ pipexintr(void)
  /* ppp input */
  mq_delist(&pipexinq, &ml);
  while ((m = ml_dequeue(&ml)) != NULL) {
- pkt_session = m->m_pkthdr.ph_cookie;
+ cookie = (uintptr_t)m->m_pkthdr.ph_cookie;
+ pkt_session = pipex_lookup_by_session_id(
+    cookie >> 16, cookie & 0xffff);
  if (pkt_session == NULL) {
  m_freem(m);
  continue;
@@ -788,7 +800,10 @@ Static int
 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
     struct mbuf_queue *mq)
 {
- m0->m_pkthdr.ph_cookie = session;
+ uintptr_t cookie;
+
+ cookie = session->protocol << 16 | session->session_id;
+ m0->m_pkthdr.ph_cookie = (caddr_t)cookie;
  /* XXX need to support other protocols */
  m0->m_pkthdr.ph_ppp_proto = PPP_IP;
 
diff --git a/sys/net/pipex.h b/sys/net/pipex.h
index bba010b46aa..3f493f33c9a 100644
--- a/sys/net/pipex.h
+++ b/sys/net/pipex.h
@@ -44,6 +44,7 @@
         { "outq", CTLTYPE_NODE }, \
 }
 
+#define PIPEX_PROTO_NONE 0
 #define PIPEX_PROTO_L2TP 1 /* protocol L2TP */
 #define PIPEX_PROTO_PPTP 2 /* protocol PPTP */
 #define PIPEX_PROTO_PPPOE 3 /* protocol PPPoE */
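
The cookie encoding in the diff above can be sketched on its own. Instead of
storing a session pointer in `m_pkthdr.ph_cookie', it stores
`protocol << 16 | session_id' and looks the session up again at dequeue
time, so a destroyed session simply fails the lookup. The helper names here
are illustrative, not the diff's; the caveat is that the session id must fit
in 16 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Pack protocol and session id into one pointer-sized cookie,
 * as the diff does for m_pkthdr.ph_cookie. */
static uintptr_t
cookie_pack(int protocol, int session_id)
{
	return ((uintptr_t)protocol << 16) | (session_id & 0xffff);
}

/* Unpack at dequeue time, before pipex_lookup_by_session_id(). */
static int
cookie_protocol(uintptr_t cookie)
{
	return cookie >> 16;
}

static int
cookie_session_id(uintptr_t cookie)
{
	return cookie & 0xffff;
}
```

This trades the refcount machinery for an extra hash lookup per dequeued
packet, the same trade-off IPsec makes when it refers to a tdb by SPI
rather than by pointer.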


Re: pipex(4): kill pipexintr()

Vitaliy Makkoveev-2
On Tue, Aug 04, 2020 at 01:26:14AM +0900, YASUOKA Masahiko wrote:

> On Sat, 1 Aug 2020 18:52:27 +0300
> Vitaliy Makkoveev <[hidden email]> wrote:
> > On Sat, Aug 01, 2020 at 07:44:17PM +0900, YASUOKA Masahiko wrote:
> >> I'm not sure when it got broken; in old versions, it was assumed that
> >> the pipex queues are empty when pipex_iface_stop() is called.  The
> >> problem mvs@ found is that the assumption is no longer true.
> >>
> >> pipex has a mechanism that deletes a session when the queues are empty.
> >>
> >>     819 Static void
> >>     820 pipex_timer(void *ignored_arg)
> >>     821 {
> >>     (snip)
> >>     854                 case PIPEX_STATE_CLOSED:
> >>     855                         /*
> >>     856                          * mbuf queued in pipexinq or pipexoutq may have a
> >>     857                          * refererce to this session.
> >>     858                          */
> >>     859                         if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq))
> >>     860                                 continue;
> >>     861
> >>     862                         pipex_destroy_session(session);
> >>     863                         break;
> >>
> >> I think using this is better.
> >>
> >> How about this?
> >
> > Unfortunately your diff is incorrect. It introduces memory leaks and
> > breaks pppx(4). Also it is incomplete.
>
> Thank you for your feedbacks.
>
> > We have multiple ways to kill pipex(4) sessions:
> >
> > 1. pppx(4)
> >
> > We have `struct pppx_if' which has a pointer to the corresponding
> > session, and this session is accessed directly within the pppx(4) layer.
> > Since we can't destroy the `pppx_if' in the pipex(4) layer, we can't
> > destroy these sessions in pipex_timer(). The only way to destroy them is
> > pppx_if_destroy(), which:
> >
> > 1. unlinks the session via pipex_unlink_session()
> > 2. detaches the corresponding `ifnet' via if_detach()
> > 3. releases the session via pipex_rele_session()
> >
> > It's unsafe because mbuf queues can have references to this session.
>
> Yes.
>
> > 2. pppac(4)
> >
> > We have no direct access to the corresponding sessions within the
> > pppac(4) layer. Also, there are multiple ways to destroy them:
> >
> > 1. pipex_ioctl() with the `PIPEXSMODE' command. The underlying
> > pipex_iface_stop() walks through `pipex_session_list' and destroys
> > sessions via pipex_destroy_session(). It's unsafe because we don't check
> > the queues.
> >
> > 2. pipex_ioctl() with `PIPEXDSESSION'. pipex_close_session() changes the
> > session's state and pipex_timer() kills the session later. This is the
> > only safe way.
> >
> > 3. pipex_iface_fini(). The same as with `PIPEXSMODE': pipex_iface_stop()
> > kills the sessions, which is also unsafe. Also, we have another use
> > after free issue:
> >
> > ---- cut begin ----
> >
> > pipex_iface_fini(struct pipex_iface_context *pipex_iface)
> > {
> >         pool_put(&pipex_session_pool, pipex_iface->multicast_session);
> >         NET_LOCK();
> >         pipex_iface_stop(pipex_iface);
> >         NET_UNLOCK();
> > }
> >
> > ---- cut end ----
> >
> > `multicast_session' should be protected too. It can also be pushed to
> > `pipexoutq'.
>
> Yes, I missed this point.
>
> > Also, since pipexintr() and pipex_iface_fini() are currently both
> > serialized by the KERNEL_LOCK(), we can't destroy a `multicast_session'
> > which is in use by pipexintr(). But once we drop the KERNEL_LOCK()
> > around pipexintr() we can hit a use after free here. I already made a
> > diff to move this pool_put() under the NET_LOCK(), but it was rejected
> > by mpi@ because:
> >
> > ---- cut begin ----
> > pipex_iface_fini() should be called on the last reference of the                
> > descriptor.  So this shouldn't be necessary.  If there's an issue              
> > with the current order of the operations, we should certainly fix              
> > it differently.  
> > ---- cut end ----
>
> Yes, I understand what mpi@ is saying.  But this is a separate story.
>
> > So I repeat it again: npppd(8) can be killed at any moment by SIGKILL
> > or by SIGSEGV, in which case pppacclose() will be called and it will
> > call pipex_iface_fini(). `multicast_session' can be in use at that
> > moment by pipexintr().
> >
> > And no locks protect `multicast_session' itself.
> >
> > The two diffs I proposed in this thread solve problems caused by
> > pipexintr().
>
> There are a lot of ways to solve the problems.
>
> The diff I sent a few days ago destructs the pipex sessions in the
> pipex timer.  As you pointed out, it has some problems.  Those problems
> can be fixed, but I'd suggest another way, attached at the end.
>
> The problem exposed is a "use-after-free".  Since I think this is not a
> problem of parallel processing, having a reference counter seems too
> much to me.

This is an object lifetime problem, and it isn't MP specific. Since we do
all operations under the lock, we can avoid atomic operations for the
counter modifications. Also, we will introduce these counters just before
introducing fine grained locks for pipex(4) :) They help in places like
the one below, where there is no reason to hold the lock which protects
the global pipex lists while the packet is processed:

---- cut begin ----
if ((session = pipex_pppoe_lookup_session(m)) != NULL)
        pipex_pppoe_input(m, session);

---- cut end ----

Also, I like my first solution, which removes pipexintr(), because it
removes a context switch from packet processing and makes the code
simpler.

I understood the problem with the crypto thread you pointed out to me.
But we can't offload MPPE to another cpu now, and I can't predict when it
will be possible.

We have `nettqmp' designed to run multiple threads in parallel, but
according to the comment above net_tq(), ipsec refuses to do this :) I
guess crypto_dispatch() should go the same way and allow multiple
`crypto_taskq_mpsafe' threads to run in parallel. That would at least
reduce lock waits in the packet processing path.

>
>
> The diff makes us refer to the session not by a pointer but by its id.
> The idea comes from the IPsec tdb.
>
> Comments?
>

You introduce `cookie' as

        cookie = session->protocol << 16 | session->session_id;

and multicast sessions are initialized as

        session->protocol = PIPEX_PROTO_NONE;
        session->session_id = ifindex;

`protocol' and `session_id' come from userland, so I'd like to have checks
like the ones below. They keep the `cookie' from being corrupted when
`pr_session_id' exceeds a 16 bit integer. Also, userland should not pass
PIPEX_PROTO_NONE as `pr_protocol', because we shouldn't have a multicast
and a non-multicast session with the same `cookie'.

---- cut begin ----

pipex_init_session(struct pipex_session **rsession,
    struct pipex_session_req *req)
{
        if (req->pr_protocol == PIPEX_PROTO_NONE)
                return (EINVAL);

        if (req->pr_session_id > 0xffff)
                return (EINVAL);

---- cut end ----


Also, cookies introduce an invalidation problem. Yes, it has low
probability, but we can have an operation order like the one below:

1. enqueue a packet for the session with `protocol' = 0xaa and
        `session_id' = 0xbb, so `cookie' = 0xaa00bb
2. kill this session
3. create a new session with `protocol' = 0xaa and `session_id' = 0xbb
4. this newly created session will be wrongly used by pipexintr() for the
        stale packet

As I have seen while playing with refcounters, a session can be enqueued
more than 10 times...

Also, it's not obvious that the interface index will never exceed a 16 bit
counter. It's an unsigned int, and the underlying counter's resolution may
be expanded in the future. So I'd like to have at least a corresponding
assertion in pipex_iface_init().

So, maybe my first solution is the best here. And, as mpi@ pointed out,
ipsec(4) should be reworked to allow parallelism.

>
> diff --git a/sys/net/pipex.c b/sys/net/pipex.c
> index 2ad7757fee9..5f24381878c 100644
> --- a/sys/net/pipex.c
> +++ b/sys/net/pipex.c
> @@ -148,6 +148,7 @@ void
>  pipex_iface_init(struct pipex_iface_context *pipex_iface, u_int ifindex)
>  {
>   struct pipex_session *session;
> + struct pipex_hash_head *chain;
>
>   pipex_iface->pipexmode = 0;
>   pipex_iface->ifindex = ifindex;
> @@ -168,7 +169,12 @@ pipex_iface_init(struct pipex_iface_context *pipex_iface, u_int ifindex)
>   session->is_multicast = 1;
>   session->pipex_iface = pipex_iface;
>   session->ifindex = ifindex;
> + /* register it to the pipex id hash */
> + session->protocol = PIPEX_PROTO_NONE;
> + session->session_id = ifindex;
>   pipex_iface->multicast_session = session;
> + chain = PIPEX_ID_HASHTABLE(pipex_iface->multicast_session->session_id);
> + LIST_INSERT_HEAD(chain, pipex_iface->multicast_session, id_chain);
>  }
>  
>  Static void
> @@ -197,10 +203,11 @@ pipex_iface_stop(struct pipex_iface_context *pipex_iface)
>  void
>  pipex_iface_fini(struct pipex_iface_context *pipex_iface)
>  {
> - pool_put(&pipex_session_pool, pipex_iface->multicast_session);
>   NET_LOCK();
> + LIST_REMOVE(pipex_iface->multicast_session, id_chain);
>   pipex_iface_stop(pipex_iface);
>   NET_UNLOCK();
> + pool_put(&pipex_session_pool, pipex_iface->multicast_session);
>  }
>  
>  int
> @@ -734,11 +741,14 @@ pipexintr(void)
>   u_int16_t proto;
>   struct mbuf *m;
>   struct mbuf_list ml;
> + uintptr_t cookie;
>  
>   /* ppp output */
>   mq_delist(&pipexoutq, &ml);
>   while ((m = ml_dequeue(&ml)) != NULL) {
> - pkt_session = m->m_pkthdr.ph_cookie;
> + cookie = (uintptr_t)m->m_pkthdr.ph_cookie;
> + pkt_session = pipex_lookup_by_session_id(
> +    cookie >> 16, cookie & 0xffff);
>   if (pkt_session == NULL) {
>   m_freem(m);
>   continue;
> @@ -775,7 +785,9 @@ pipexintr(void)
>   /* ppp input */
>   mq_delist(&pipexinq, &ml);
>   while ((m = ml_dequeue(&ml)) != NULL) {
> - pkt_session = m->m_pkthdr.ph_cookie;
> + cookie = (uintptr_t)m->m_pkthdr.ph_cookie;
> + pkt_session = pipex_lookup_by_session_id(
> +    cookie >> 16, cookie & 0xffff);
>   if (pkt_session == NULL) {
>   m_freem(m);
>   continue;
> @@ -788,7 +800,10 @@ Static int
>  pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
>      struct mbuf_queue *mq)
>  {
> - m0->m_pkthdr.ph_cookie = session;
> + uintptr_t cookie;
> +
> + cookie = session->protocol << 16 | session->session_id;
> + m0->m_pkthdr.ph_cookie = (caddr_t)cookie;
>   /* XXX need to support other protocols */
>   m0->m_pkthdr.ph_ppp_proto = PPP_IP;
>  
> diff --git a/sys/net/pipex.h b/sys/net/pipex.h
> index bba010b46aa..3f493f33c9a 100644
> --- a/sys/net/pipex.h
> +++ b/sys/net/pipex.h
> @@ -44,6 +44,7 @@
>          { "outq", CTLTYPE_NODE }, \
>  }
>  
> +#define PIPEX_PROTO_NONE 0
>  #define PIPEX_PROTO_L2TP 1 /* protocol L2TP */
>  #define PIPEX_PROTO_PPTP 2 /* protocol PPTP */
>  #define PIPEX_PROTO_PPPOE 3 /* protocol PPPoE */


Re: pipex(4): kill pipexintr()

YASUOKA Masahiko-3
On Mon, 3 Aug 2020 23:36:09 +0300
Vitaliy Makkoveev <[hidden email]> wrote:

> On Tue, Aug 04, 2020 at 01:26:14AM +0900, YASUOKA Masahiko wrote:
>> Comments?
>
> You introduce `cookie' as
>
> cookie = session->protocol << 16 | session->session_id;
>
> and multicast sessions are initialized as
>
> session->protocol = PIPEX_PROTO_NONE;
> session->session_id = ifindex;
>
> `protocol' and `session_id' come from userland, so I'd like to have checks
> like the ones below. They keep the `cookie' from being corrupted when
> `pr_session_id' exceeds a 16 bit integer. Also, userland should not pass
> PIPEX_PROTO_NONE as `pr_protocol', because we shouldn't have a multicast
> and a non-multicast session with the same `cookie'.
>
> ---- cut begin ----
>
> pipex_init_session(struct pipex_session **rsession,
>     struct pipex_session_req *req)
> {
> if (req->pr_protocol == PIPEX_PROTO_NONE)
> return (EINVAL);

pipex_init_session() has the same check already.

 287 int
 288 pipex_init_session(struct pipex_session **rsession,
 289     struct pipex_session_req *req)
 290 {
 (snip)
 297         switch (req->pr_protocol) {
 298 #ifdef PIPEX_PPPOE
 299         case PIPEX_PROTO_PPPOE:
 (snip)
 333         default:
 334                 return (EPROTONOSUPPORT);
 335         }

>
> if (req->pr_session_id > 0xffff)
> return (EINVAL);
>
> ---- cut end ----

req->pr_session_id can't be > 0xffff since it's uint16_t.

> Also, cookies introduce an invalidation problem. Yes, it has low
> probability, but we can have an operation order like the one below:
>
> 1. enqueue a packet for the session with `protocol' = 0xaa and
> `session_id' = 0xbb, so `cookie' = 0xaa00bb
> 2. kill this session
> 3. create a new session with `protocol' = 0xaa and `session_id' = 0xbb
> 4. this newly created session will be wrongly used by pipexintr()
>
> As I have seen while playing with refcounters, a session can be enqueued
> more than 10 times...

The diff makes the problem worse, but it can already happen if a
session-id is reused.

> Also, it's not obvious that the interface index will never exceed a 16
> bit counter. It's an unsigned int, and the underlying counter's
> resolution may be expanded in the future. So I'd like to have at least a
> corresponding assertion in pipex_iface_init().

Right.  This is fixable with another unique number.

> So, maybe my first solution is the best here. And, as mpi@ pointed out,
> ipsec(4) should be reworked to allow parallelism.

Does "first" mean killing pipexintr()?

What I explained was wrong.  I'm sorry about this.

On Fri, 31 Jul 2020 09:36:32 +0900 (JST)
YASUOKA Masahiko <[hidden email]> wrote:

> A packet of L2TP/IPsec (encapsulated IP/PPP/L2TP/UDP/ESP/UDP/IP) is
> processed like:
>
>    ipv4_input
>      ...
>        udp_input
>          ipsec_common_input
>          esp_input
>            crypto_dispatch
>              => crypto_taskq_mp_safe
>
>    kthread "crynlk"
>      crypto_invoke
>        ... (*1)
>          crypto_done
>      esp_input_cb
>        ipsec_common_input_cb
>          ip_deliver
>            udp_input
>              pipex_l2tp_input
>                pipex_common_input
>                  (*2)
>                  pipex_ppp_input
>                    pipex_mppe_input (*3)
>                      pipex_ppp_input
>                        pipex_ip_input
>                          ipv4_input
>                            ...

This should be

   kthread "crynlk"
     crypto_invoke
       ... (*1)
         crypto_done
   kthread "crypto"             <---- another thread
     ipsec_input_cb             <---- this is missed
       esp_input_cb
         ipsec_common_input_cb
           ip_deliver
             udp_input
               pipex_l2tp_input
                 pipex_common_input
                   (*2)
                   pipex_ppp_input
                     pipex_mppe_input (*3)
                       pipex_ppp_input
                         pipex_ip_input
                           ipv4_input
                             ...

> At *2 there was a queue.  "crynlk" is a busy thread, since it is doing
> decryption at *1.  I think it's better that pipex input be done by a
> thread other than crypto, since it also does decryption at *3.

This is false.  *3 is done by another thread.
It is the same if the crypto driver is not CRYPTOCAP_F_MPSAFE.
(crypto_invoke() is done by the caller's thread and the callback
 (ipsec_input_cb) is called by the "crypto" thread.)

So I have no actual reason to keep the queues.

ok yasuoka for the diff which kills pipexintr.
