mbuf statistics, tracking of drops

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

mbuf statistics, tracking of drops

Gregor Best-2
Hi people,

while reading around in /sys/kern/uipc_mbuf.c to try to track down a
problem with my iwm(4) that seems to correlate with mbuf allocation
failures, I noticed that the MBSTAT_DROPS counter and its friends
MBSTAT_{WAIT,DRAIN} don't seem to get increased anywhere in /sys.

Does the patch below the signature make sense for counting MBSTAT_DROPS?

I've got a similar patch for MBSTAT_WAIT, but it's pretty ugly because
as far as I can see, there's no real way to notice when pool_get sleeps
except for "Try pool_get with M_NOWAIT first and if that returns NULL,
try again with M_WAITOK".

--
        Gregor

Index: /sys/kern/uipc_mbuf.c
===================================================================
RCS file: /home/cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.250
diff -u -p -r1.250 uipc_mbuf.c
--- /sys/kern/uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
+++ /sys/kern/uipc_mbuf.c 11 Nov 2017 12:03:39 -0000
@@ -233,14 +233,14 @@ m_get(int nowait, int type)
  KDASSERT(type < MT_NTYPES);
 
  m = pool_get(&mbpool, nowait == M_WAIT ? PR_WAITOK : PR_NOWAIT);
- if (m == NULL)
- return (NULL);
 
  s = splnet();
  counters = counters_enter(&cr, mbstat);
+ if (m == NULL) {
+ counters[MBSTAT_DROPS]++;
+ goto out;
+ }
  counters[type]++;
- counters_leave(&cr, mbstat);
- splx(s);
 
  m->m_type = type;
  m->m_next = NULL;
@@ -248,6 +248,10 @@ m_get(int nowait, int type)
  m->m_data = m->m_dat;
  m->m_flags = 0;
 
+out:
+ counters_leave(&cr, mbstat);
+ splx(s);
+
  return (m);
 }
 
@@ -266,16 +270,23 @@ m_gethdr(int nowait, int type)
  KDASSERT(type < MT_NTYPES);
 
  m = pool_get(&mbpool, nowait == M_WAIT ? PR_WAITOK : PR_NOWAIT);
- if (m == NULL)
- return (NULL);
 
  s = splnet();
  counters = counters_enter(&cr, mbstat);
+ if (m == NULL) {
+ counters[MBSTAT_DROPS]++;
+ goto out;
+ }
  counters[type]++;
+
+ m->m_type = type;
+
+out:
  counters_leave(&cr, mbstat);
  splx(s);
 
- m->m_type = type;
+ if (m == NULL)
+ return NULL;
 
  return (m_inithdr(m));
 }
@@ -349,7 +360,10 @@ m_clget(struct mbuf *m, int how, u_int p
 {
  struct mbuf *m0 = NULL;
  struct pool *pp;
+ struct counters_ref cr;
+ uint64_t *counters;
  caddr_t buf;
+ int s;
 
  pp = m_clpool(pktlen);
 #ifdef DIAGNOSTIC
@@ -364,9 +378,16 @@ m_clget(struct mbuf *m, int how, u_int p
 
  m = m0;
  }
+
  buf = pool_get(pp, how == M_WAIT ? PR_WAITOK : PR_NOWAIT);
+
  if (buf == NULL) {
  m_freem(m0);
+ s = splnet();
+ counters = counters_enter(&cr, mbstat);
+ counters[MBSTAT_DROPS]++;
+ counters_leave(&cr, mbstat);
+ splx(s);
  return (NULL);
  }
 

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

Martin Pieuchot
On 11/11/17(Sat) 13:05, Gregor Best wrote:
> Hi people,
>
> while reading around in /sys/kern/uipc_mbuf.c to try to track down a
> problem with my iwm(4) that seems to correlate with mbuf allocation
> failures, I noticed that the MBSTAT_DROPS counter and its friends
> MBSTAT_{WAIT,DRAIN} don't seem to get increased anywhere in /sys.
>
> Does the patch below the signature make sense for counting MBSTAT_DROPS?

It does, some comments below.

> I've got a similar patch for MBSTAT_WAIT, but it's pretty ugly because
> as far as I can see, there's no real way to notice when pool_get sleeps
> except for "Try pool_get with M_NOWAIT first and if that returns NULL,
> try again with M_WAITOK".

This would be an approximation because it might happen that after
returning NULL the second pool_get(9) won't sleep.  However I think
it's the way to go because m_get(9) calls that can wait are generally
not performance critical.

> Index: /sys/kern/uipc_mbuf.c
> ===================================================================
> RCS file: /home/cvs/src/sys/kern/uipc_mbuf.c,v
> retrieving revision 1.250
> diff -u -p -r1.250 uipc_mbuf.c
> --- /sys/kern/uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
> +++ /sys/kern/uipc_mbuf.c 11 Nov 2017 12:03:39 -0000
> @@ -233,14 +233,14 @@ m_get(int nowait, int type)
>   KDASSERT(type < MT_NTYPES);
>  
>   m = pool_get(&mbpool, nowait == M_WAIT ? PR_WAITOK : PR_NOWAIT);
> - if (m == NULL)
> - return (NULL);
>  
>   s = splnet();
>   counters = counters_enter(&cr, mbstat);
> + if (m == NULL) {
> + counters[MBSTAT_DROPS]++;
> + goto out;
> + }
>   counters[type]++;
> - counters_leave(&cr, mbstat);
> - splx(s);

Please do not expand the splx().  Only the counter need it.  Simply put
the NULL check after the splx().

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

Gregor Best-2
In reply to this post by Gregor Best-2
Hi Martin,

On Sun, Nov 12, 2017 at 03:40:59PM +0100, Martin Pieuchot wrote:
> [...]
> It does, some comments below.
> [...]

Wonderful.

> [...]
> This would be an approximation because it might happen that after
> returning NULL the second pool_get(9) won't sleep.  However I think
> it's the way to go because m_get(9) calls that can wait are generally
> not performance critical.
> [...]

I had not considered that the second pool_get might not block at all. On
the other hand `netstat -m` says "requests for memory delayed" for that
counter, so returning immediately on the second try not counting as a
delay is OK, I think.

> [...]
> Please do not expand the splx().  Only the counter need it.  Simply put
> the NULL check after the splx().
> [...]

I'm not sure I understand. If I move the NULL check after the splx(),
counters_leave() will already have been called so increasing the counter
value has no effect anymore. The only additional things running under
splnet will be a few assignments, so I think moving the splx() a bit
further down does not hurt too much.

Alternatively, I've attached a slightly different suggestion which
doesn't expand the scope of the splx() but duplicates a bit of code.
Does that make more sense?

--
        Gregor

Index: uipc_mbuf.c
===================================================================
RCS file: /home/cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.250
diff -u -p -r1.250 uipc_mbuf.c
--- uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
+++ uipc_mbuf.c 12 Nov 2017 22:09:00 -0000
@@ -233,11 +233,15 @@ m_get(int nowait, int type)
  KDASSERT(type < MT_NTYPES);
 
  m = pool_get(&mbpool, nowait == M_WAIT ? PR_WAITOK : PR_NOWAIT);
- if (m == NULL)
- return (NULL);
 
  s = splnet();
  counters = counters_enter(&cr, mbstat);
+ if (m == NULL) {
+ counters[MBSTAT_DROPS]++;
+ counters_leave(&cr, mbstat);
+ splx(s);
+ return NULL;
+ }
  counters[type]++;
  counters_leave(&cr, mbstat);
  splx(s);
@@ -266,11 +270,15 @@ m_gethdr(int nowait, int type)
  KDASSERT(type < MT_NTYPES);
 
  m = pool_get(&mbpool, nowait == M_WAIT ? PR_WAITOK : PR_NOWAIT);
- if (m == NULL)
- return (NULL);
 
  s = splnet();
  counters = counters_enter(&cr, mbstat);
+ if (m == NULL) {
+ counters[MBSTAT_DROPS]++;
+ counters_leave(&cr, mbstat);
+ splx(s);
+ return NULL;
+ }
  counters[type]++;
  counters_leave(&cr, mbstat);
  splx(s);
@@ -349,7 +357,10 @@ m_clget(struct mbuf *m, int how, u_int p
 {
  struct mbuf *m0 = NULL;
  struct pool *pp;
+ struct counters_ref cr;
+ uint64_t *counters;
  caddr_t buf;
+ int s;
 
  pp = m_clpool(pktlen);
 #ifdef DIAGNOSTIC
@@ -364,9 +375,16 @@ m_clget(struct mbuf *m, int how, u_int p
 
  m = m0;
  }
+
  buf = pool_get(pp, how == M_WAIT ? PR_WAITOK : PR_NOWAIT);
+
  if (buf == NULL) {
  m_freem(m0);
+ s = splnet();
+ counters = counters_enter(&cr, mbstat);
+ counters[MBSTAT_DROPS]++;
+ counters_leave(&cr, mbstat);
+ splx(s);
  return (NULL);
  }

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

David Gwynne-5
On Sun, Nov 12, 2017 at 11:23:31PM +0100, Gregor Best wrote:

> Hi Martin,
>
> On Sun, Nov 12, 2017 at 03:40:59PM +0100, Martin Pieuchot wrote:
> > [...]
> > It does, some comments below.
> > [...]
>
> Wonderful.
>
> > [...]
> > This would be an approximation because it might happen that after
> > returning NULL the second pool_get(9) won't sleep.  However I think
> > it's the way to go because m_get(9) calls that can wait are generally
> > not performance critical.
> > [...]
>
> I had not considered that the second pool_get might not block at all. On
> the other hand `netstat -m` says "requests for memory delayed" for that
> counter, so returning immediately on the second try not counting as a
> delay is OK, I think.
>
> > [...]
> > Please do not expand the splx().  Only the counter need it.  Simply put
> > the NULL check after the splx().
> > [...]
>
> I'm not sure I understand. If I move the NULL check after the splx(),
> counters_leave() will already have been called so increasing the counter
> value has no effect anymore. The only additional things running under
> splnet will be a few assignments, so I think moving the splx() a bit
> further down does not hurt too much.
>
> Alternatively, I've attached a slightly different suggestion which
> doesn't expand the scope of the splx() but duplicates a bit of code.
> Does that make more sense?

pools maintain count of how many times they failed to provide an
allocation. you can watch this with vmstat -m or systat pool.
however, we could use that to populate mb_drops too.

the diff below moves populating the mbstat from kern_sysctl to
uipc_mbuf, and adds copying of pr_nfails in.

if we want to count the number of "delays", i could easily add that
to pools too and copy it out in sysctl_mbstat.

Index: sys/sysctl.h
===================================================================
RCS file: /cvs/src/sys/sys/sysctl.h,v
retrieving revision 1.175
diff -u -p -r1.175 sysctl.h
--- sys/sysctl.h 12 Oct 2017 09:14:16 -0000 1.175
+++ sys/sysctl.h 13 Nov 2017 03:42:32 -0000
@@ -936,6 +936,7 @@ int sysctl_rdstruct(void *, size_t *, vo
 int sysctl_struct(void *, size_t *, void *, size_t, void *, size_t);
 int sysctl_file(int *, u_int, char *, size_t *, struct proc *);
 int sysctl_doproc(int *, u_int, char *, size_t *);
+int sysctl_mbstat(int *, u_int, void *, size_t *, void *, size_t);
 struct mbuf_queue;
 int sysctl_mq(int *, u_int, void *, size_t *, void *, size_t,
     struct mbuf_queue *);
Index: kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.330
diff -u -p -r1.330 kern_sysctl.c
--- kern/kern_sysctl.c 11 Aug 2017 21:24:19 -0000 1.330
+++ kern/kern_sysctl.c 13 Nov 2017 03:42:32 -0000
@@ -392,24 +392,9 @@ kern_sysctl(int *name, u_int namelen, vo
  case KERN_FILE:
  return (sysctl_file(name + 1, namelen - 1, oldp, oldlenp, p));
 #endif
- case KERN_MBSTAT: {
- extern struct cpumem *mbstat;
- uint64_t counters[MBSTAT_COUNT];
- struct mbstat mbs;
- unsigned int i;
-
- memset(&mbs, 0, sizeof(mbs));
- counters_read(mbstat, counters, MBSTAT_COUNT);
- for (i = 0; i < MBSTAT_TYPES; i++)
- mbs.m_mtypes[i] = counters[i];
-
- mbs.m_drops = counters[MBSTAT_DROPS];
- mbs.m_wait = counters[MBSTAT_WAIT];
- mbs.m_drain = counters[MBSTAT_DRAIN];
-
- return (sysctl_rdstruct(oldp, oldlenp, newp,
-    &mbs, sizeof(mbs)));
- }
+ case KERN_MBSTAT:
+ return (sysctl_mbstat(name + 1, namelen -1, oldp, oldlenp,
+    newp, newlen));
 #if defined(GPROF) || defined(DDBPROF)
  case KERN_PROF:
  return (sysctl_doprof(name + 1, namelen - 1, oldp, oldlenp,
Index: kern/uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.250
diff -u -p -r1.250 uipc_mbuf.c
--- kern/uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
+++ kern/uipc_mbuf.c 13 Nov 2017 03:42:32 -0000
@@ -1421,6 +1421,30 @@ m_pool_init(struct pool *pp, u_int size,
  pool_set_constraints(pp, &kp_dma_contig);
 }
 
+int
+sysctl_mbstat(int *name, u_int namelen, void *oldp, size_t *oldlenp,
+    void *newp, size_t newlen)
+{
+ uint64_t counters[MBSTAT_COUNT];
+ struct mbstat mbs;
+ unsigned int i;
+
+ if (namelen != 0)
+ return (ENOTDIR);
+
+ memset(&mbs, 0, sizeof(mbs));
+
+ counters_read(mbstat, counters, MBSTAT_COUNT);
+ for (i = 0; i < MBSTAT_TYPES; i++)
+ mbs.m_mtypes[i] = counters[i];
+
+ mbs.m_drops = counters[MBSTAT_DROPS] + mbpool.pr_nfail;
+ mbs.m_wait = counters[MBSTAT_WAIT];
+ mbs.m_drain = counters[MBSTAT_DRAIN]; /* pool gcs? */
+
+ return (sysctl_rdstruct(oldp, oldlenp, newp, &mbs, sizeof(mbs)));
+}
+
 #ifdef DDB
 void
 m_print(void *v,

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

Martin Pieuchot
On 13/11/17(Mon) 13:47, David Gwynne wrote:

> On Sun, Nov 12, 2017 at 11:23:31PM +0100, Gregor Best wrote:
> > Hi Martin,
> >
> > On Sun, Nov 12, 2017 at 03:40:59PM +0100, Martin Pieuchot wrote:
> > > [...]
> > > It does, some comments below.
> > > [...]
> >
> > Wonderful.
> >
> > > [...]
> > > This would be an approximation because it might happen that after
> > > returning NULL the second pool_get(9) won't sleep.  However I think
> > > it's the way to go because m_get(9) calls that can wait are generally
> > > not performance critical.
> > > [...]
> >
> > I had not considered that the second pool_get might not block at all. On
> > the other hand `netstat -m` says "requests for memory delayed" for that
> > counter, so returning immediately on the second try not counting as a
> > delay is OK, I think.
> >
> > > [...]
> > > Please do not expand the splx().  Only the counter need it.  Simply put
> > > the NULL check after the splx().
> > > [...]
> >
> > I'm not sure I understand. If I move the NULL check after the splx(),
> > counters_leave() will already have been called so increasing the counter
> > value has no effect anymore. The only additional things running under
> > splnet will be a few assignments, so I think moving the splx() a bit
> > further down does not hurt too much.
> >
> > Alternatively, I've attached a slightly different suggestion which
> > doesn't expand the scope of the splx() but duplicates a bit of code.
> > Does that make more sense?
>
> pools maintain count of how many times they failed to provide an
> allocation. you can watch this with vmstat -m or systat pool.
> however, we could use that to populate mb_drops too.
>
> the diff below moves populating the mbstat from kern_sysctl to
> uipc_mbuf, and adds copying of pr_nfails in.

Makes sense.  So if we go this way, we can get rid of MBSTAT_DROPS,
MBSTAT_WAIT and MBSTAT_DRAIN that are currently unused.

> if we want to count the number of "delays", i could easily add that
> to pools too and copy it out in sysctl_mbstat.

That's be an interesting statistic.

> Index: sys/sysctl.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/sysctl.h,v
> retrieving revision 1.175
> diff -u -p -r1.175 sysctl.h
> --- sys/sysctl.h 12 Oct 2017 09:14:16 -0000 1.175
> +++ sys/sysctl.h 13 Nov 2017 03:42:32 -0000
> @@ -936,6 +936,7 @@ int sysctl_rdstruct(void *, size_t *, vo
>  int sysctl_struct(void *, size_t *, void *, size_t, void *, size_t);
>  int sysctl_file(int *, u_int, char *, size_t *, struct proc *);
>  int sysctl_doproc(int *, u_int, char *, size_t *);
> +int sysctl_mbstat(int *, u_int, void *, size_t *, void *, size_t);
>  struct mbuf_queue;
>  int sysctl_mq(int *, u_int, void *, size_t *, void *, size_t,
>      struct mbuf_queue *);
> Index: kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> retrieving revision 1.330
> diff -u -p -r1.330 kern_sysctl.c
> --- kern/kern_sysctl.c 11 Aug 2017 21:24:19 -0000 1.330
> +++ kern/kern_sysctl.c 13 Nov 2017 03:42:32 -0000
> @@ -392,24 +392,9 @@ kern_sysctl(int *name, u_int namelen, vo
>   case KERN_FILE:
>   return (sysctl_file(name + 1, namelen - 1, oldp, oldlenp, p));
>  #endif
> - case KERN_MBSTAT: {
> - extern struct cpumem *mbstat;
> - uint64_t counters[MBSTAT_COUNT];
> - struct mbstat mbs;
> - unsigned int i;
> -
> - memset(&mbs, 0, sizeof(mbs));
> - counters_read(mbstat, counters, MBSTAT_COUNT);
> - for (i = 0; i < MBSTAT_TYPES; i++)
> - mbs.m_mtypes[i] = counters[i];
> -
> - mbs.m_drops = counters[MBSTAT_DROPS];
> - mbs.m_wait = counters[MBSTAT_WAIT];
> - mbs.m_drain = counters[MBSTAT_DRAIN];
> -
> - return (sysctl_rdstruct(oldp, oldlenp, newp,
> -    &mbs, sizeof(mbs)));
> - }
> + case KERN_MBSTAT:
> + return (sysctl_mbstat(name + 1, namelen -1, oldp, oldlenp,
> +    newp, newlen));
>  #if defined(GPROF) || defined(DDBPROF)
>   case KERN_PROF:
>   return (sysctl_doprof(name + 1, namelen - 1, oldp, oldlenp,
> Index: kern/uipc_mbuf.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
> retrieving revision 1.250
> diff -u -p -r1.250 uipc_mbuf.c
> --- kern/uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
> +++ kern/uipc_mbuf.c 13 Nov 2017 03:42:32 -0000
> @@ -1421,6 +1421,30 @@ m_pool_init(struct pool *pp, u_int size,
>   pool_set_constraints(pp, &kp_dma_contig);
>  }
>  
> +int
> +sysctl_mbstat(int *name, u_int namelen, void *oldp, size_t *oldlenp,
> +    void *newp, size_t newlen)
> +{
> + uint64_t counters[MBSTAT_COUNT];
> + struct mbstat mbs;
> + unsigned int i;
> +
> + if (namelen != 0)
> + return (ENOTDIR);
> +
> + memset(&mbs, 0, sizeof(mbs));
> +
> + counters_read(mbstat, counters, MBSTAT_COUNT);
> + for (i = 0; i < MBSTAT_TYPES; i++)
> + mbs.m_mtypes[i] = counters[i];
> +
> + mbs.m_drops = counters[MBSTAT_DROPS] + mbpool.pr_nfail;
> + mbs.m_wait = counters[MBSTAT_WAIT];
> + mbs.m_drain = counters[MBSTAT_DRAIN]; /* pool gcs? */
> +
> + return (sysctl_rdstruct(oldp, oldlenp, newp, &mbs, sizeof(mbs)));
> +}
> +
>  #ifdef DDB
>  void
>  m_print(void *v,

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

Gregor Best-2
In reply to this post by David Gwynne-5
Hi,

On Mon, Nov 13, 2017 at 01:47:01PM +1000, David Gwynne wrote:
> [...]
> pools maintain count of how many times they failed to provide an
> allocation. you can watch this with vmstat -m or systat pool.
> however, we could use that to populate mb_drops too.
> [...]

That's certainly smarter than my idea.

--
        Gregor

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

David Gwynne-5

> On 16 Nov 2017, at 7:23 am, Gregor Best <[hidden email]> wrote:
>
> Hi,
>
> On Mon, Nov 13, 2017 at 01:47:01PM +1000, David Gwynne wrote:
>> [...]
>> pools maintain count of how many times they failed to provide an
>> allocation. you can watch this with vmstat -m or systat pool.
>> however, we could use that to populate mb_drops too.
>> [...]
>
> That's certainly smarter than my idea.

does it work though?

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

Gregor Best-2
On Thu, Nov 16, 2017 at 11:13:04AM +1000, David Gwynne wrote:

>
> > On 16 Nov 2017, at 7:23 am, Gregor Best <[hidden email]> wrote:
> >
> > Hi,
> >
> > On Mon, Nov 13, 2017 at 01:47:01PM +1000, David Gwynne wrote:
> >> [...]
> >> pools maintain count of how many times they failed to provide an
> >> allocation. you can watch this with vmstat -m or systat pool.
> >> however, we could use that to populate mb_drops too.
> >> [...]
> >
> > That's certainly smarter than my idea.
>
> does it work though?
>

Now that I think about it, not quite, or at least not without reaching
around into the pools innards to lock them while reading the statistics
to get a consistent view.

The problem is that the MBSTAT_DROPS part of the `counters` data
structure never gets modified, so we'd still need to loop over all pools
in sysctl_mbstat, since this is not just about mbpool but also about the
pools for payload data.

On the other hand, going back to my initial proposal would mean
essentially duplicating functionality the pools already provide, so
that's at least a bit unelegant. Especially since it adds at least one
splnet/splx dance.

Another option would be to just say "screw it" and count the failed
allocations without acquiring any locks on the pools, maybe via atomic
operations. Sounds a bit too complicated though.

At the moment, and after a night's sleep, I think my initial proposal is
the most straightforward way to do this. That, or getting rid of the
confusing counters in `netstat -m` that stay at 0 all the time...

--
        Gregor

Reply | Threaded
Open this post in threaded view
|

Re: mbuf statistics, tracking of drops

David Gwynne-5
On Thu, Nov 16, 2017 at 08:13:39PM +0100, Gregor Best wrote:

> On Thu, Nov 16, 2017 at 11:13:04AM +1000, David Gwynne wrote:
> >
> > > On 16 Nov 2017, at 7:23 am, Gregor Best <[hidden email]> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Nov 13, 2017 at 01:47:01PM +1000, David Gwynne wrote:
> > >> [...]
> > >> pools maintain count of how many times they failed to provide an
> > >> allocation. you can watch this with vmstat -m or systat pool.
> > >> however, we could use that to populate mb_drops too.
> > >> [...]
> > >
> > > That's certainly smarter than my idea.
> >
> > does it work though?
> >
>
> Now that I think about it, not quite, or at least not without reaching
> around into the pools innards to lock them while reading the statistics
> to get a consistent view.

currently pr_nfails is an unsigned long, which can be atomically
read on all our machines. it is a consistent view of the number.
however, i would like to make the stats uint64_t one day, so i agree
that it is better to lock to read them.

the diff below factors our the reading of the pool stats into a
struct kinfo_pool, which is used internally by pool and now by the
sysctl_mbstat.

>
> The problem is that the MBSTAT_DROPS part of the `counters` data
> structure never gets modified, so we'd still need to loop over all pools
> in sysctl_mbstat, since this is not just about mbpool but also about the
> pools for payload data.

ok. the updated diff below loops over the cluster pools too now.

>
> On the other hand, going back to my initial proposal would mean
> essentially duplicating functionality the pools already provide, so
> that's at least a bit unelegant. Especially since it adds at least one
> splnet/splx dance.
>
> Another option would be to just say "screw it" and count the failed
> allocations without acquiring any locks on the pools, maybe via atomic
> operations. Sounds a bit too complicated though.
>
> At the moment, and after a night's sleep, I think my initial proposal is
> the most straightforward way to do this. That, or getting rid of the
> confusing counters in `netstat -m` that stay at 0 all the time...

getting rid of the stats does appeal to me too...

Index: sys/pool.h
===================================================================
RCS file: /cvs/src/sys/sys/pool.h,v
retrieving revision 1.74
diff -u -p -r1.74 pool.h
--- sys/pool.h 13 Aug 2017 20:26:33 -0000 1.74
+++ sys/pool.h 17 Nov 2017 00:58:13 -0000
@@ -259,6 +259,7 @@ void pool_init(struct pool *, size_t, u
     const char *, struct pool_allocator *);
 void pool_cache_init(struct pool *);
 void pool_destroy(struct pool *);
+void pool_info(struct pool *, struct kinfo_pool *);
 void pool_setlowat(struct pool *, int);
 void pool_sethiwat(struct pool *, int);
 int pool_sethardlimit(struct pool *, u_int, const char *, int);
Index: sys/sysctl.h
===================================================================
RCS file: /cvs/src/sys/sys/sysctl.h,v
retrieving revision 1.175
diff -u -p -r1.175 sysctl.h
--- sys/sysctl.h 12 Oct 2017 09:14:16 -0000 1.175
+++ sys/sysctl.h 17 Nov 2017 00:58:13 -0000
@@ -936,6 +936,7 @@ int sysctl_rdstruct(void *, size_t *, vo
 int sysctl_struct(void *, size_t *, void *, size_t, void *, size_t);
 int sysctl_file(int *, u_int, char *, size_t *, struct proc *);
 int sysctl_doproc(int *, u_int, char *, size_t *);
+int sysctl_mbstat(int *, u_int, void *, size_t *, void *, size_t);
 struct mbuf_queue;
 int sysctl_mq(int *, u_int, void *, size_t *, void *, size_t,
     struct mbuf_queue *);
Index: kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.330
diff -u -p -r1.330 kern_sysctl.c
--- kern/kern_sysctl.c 11 Aug 2017 21:24:19 -0000 1.330
+++ kern/kern_sysctl.c 17 Nov 2017 00:58:13 -0000
@@ -392,24 +392,9 @@ kern_sysctl(int *name, u_int namelen, vo
  case KERN_FILE:
  return (sysctl_file(name + 1, namelen - 1, oldp, oldlenp, p));
 #endif
- case KERN_MBSTAT: {
- extern struct cpumem *mbstat;
- uint64_t counters[MBSTAT_COUNT];
- struct mbstat mbs;
- unsigned int i;
-
- memset(&mbs, 0, sizeof(mbs));
- counters_read(mbstat, counters, MBSTAT_COUNT);
- for (i = 0; i < MBSTAT_TYPES; i++)
- mbs.m_mtypes[i] = counters[i];
-
- mbs.m_drops = counters[MBSTAT_DROPS];
- mbs.m_wait = counters[MBSTAT_WAIT];
- mbs.m_drain = counters[MBSTAT_DRAIN];
-
- return (sysctl_rdstruct(oldp, oldlenp, newp,
-    &mbs, sizeof(mbs)));
- }
+ case KERN_MBSTAT:
+ return (sysctl_mbstat(name + 1, namelen -1, oldp, oldlenp,
+    newp, newlen));
 #if defined(GPROF) || defined(DDBPROF)
  case KERN_PROF:
  return (sysctl_doprof(name + 1, namelen - 1, oldp, oldlenp,
Index: kern/uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.250
diff -u -p -r1.250 uipc_mbuf.c
--- kern/uipc_mbuf.c 12 Oct 2017 09:14:16 -0000 1.250
+++ kern/uipc_mbuf.c 17 Nov 2017 00:58:13 -0000
@@ -1421,6 +1421,39 @@ m_pool_init(struct pool *pp, u_int size,
  pool_set_constraints(pp, &kp_dma_contig);
 }
 
+int
+sysctl_mbstat(int *name, u_int namelen, void *oldp, size_t *oldlenp,
+    void *newp, size_t newlen)
+{
+ struct kinfo_pool pi;
+ uint64_t counters[MBSTAT_COUNT];
+ struct mbstat mbs;
+ unsigned int i;
+
+ if (namelen != 0)
+ return (ENOTDIR);
+
+ memset(&mbs, 0, sizeof(mbs));
+
+ counters_read(mbstat, counters, MBSTAT_COUNT);
+ for (i = 0; i < MBSTAT_TYPES; i++)
+ mbs.m_mtypes[i] = counters[i];
+
+ mbs.m_drops = counters[MBSTAT_DROPS];
+ mbs.m_wait = counters[MBSTAT_WAIT];
+ mbs.m_drain = counters[MBSTAT_DRAIN]; /* pool gcs? */
+
+ pool_info(&mbpool, &pi);
+ mbs.m_wait += pi.pr_nfail;
+
+ for (i = 0; i < nitems(mclpools); i++) {
+ pool_info(&mclpools[i], &pi);
+ mbs.m_wait += pi.pr_nfail;
+ }
+
+ return (sysctl_rdstruct(oldp, oldlenp, newp, &mbs, sizeof(mbs)));
+}
+
 #ifdef DDB
 void
 m_print(void *v,
Index: kern/subr_pool.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.220
diff -u -p -r1.220 subr_pool.c
--- kern/subr_pool.c 13 Aug 2017 20:26:33 -0000 1.220
+++ kern/subr_pool.c 17 Nov 2017 00:58:13 -0000
@@ -1442,6 +1442,29 @@ pool_walk(struct pool *pp, int full,
 }
 #endif
 
+void
+pool_info(struct pool *pp, struct kinfo_pool *pi)
+{
+ pl_enter(pp, &pp->pr_lock);
+ pi->pr_size = pp->pr_size;
+ pi->pr_pgsize = pp->pr_pgsize;
+ pi->pr_itemsperpage = pp->pr_itemsperpage;
+ pi->pr_npages = pp->pr_npages;
+ pi->pr_minpages = pp->pr_minpages;
+ pi->pr_maxpages = pp->pr_maxpages;
+ pi->pr_hardlimit = pp->pr_hardlimit;
+ pi->pr_nout = pp->pr_nout;
+ pi->pr_nitems = pp->pr_nitems;
+ pi->pr_nget = pp->pr_nget;
+ pi->pr_nput = pp->pr_nput;
+ pi->pr_nfail = pp->pr_nfail;
+ pi->pr_npagealloc = pp->pr_npagealloc;
+ pi->pr_npagefree = pp->pr_npagefree;
+ pi->pr_hiwat = pp->pr_hiwat;
+ pi->pr_nidle = pp->pr_nidle;
+ pl_leave(pp, &pp->pr_lock);
+}
+
 /*
  * We have three different sysctls.
  * kern.pool.npools - the number of pools.
@@ -1490,25 +1513,7 @@ sysctl_dopool(int *name, u_int namelen,
  case KERN_POOL_POOL:
  memset(&pi, 0, sizeof(pi));
 
- pl_enter(pp, &pp->pr_lock);
- pi.pr_size = pp->pr_size;
- pi.pr_pgsize = pp->pr_pgsize;
- pi.pr_itemsperpage = pp->pr_itemsperpage;
- pi.pr_npages = pp->pr_npages;
- pi.pr_minpages = pp->pr_minpages;
- pi.pr_maxpages = pp->pr_maxpages;
- pi.pr_hardlimit = pp->pr_hardlimit;
- pi.pr_nout = pp->pr_nout;
- pi.pr_nitems = pp->pr_nitems;
- pi.pr_nget = pp->pr_nget;
- pi.pr_nput = pp->pr_nput;
- pi.pr_nfail = pp->pr_nfail;
- pi.pr_npagealloc = pp->pr_npagealloc;
- pi.pr_npagefree = pp->pr_npagefree;
- pi.pr_hiwat = pp->pr_hiwat;
- pi.pr_nidle = pp->pr_nidle;
- pl_leave(pp, &pp->pr_lock);
-
+ pool_info(pp, &pi);
  pool_cache_pool_info(pp, &pi);
 
  rv = sysctl_rdstruct(oldp, oldlenp, NULL, &pi, sizeof(pi));