Block device poll(2) vs kqueue(2)

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Block device poll(2) vs kqueue(2)

Martin Pieuchot
Diff below fixes an incoherency between poll(2) and kqueue(2) when it
comes to non-character devices.  It makes spec_kqfilter() behaves like
spec_poll(): returns "true" when applied to any non-character device.

ok?

Index: kern/spec_vnops.c
===================================================================
RCS file: /cvs/src/sys/kern/spec_vnops.c,v
retrieving revision 1.100
diff -u -p -r1.100 spec_vnops.c
--- kern/spec_vnops.c 20 Jan 2020 23:21:55 -0000 1.100
+++ kern/spec_vnops.c 20 May 2020 09:36:48 -0000
@@ -386,11 +386,9 @@ spec_poll(void *v)
  dev_t dev;
 
  switch (ap->a_vp->v_type) {
-
  default:
  return (ap->a_events &
     (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM));
-
  case VCHR:
  dev = ap->a_vp->v_rdev;
  return (*cdevsw[major(dev)].d_poll)(dev, ap->a_events, ap->a_p);
@@ -400,12 +398,17 @@ int
 spec_kqfilter(void *v)
 {
  struct vop_kqfilter_args *ap = v;
-
  dev_t dev;
 
  dev = ap->a_vp->v_rdev;
- if (cdevsw[major(dev)].d_kqfilter)
- return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
+
+ switch (ap->a_vp->v_type) {
+ default:
+ return seltrue_kqfilter(dev, ap->a_kn);
+ case VCHR:
+ if (cdevsw[major(dev)].d_kqfilter)
+ return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
+ }
  return (EOPNOTSUPP);
 }
 

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Mark Kettenis
> Date: Wed, 20 May 2020 11:42:32 +0200
> From: Martin Pieuchot <[hidden email]>
>
> Diff below fixes an incoherency between poll(2) and kqueue(2) when it
> comes to non-character devices.  It makes spec_kqfilter() behaves like
> spec_poll(): returns "true" when applied to any non-character device.
>
> ok?

That is a change in behaviour.  What do other BSDs do in this case?

I think the new behaviour makes sense, but it could reveal some nasty
bugs in cases where code accidantily uses kqueue(2) with a file
descriptor for a block device.

To achieve your goal, d_kqfilter() will become non-optional in the
future isn't it?


> Index: kern/spec_vnops.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/spec_vnops.c,v
> retrieving revision 1.100
> diff -u -p -r1.100 spec_vnops.c
> --- kern/spec_vnops.c 20 Jan 2020 23:21:55 -0000 1.100
> +++ kern/spec_vnops.c 20 May 2020 09:36:48 -0000
> @@ -386,11 +386,9 @@ spec_poll(void *v)
>   dev_t dev;
>  
>   switch (ap->a_vp->v_type) {
> -
>   default:
>   return (ap->a_events &
>      (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM));
> -
>   case VCHR:
>   dev = ap->a_vp->v_rdev;
>   return (*cdevsw[major(dev)].d_poll)(dev, ap->a_events, ap->a_p);
> @@ -400,12 +398,17 @@ int
>  spec_kqfilter(void *v)
>  {
>   struct vop_kqfilter_args *ap = v;
> -
>   dev_t dev;
>  
>   dev = ap->a_vp->v_rdev;
> - if (cdevsw[major(dev)].d_kqfilter)
> - return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> +
> + switch (ap->a_vp->v_type) {
> + default:
> + return seltrue_kqfilter(dev, ap->a_kn);
> + case VCHR:
> + if (cdevsw[major(dev)].d_kqfilter)
> + return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> + }
>   return (EOPNOTSUPP);
>  }
>  
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Martin Pieuchot
On 20/05/20(Wed) 12:28, Mark Kettenis wrote:

> > Date: Wed, 20 May 2020 11:42:32 +0200
> > From: Martin Pieuchot <[hidden email]>
> >
> > Diff below fixes an incoherency between poll(2) and kqueue(2) when it
> > comes to non-character devices.  It makes spec_kqfilter() behaves like
> > spec_poll(): returns "true" when applied to any non-character device.
> >
> > ok?
>
> That is a change in behaviour.  What do other BSDs do in this case?

By reading code I figured out that:
  - NetBSD only calls the kqfilter handler for character devices and
    returns EOPNOTSUPP otherwise.

  - FreeBSD doesn't allow opening block devices in devfs_open(), so
    only character devices seem to support poll(2) and kqueue(2).

  - I couldn't find a difference between block and character devices
    handling in DragonFlyBSD, it seems that the underlying kqfilter
    handler is always called.

> I think the new behaviour makes sense, but it could reveal some nasty
> bugs in cases where code accidantily uses kqueue(2) with a file
> descriptor for a block device.

The alternative I could think of was changing the existing poll handler
to do selfalse() instead of seltrue().  However this makes it very hard
to figure out a regression.  On the other hand, assuming that a block
device is always ready to be read via kqueue(2) seems to have a low
impact.

> To achieve your goal, d_kqfilter() will become non-optional in the
> future isn't it?

It should become the replacement for the existing d_poll() functions,
that's why I'm trying to make sure their behavior are coherent.

> > Index: kern/spec_vnops.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/spec_vnops.c,v
> > retrieving revision 1.100
> > diff -u -p -r1.100 spec_vnops.c
> > --- kern/spec_vnops.c 20 Jan 2020 23:21:55 -0000 1.100
> > +++ kern/spec_vnops.c 20 May 2020 09:36:48 -0000
> > @@ -386,11 +386,9 @@ spec_poll(void *v)
> >   dev_t dev;
> >  
> >   switch (ap->a_vp->v_type) {
> > -
> >   default:
> >   return (ap->a_events &
> >      (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM));
> > -
> >   case VCHR:
> >   dev = ap->a_vp->v_rdev;
> >   return (*cdevsw[major(dev)].d_poll)(dev, ap->a_events, ap->a_p);
> > @@ -400,12 +398,17 @@ int
> >  spec_kqfilter(void *v)
> >  {
> >   struct vop_kqfilter_args *ap = v;
> > -
> >   dev_t dev;
> >  
> >   dev = ap->a_vp->v_rdev;
> > - if (cdevsw[major(dev)].d_kqfilter)
> > - return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> > +
> > + switch (ap->a_vp->v_type) {
> > + default:
> > + return seltrue_kqfilter(dev, ap->a_kn);
> > + case VCHR:
> > + if (cdevsw[major(dev)].d_kqfilter)
> > + return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> > + }
> >   return (EOPNOTSUPP);
> >  }
> >  
> >
> >

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Christian Weisgerber
On 2020-05-20, Martin Pieuchot <[hidden email]> wrote:

>   - FreeBSD doesn't allow opening block devices in devfs_open(), so
>     only character devices seem to support poll(2) and kqueue(2).

FreeBSD doesn't have block devices anymore.

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Martin Pieuchot
In reply to this post by Martin Pieuchot
On 20/05/20(Wed) 14:22, Martin Pieuchot wrote:

> On 20/05/20(Wed) 12:28, Mark Kettenis wrote:
> > > Date: Wed, 20 May 2020 11:42:32 +0200
> > > From: Martin Pieuchot <[hidden email]>
> > >
> > > Diff below fixes an incoherency between poll(2) and kqueue(2) when it
> > > comes to non-character devices.  It makes spec_kqfilter() behaves like
> > > spec_poll(): returns "true" when applied to any non-character device.
> > >
> > > ok?
> >
> > That is a change in behaviour.  What do other BSDs do in this case?
>
> By reading code I figured out that:
>   - NetBSD only calls the kqfilter handler for character devices and
>     returns EOPNOTSUPP otherwise.
>
>   - FreeBSD doesn't allow opening block devices in devfs_open(), so
>     only character devices seem to support poll(2) and kqueue(2).
>
>   - I couldn't find a difference between block and character devices
>     handling in DragonFlyBSD, it seems that the underlying kqfilter
>     handler is always called.
>
> > I think the new behaviour makes sense, but it could reveal some nasty
> > bugs in cases where code accidantily uses kqueue(2) with a file
> > descriptor for a block device.
>
> The alternative I could think of was changing the existing poll handler
> to do selfalse() instead of seltrue().  However this makes it very hard
> to figure out a regression.  On the other hand, assuming that a block
> device is always ready to be read via kqueue(2) seems to have a low
> impact.

Updated diff that considers the new EV_OLDAPI flag and fallbacks to
seltrue_kqfilter() if it is set.

This flag is kernel-only an intended to be set by the new poll(2) and
select(2) implementations.  This way kevent(2) keeps its current
behavior.

While here change cttykqfilter() in the same way.

ok?

Index: kern/spec_vnops.c
===================================================================
RCS file: /cvs/src/sys/kern/spec_vnops.c,v
retrieving revision 1.100
diff -u -p -r1.100 spec_vnops.c
--- kern/spec_vnops.c 20 Jan 2020 23:21:55 -0000 1.100
+++ kern/spec_vnops.c 9 Jun 2020 08:57:14 -0000
@@ -386,11 +386,9 @@ spec_poll(void *v)
  dev_t dev;
 
  switch (ap->a_vp->v_type) {
-
  default:
  return (ap->a_events &
     (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM));
-
  case VCHR:
  dev = ap->a_vp->v_rdev;
  return (*cdevsw[major(dev)].d_poll)(dev, ap->a_events, ap->a_p);
@@ -400,12 +398,19 @@ int
 spec_kqfilter(void *v)
 {
  struct vop_kqfilter_args *ap = v;
-
  dev_t dev;
 
  dev = ap->a_vp->v_rdev;
- if (cdevsw[major(dev)].d_kqfilter)
- return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
+
+ switch (ap->a_vp->v_type) {
+ default:
+ if (ap->a_kn->kn_flags & EV_OLDAPI)
+ return seltrue_kqfilter(dev, ap->a_kn);
+ break;
+ case VCHR:
+ if (cdevsw[major(dev)].d_kqfilter)
+ return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
+ }
  return (EOPNOTSUPP);
 }
 
Index: kern/tty_tty.c
===================================================================
RCS file: /cvs/src/sys/kern/tty_tty.c,v
retrieving revision 1.25
diff -u -p -r1.25 tty_tty.c
--- kern/tty_tty.c 8 Apr 2020 08:07:51 -0000 1.25
+++ kern/tty_tty.c 9 Jun 2020 08:57:55 -0000
@@ -158,7 +158,10 @@ cttykqfilter(dev_t dev, struct knote *kn
 {
  struct vnode *ttyvp = cttyvp(curproc);
 
- if (ttyvp == NULL)
+ if (ttyvp == NULL) {
+ if (kn->kn_flags & EV_OLDAPI)
+ return (seltrue_kqfilter(dev, kn));
  return (ENXIO);
+ }
  return (VOP_KQFILTER(ttyvp, FREAD|FWRITE, kn));
 }

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Mark Kettenis
> Date: Tue, 9 Jun 2020 11:04:02 +0200
> From: Martin Pieuchot <[hidden email]>
>
> On 20/05/20(Wed) 14:22, Martin Pieuchot wrote:
> > On 20/05/20(Wed) 12:28, Mark Kettenis wrote:
> > > > Date: Wed, 20 May 2020 11:42:32 +0200
> > > > From: Martin Pieuchot <[hidden email]>
> > > >
> > > > Diff below fixes an incoherency between poll(2) and kqueue(2) when it
> > > > comes to non-character devices.  It makes spec_kqfilter() behaves like
> > > > spec_poll(): returns "true" when applied to any non-character device.
> > > >
> > > > ok?
> > >
> > > That is a change in behaviour.  What do other BSDs do in this case?
> >
> > By reading code I figured out that:
> >   - NetBSD only calls the kqfilter handler for character devices and
> >     returns EOPNOTSUPP otherwise.
> >
> >   - FreeBSD doesn't allow opening block devices in devfs_open(), so
> >     only character devices seem to support poll(2) and kqueue(2).
> >
> >   - I couldn't find a difference between block and character devices
> >     handling in DragonFlyBSD, it seems that the underlying kqfilter
> >     handler is always called.
> >
> > > I think the new behaviour makes sense, but it could reveal some nasty
> > > bugs in cases where code accidantily uses kqueue(2) with a file
> > > descriptor for a block device.
> >
> > The alternative I could think of was changing the existing poll handler
> > to do selfalse() instead of seltrue().  However this makes it very hard
> > to figure out a regression.  On the other hand, assuming that a block
> > device is always ready to be read via kqueue(2) seems to have a low
> > impact.
>
> Updated diff that considers the new EV_OLDAPI flag and fallbacks to
> seltrue_kqfilter() if it is set.
>
> This flag is kernel-only an intended to be set by the new poll(2) and
> select(2) implementations.  This way kevent(2) keeps its current
> behavior.
>
> While here change cttykqfilter() in the same way.
>
> ok?

I think EV_OLDAPI is a really weird name for this.  And if it is
kernel-only, should it have a double-underscore in front of it to
signal it is not to be used in userland?  Maybe __EV_SELECT or
__EV_POLL would be a better name for this?

I'm not sure the controling tty device behaviour is actually sensible,
but it is retaining the status quuo which is what you want at this
point.

> Index: kern/spec_vnops.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/spec_vnops.c,v
> retrieving revision 1.100
> diff -u -p -r1.100 spec_vnops.c
> --- kern/spec_vnops.c 20 Jan 2020 23:21:55 -0000 1.100
> +++ kern/spec_vnops.c 9 Jun 2020 08:57:14 -0000
> @@ -386,11 +386,9 @@ spec_poll(void *v)
>   dev_t dev;
>  
>   switch (ap->a_vp->v_type) {
> -
>   default:
>   return (ap->a_events &
>      (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM));
> -
>   case VCHR:
>   dev = ap->a_vp->v_rdev;
>   return (*cdevsw[major(dev)].d_poll)(dev, ap->a_events, ap->a_p);
> @@ -400,12 +398,19 @@ int
>  spec_kqfilter(void *v)
>  {
>   struct vop_kqfilter_args *ap = v;
> -
>   dev_t dev;
>  
>   dev = ap->a_vp->v_rdev;
> - if (cdevsw[major(dev)].d_kqfilter)
> - return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> +
> + switch (ap->a_vp->v_type) {
> + default:
> + if (ap->a_kn->kn_flags & EV_OLDAPI)
> + return seltrue_kqfilter(dev, ap->a_kn);
> + break;
> + case VCHR:
> + if (cdevsw[major(dev)].d_kqfilter)
> + return (*cdevsw[major(dev)].d_kqfilter)(dev, ap->a_kn);
> + }
>   return (EOPNOTSUPP);
>  }
>  
> Index: kern/tty_tty.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/tty_tty.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 tty_tty.c
> --- kern/tty_tty.c 8 Apr 2020 08:07:51 -0000 1.25
> +++ kern/tty_tty.c 9 Jun 2020 08:57:55 -0000
> @@ -158,7 +158,10 @@ cttykqfilter(dev_t dev, struct knote *kn
>  {
>   struct vnode *ttyvp = cttyvp(curproc);
>  
> - if (ttyvp == NULL)
> + if (ttyvp == NULL) {
> + if (kn->kn_flags & EV_OLDAPI)
> + return (seltrue_kqfilter(dev, kn));
>   return (ENXIO);
> + }
>   return (VOP_KQFILTER(ttyvp, FREAD|FWRITE, kn));
>  }
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Block device poll(2) vs kqueue(2)

Martin Pieuchot
On 09/06/20(Tue) 11:26, Mark Kettenis wrote:

> > Date: Tue, 9 Jun 2020 11:04:02 +0200
> > From: Martin Pieuchot <[hidden email]>
> >
> > On 20/05/20(Wed) 14:22, Martin Pieuchot wrote:
> > > On 20/05/20(Wed) 12:28, Mark Kettenis wrote:
> > > > > Date: Wed, 20 May 2020 11:42:32 +0200
> > > > > From: Martin Pieuchot <[hidden email]>
> > > > >
> > > > > Diff below fixes an incoherency between poll(2) and kqueue(2) when it
> > > > > comes to non-character devices.  It makes spec_kqfilter() behaves like
> > > > > spec_poll(): returns "true" when applied to any non-character device.
> > > > >
> > > > > ok?
> > > >
> > > > That is a change in behaviour.  What do other BSDs do in this case?
> > >
> > > By reading code I figured out that:
> > >   - NetBSD only calls the kqfilter handler for character devices and
> > >     returns EOPNOTSUPP otherwise.
> > >
> > >   - FreeBSD doesn't allow opening block devices in devfs_open(), so
> > >     only character devices seem to support poll(2) and kqueue(2).
> > >
> > >   - I couldn't find a difference between block and character devices
> > >     handling in DragonFlyBSD, it seems that the underlying kqfilter
> > >     handler is always called.
> > >
> > > > I think the new behaviour makes sense, but it could reveal some nasty
> > > > bugs in cases where code accidantily uses kqueue(2) with a file
> > > > descriptor for a block device.
> > >
> > > The alternative I could think of was changing the existing poll handler
> > > to do selfalse() instead of seltrue().  However this makes it very hard
> > > to figure out a regression.  On the other hand, assuming that a block
> > > device is always ready to be read via kqueue(2) seems to have a low
> > > impact.
> >
> > Updated diff that considers the new EV_OLDAPI flag and fallbacks to
> > seltrue_kqfilter() if it is set.
> >
> > This flag is kernel-only an intended to be set by the new poll(2) and
> > select(2) implementations.  This way kevent(2) keeps its current
> > behavior.
> >
> > While here change cttykqfilter() in the same way.
> >
> > ok?
>
> I think EV_OLDAPI is a really weird name for this.  And if it is
> kernel-only, should it have a double-underscore in front of it to
> signal it is not to be used in userland?  Maybe __EV_SELECT or
> __EV_POLL would be a better name for this?

Sure, I'm happy to do the rename.

> I'm not sure the controling tty device behaviour is actually sensible,
> but it is retaining the status quuo which is what you want at this
> point.

Exactly.  Is that an ok?