xhci(4) isoc: fix bogus handling of chained TRBs

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
Hi All,

Laurence Tratt was reporting about corrupted video images when using
uvideo(4) on xhci(4) with MJPEG and higher resolutions, e.g. when using
ffplay:

$ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f /dev/video1

When trying to re-produce the issue on my side, the video images were
still OK, but I also could see a lot of errors returned by ffplay
related to corrupted MJPEG data.

When debugging the error I noticed that the corruption only happens
when TRBs are get chained due to hitting a 64k boundary, and therefore
require to queue up two TRBs.  So, what happened?

In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
processed, chained TD:

/*
 * If we queued two TRBs for a frame and this is the second TRB,
 * check if the first TRB needs accounting since it might not have
 * raised an interrupt in case of full data received.
 */

If there was a zero length transfer with such a TD, and the 2. TRB
receives the interrupt with len=0, it assumes that the 1. TRB was
processed (but it wasn't), and adds the requested 1. TRB length to
actlen, passing back that data has been received, which is obviously
wrong, resulting in the seen data corruptions.

Instead, when the 2. TRB receives the IOC interrupt with len=0, we
need to ignore the 1. TRB, considering this was a zero length transfer.

With the below diff Laurent is getting a clear video image now with
high resolutions, and I could remove all the output errors from ffplay
in my case.  I also tested with an uaudio(4) device.

Feedback, more testers, OKs?

Thanks,
Marcus


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 10 Jul 2020 05:57:11 -0000
@@ -932,6 +932,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  struct usbd_xfer *skipxfer;
  struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
  int trb0_idx, frame_idx = 0;
+ uint32_t len;
 
  KASSERT(xx->index >= 0);
  trb0_idx =
@@ -945,6 +946,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  if (trb0_idx++ == (xp->ring.ntrb - 1))
  trb0_idx = 0;
  }
+ len = XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
 
  /*
  * If we queued two TRBs for a frame and this is the second TRB,
@@ -954,18 +956,17 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) & XHCI_TRB_TYPE_MASK) ==
     XHCI_TRB_TYPE_NORMAL) {
  frame_idx--;
- if (trb_idx == 0)
- trb0_idx = xp->ring.ntrb - 2;
- else
- trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (len > 0 && xfer->frlengths[frame_idx] == 0) {
+ if (trb_idx == 0)
+ trb0_idx = xp->ring.ntrb - 2;
+ else
+ trb0_idx = trb_idx - 1;
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
  }
  }
 
- xfer->frlengths[frame_idx] +=
-    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
+ xfer->frlengths[frame_idx] += len;
  xfer->actlen += xfer->frlengths[frame_idx];
 
  if (xx->index != trb_idx)

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Patrick Wildt-3
On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote:

> Hi All,
>
> Laurence Tratt was reporting about corrupted video images when using
> uvideo(4) on xhci(4) with MJPEG and higher resolutions, e.g. when using
> ffplay:
>
> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f /dev/video1
>
> When trying to re-produce the issue on my side, the video images were
> still OK, but I also could see a lot of errors returned by ffplay
> related to corrupted MJPEG data.
>
> When debugging the error I noticed that the corruption only happens
> when TRBs are get chained due to hitting a 64k boundary, and therefore
> require to queue up two TRBs.  So, what happened?
>
> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> processed, chained TD:
>
> /*
>  * If we queued two TRBs for a frame and this is the second TRB,
>  * check if the first TRB needs accounting since it might not have
>  * raised an interrupt in case of full data received.
>  */
>
> If there was a zero length transfer with such a TD, and the 2. TRB
> receives the interrupt with len=0, it assumes that the 1. TRB was
> processed (but it wasn't), and adds the requested 1. TRB length to
> actlen, passing back that data has been received, which is obviously
> wrong, resulting in the seen data corruptions.
>
> Instead, when the 2. TRB receives the IOC interrupt with len=0, we
> need to ignore the 1. TRB, considering this was a zero length transfer.
>
> With the below diff Laurent is getting a clear video image now with
> high resolutions, and I could remove all the output errors from ffplay
> in my case.  I also tested with an uaudio(4) device.
>
> Feedback, more testers, OKs?
>
> Thanks,
> Marcus
>

What if the first TRB was filled completely, but the second TRB
transferred 0?  Wouldn't we then need to add 1. TRB?  I think your
diff wouldn't do that.

I think what we actually need is some "processed" flag.  If you look
on bugs@ or so, there's a diff that tries to fix the same issue by
adding a "CHAINED" or so flag.  I'm not sure if that's the right way.

But, anyway, I think we need a way to say "this TRB got an interrupt"
or "this TRB was processed".

>
> Index: xhci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -u -p -r1.116 xhci.c
> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> +++ xhci.c 10 Jul 2020 05:57:11 -0000
> @@ -932,6 +932,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   struct usbd_xfer *skipxfer;
>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
>   int trb0_idx, frame_idx = 0;
> + uint32_t len;
>  
>   KASSERT(xx->index >= 0);
>   trb0_idx =
> @@ -945,6 +946,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   if (trb0_idx++ == (xp->ring.ntrb - 1))
>   trb0_idx = 0;
>   }
> + len = XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
>  
>   /*
>   * If we queued two TRBs for a frame and this is the second TRB,
> @@ -954,18 +956,17 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) & XHCI_TRB_TYPE_MASK) ==
>      XHCI_TRB_TYPE_NORMAL) {
>   frame_idx--;
> - if (trb_idx == 0)
> - trb0_idx = xp->ring.ntrb - 2;
> - else
> - trb0_idx = trb_idx - 1;
> - if (xfer->frlengths[frame_idx] == 0) {
> + if (len > 0 && xfer->frlengths[frame_idx] == 0) {
> + if (trb_idx == 0)
> + trb0_idx = xp->ring.ntrb - 2;
> + else
> + trb0_idx = trb_idx - 1;
>   xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>      xp->ring.trbs[trb0_idx].trb_status));
>   }
>   }
>  
> - xfer->frlengths[frame_idx] +=
> -    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> + xfer->frlengths[frame_idx] += len;
>   xfer->actlen += xfer->frlengths[frame_idx];
>  
>   if (xx->index != trb_idx)
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Fri, 10 Jul 2020 14:19:00 +0200
Patrick Wildt <[hidden email]> wrote:

> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote:
> > Hi All,
> >
> > Laurence Tratt was reporting about corrupted video images when using
> > uvideo(4) on xhci(4) with MJPEG and higher resolutions, e.g. when
> > using ffplay:
> >
> > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> > /dev/video1
> >
> > When trying to re-produce the issue on my side, the video images
> > were still OK, but I also could see a lot of errors returned by
> > ffplay related to corrupted MJPEG data.
> >
> > When debugging the error I noticed that the corruption only happens
> > when TRBs are get chained due to hitting a 64k boundary, and
> > therefore require to queue up two TRBs.  So, what happened?
> >
> > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> > processed, chained TD:
> >
> > /*
> >  * If we queued two TRBs for a frame and this is the second TRB,
> >  * check if the first TRB needs accounting since it might not have
> >  * raised an interrupt in case of full data received.
> >  */
> >
> > If there was a zero length transfer with such a TD, and the 2. TRB
> > receives the interrupt with len=0, it assumes that the 1. TRB was
> > processed (but it wasn't), and adds the requested 1. TRB length to
> > actlen, passing back that data has been received, which is obviously
> > wrong, resulting in the seen data corruptions.
> >
> > Instead, when the 2. TRB receives the IOC interrupt with len=0, we
> > need to ignore the 1. TRB, considering this was a zero length
> > transfer.
> >
> > With the below diff Laurent is getting a clear video image now with
> > high resolutions, and I could remove all the output errors from
> > ffplay in my case.  I also tested with an uaudio(4) device.
> >
> > Feedback, more testers, OKs?
> >
> > Thanks,
> > Marcus
> >  
>
> What if the first TRB was filled completely, but the second TRB
> transferred 0?  Wouldn't we then need to add 1. TRB?  I think your
> diff wouldn't do that.

Right - That was the only case I was thinking about as well which
wouldn't get handled at the moment, but I wasn't sure if it can be a
case at all (according to your comment it can :-)

> I think what we actually need is some "processed" flag.  If you look
> on bugs@ or so, there's a diff that tries to fix the same issue by
> adding a "CHAINED" or so flag.  I'm not sure if that's the right way.
>
> But, anyway, I think we need a way to say "this TRB got an interrupt"
> or "this TRB was processed".

And I was thinking about this too.

I think it should be more something like "this TRB was processed", since
at the moment we don't receive an interrupt for the 1st TRB.

Two questions related to that:

1. Setting XHCI_TRB_IOC together with XHCI_TRB_CHAIN is probably not
   something which would work with our current code nor supported with
   the specs?

2. Is there already something, somewhere which sets a status flag when
   the 1. TRB was touched, or is it something we really need to write
   an own handler for from scratch?

> >
> > Index: xhci.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > retrieving revision 1.116
> > diff -u -p -u -p -r1.116 xhci.c
> > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> > +++ xhci.c 10 Jul 2020 05:57:11 -0000
> > @@ -932,6 +932,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >   struct usbd_xfer *skipxfer;
> >   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> >   int trb0_idx, frame_idx = 0;
> > + uint32_t len;
> >  
> >   KASSERT(xx->index >= 0);
> >   trb0_idx =
> > @@ -945,6 +946,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >   if (trb0_idx++ == (xp->ring.ntrb - 1))
> >   trb0_idx = 0;
> >   }
> > + len =
> > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> >   /*
> >   * If we queued two TRBs for a frame and this is the
> > second TRB, @@ -954,18 +956,17 @@ xhci_event_xfer_isoc(struct
> > usbd_xfer *x if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) &
> > XHCI_TRB_TYPE_MASK) == XHCI_TRB_TYPE_NORMAL) {
> >   frame_idx--;
> > - if (trb_idx == 0)
> > - trb0_idx = xp->ring.ntrb - 2;
> > - else
> > - trb0_idx = trb_idx - 1;
> > - if (xfer->frlengths[frame_idx] == 0) {
> > + if (len > 0 && xfer->frlengths[frame_idx] == 0) {
> > + if (trb_idx == 0)
> > + trb0_idx = xp->ring.ntrb - 2;
> > + else
> > + trb0_idx = trb_idx - 1;
> >   xfer->frlengths[frame_idx] =
> > XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> >   }
> >   }
> >  
> > - xfer->frlengths[frame_idx] +=
> > -
> > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> > + xfer->frlengths[frame_idx] += len;
> >   xfer->actlen += xfer->frlengths[frame_idx];
> >  
> >   if (xx->index != trb_idx)
> >  

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Fri, 10 Jul 2020 14:32:47 +0200
Marcus Glocker <[hidden email]> wrote:

> On Fri, 10 Jul 2020 14:19:00 +0200
> Patrick Wildt <[hidden email]> wrote:
>
> > On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote:  
> > > Hi All,
> > >
> > > Laurence Tratt was reporting about corrupted video images when
> > > using uvideo(4) on xhci(4) with MJPEG and higher resolutions,
> > > e.g. when using ffplay:
> > >
> > > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> > > /dev/video1
> > >
> > > When trying to re-produce the issue on my side, the video images
> > > were still OK, but I also could see a lot of errors returned by
> > > ffplay related to corrupted MJPEG data.
> > >
> > > When debugging the error I noticed that the corruption only
> > > happens when TRBs are get chained due to hitting a 64k boundary,
> > > and therefore require to queue up two TRBs.  So, what happened?
> > >
> > > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> > > processed, chained TD:
> > >
> > > /*
> > >  * If we queued two TRBs for a frame and this is the second TRB,
> > >  * check if the first TRB needs accounting since it might not have
> > >  * raised an interrupt in case of full data received.
> > >  */
> > >
> > > If there was a zero length transfer with such a TD, and the 2. TRB
> > > receives the interrupt with len=0, it assumes that the 1. TRB was
> > > processed (but it wasn't), and adds the requested 1. TRB length to
> > > actlen, passing back that data has been received, which is
> > > obviously wrong, resulting in the seen data corruptions.
> > >
> > > Instead, when the 2. TRB receives the IOC interrupt with len=0, we
> > > need to ignore the 1. TRB, considering this was a zero length
> > > transfer.
> > >
> > > With the below diff Laurent is getting a clear video image now
> > > with high resolutions, and I could remove all the output errors
> > > from ffplay in my case.  I also tested with an uaudio(4) device.
> > >
> > > Feedback, more testers, OKs?
> > >
> > > Thanks,
> > > Marcus
> > >    
> >
> > What if the first TRB was filled completely, but the second TRB
> > transferred 0?  Wouldn't we then need to add 1. TRB?  I think your
> > diff wouldn't do that.  
>
> Right - That was the only case I was thinking about as well which
> wouldn't get handled at the moment, but I wasn't sure if it can be a
> case at all (according to your comment it can :-)
>
> > I think what we actually need is some "processed" flag.  If you look
> > on bugs@ or so, there's a diff that tries to fix the same issue by
> > adding a "CHAINED" or so flag.  I'm not sure if that's the right
> > way.
> >
> > But, anyway, I think we need a way to say "this TRB got an
> > interrupt" or "this TRB was processed".  

Sorry, I just did see the "xhci: zero length multi-TRB inbound xfer does
not work" mail thread now.

Honestly, at the moment I also don't know what's the right way to
implement this.  I need to understand the approach from sc.dyings diff
a bit better first.

> And I was thinking about this too.
>
> I think it should be more something like "this TRB was processed",
> since at the moment we don't receive an interrupt for the 1st TRB.
>
> Two questions related to that:
>
> 1. Setting XHCI_TRB_IOC together with XHCI_TRB_CHAIN is probably not
>    something which would work with our current code nor supported with
>    the specs?
>
> 2. Is there already something, somewhere which sets a status flag when
>    the 1. TRB was touched, or is it something we really need to write
>    an own handler for from scratch?
>
> > >
> > > Index: xhci.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > > retrieving revision 1.116
> > > diff -u -p -u -p -r1.116 xhci.c
> > > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> > > +++ xhci.c 10 Jul 2020 05:57:11 -0000
> > > @@ -932,6 +932,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> > >   struct usbd_xfer *skipxfer;
> > >   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> > >   int trb0_idx, frame_idx = 0;
> > > + uint32_t len;
> > >  
> > >   KASSERT(xx->index >= 0);
> > >   trb0_idx =
> > > @@ -945,6 +946,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> > >   if (trb0_idx++ == (xp->ring.ntrb - 1))
> > >   trb0_idx = 0;
> > >   }
> > > + len =
> > > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> > > remain; /*
> > >   * If we queued two TRBs for a frame and this is the
> > > second TRB, @@ -954,18 +956,17 @@ xhci_event_xfer_isoc(struct
> > > usbd_xfer *x if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) &
> > > XHCI_TRB_TYPE_MASK) == XHCI_TRB_TYPE_NORMAL) {
> > >   frame_idx--;
> > > - if (trb_idx == 0)
> > > - trb0_idx = xp->ring.ntrb - 2;
> > > - else
> > > - trb0_idx = trb_idx - 1;
> > > - if (xfer->frlengths[frame_idx] == 0) {
> > > + if (len > 0 && xfer->frlengths[frame_idx] == 0) {
> > > + if (trb_idx == 0)
> > > + trb0_idx = xp->ring.ntrb - 2;
> > > + else
> > > + trb0_idx = trb_idx - 1;
> > >   xfer->frlengths[frame_idx] =
> > > XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> > >   }
> > >   }
> > >  
> > > - xfer->frlengths[frame_idx] +=
> > > -
> > > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> > > + xfer->frlengths[frame_idx] += len;
> > >   xfer->actlen += xfer->frlengths[frame_idx];
> > >  
> > >   if (xx->index != trb_idx)
> > >    
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sat, 11 Jul 2020 11:56:38 +0200
Marcus Glocker <[hidden email]> wrote:

> On Fri, 10 Jul 2020 14:32:47 +0200
> Marcus Glocker <[hidden email]> wrote:
>
> > On Fri, 10 Jul 2020 14:19:00 +0200
> > Patrick Wildt <[hidden email]> wrote:
> >  
> > > On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote:
> > >  
> > > > Hi All,
> > > >
> > > > Laurence Tratt was reporting about corrupted video images when
> > > > using uvideo(4) on xhci(4) with MJPEG and higher resolutions,
> > > > e.g. when using ffplay:
> > > >
> > > > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> > > > /dev/video1
> > > >
> > > > When trying to re-produce the issue on my side, the video images
> > > > were still OK, but I also could see a lot of errors returned by
> > > > ffplay related to corrupted MJPEG data.
> > > >
> > > > When debugging the error I noticed that the corruption only
> > > > happens when TRBs are get chained due to hitting a 64k boundary,
> > > > and therefore require to queue up two TRBs.  So, what happened?
> > > >
> > > > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> > > > processed, chained TD:
> > > >
> > > > /*
> > > >  * If we queued two TRBs for a frame and this is the second TRB,
> > > >  * check if the first TRB needs accounting since it might not
> > > > have
> > > >  * raised an interrupt in case of full data received.
> > > >  */
> > > >
> > > > If there was a zero length transfer with such a TD, and the 2.
> > > > TRB receives the interrupt with len=0, it assumes that the 1.
> > > > TRB was processed (but it wasn't), and adds the requested 1.
> > > > TRB length to actlen, passing back that data has been received,
> > > > which is obviously wrong, resulting in the seen data
> > > > corruptions.
> > > >
> > > > Instead, when the 2. TRB receives the IOC interrupt with len=0,
> > > > we need to ignore the 1. TRB, considering this was a zero length
> > > > transfer.
> > > >
> > > > With the below diff Laurent is getting a clear video image now
> > > > with high resolutions, and I could remove all the output errors
> > > > from ffplay in my case.  I also tested with an uaudio(4) device.
> > > >
> > > > Feedback, more testers, OKs?
> > > >
> > > > Thanks,
> > > > Marcus
> > > >      
> > >
> > > What if the first TRB was filled completely, but the second TRB
> > > transferred 0?  Wouldn't we then need to add 1. TRB?  I think your
> > > diff wouldn't do that.    
> >
> > Right - That was the only case I was thinking about as well which
> > wouldn't get handled at the moment, but I wasn't sure if it can be a
> > case at all (according to your comment it can :-)
> >  
> > > I think what we actually need is some "processed" flag.  If you
> > > look on bugs@ or so, there's a diff that tries to fix the same
> > > issue by adding a "CHAINED" or so flag.  I'm not sure if that's
> > > the right way.
> > >
> > > But, anyway, I think we need a way to say "this TRB got an
> > > interrupt" or "this TRB was processed".    
>
> Sorry, I just did see the "xhci: zero length multi-TRB inbound xfer
> does not work" mail thread now.
>
> Honestly, at the moment I also don't know what's the right way to
> implement this.  I need to understand the approach from sc.dyings diff
> a bit better first.

Ok, I think sc.dyings diff is going in to the right direction but I
agree with Patrick that we should track the TRB processed status per
TRB.

How's that as a start (for isoc)?


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 11 Jul 2020 18:02:15 -0000
@@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  if (trb0_idx++ == (xp->ring.ntrb - 1))
  trb0_idx = 0;
  }
+ xp->ring.trb_processed[trb_idx] = true;
 
  /*
  * If we queued two TRBs for a frame and this is the second TRB,
@@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (xp->ring.trb_processed[trb0_idx] == false) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
  }
@@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
+    M_DEVBUF, M_NOWAIT);
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1815,6 +1823,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
 {
  struct xhci_pipe *xp = (struct xhci_pipe *)xfer->pipe;
  struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
+ struct xhci_trb *trb;
 
  KASSERT(xp->free_trbs >= 1);
  xp->free_trbs--;
@@ -1836,7 +1845,10 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
  break;
  }
 
- return (xhci_ring_produce(sc, &xp->ring));
+ trb = xhci_ring_produce(sc, &xp->ring);
+ xp->ring.trb_processed[(trb - &xp->ring.trbs[0])] = false;
+
+ return (trb);
 }
 
 int
Index: xhcivar.h
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
retrieving revision 1.11
diff -u -p -u -p -r1.11 xhcivar.h
--- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
+++ xhcivar.h 11 Jul 2020 18:02:15 -0000
@@ -44,6 +44,7 @@ struct xhci_xfer {
 
 struct xhci_ring {
  struct xhci_trb *trbs;
+ bool *trb_processed;
  size_t ntrb;
  struct usbd_dma_info dma;
 

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sat, 11 Jul 2020 20:14:03 +0200
Marcus Glocker <[hidden email]> wrote:

> On Sat, 11 Jul 2020 11:56:38 +0200
> Marcus Glocker <[hidden email]> wrote:
>
> > On Fri, 10 Jul 2020 14:32:47 +0200
> > Marcus Glocker <[hidden email]> wrote:
> >  
> > > On Fri, 10 Jul 2020 14:19:00 +0200
> > > Patrick Wildt <[hidden email]> wrote:
> > >    
> > > > On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote:
> > > >    
> > > > > Hi All,
> > > > >
> > > > > Laurence Tratt was reporting about corrupted video images when
> > > > > using uvideo(4) on xhci(4) with MJPEG and higher resolutions,
> > > > > e.g. when using ffplay:
> > > > >
> > > > > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> > > > > /dev/video1
> > > > >
> > > > > When trying to re-produce the issue on my side, the video
> > > > > images were still OK, but I also could see a lot of errors
> > > > > returned by ffplay related to corrupted MJPEG data.
> > > > >
> > > > > When debugging the error I noticed that the corruption only
> > > > > happens when TRBs are get chained due to hitting a 64k
> > > > > boundary, and therefore require to queue up two TRBs.  So,
> > > > > what happened?
> > > > >
> > > > > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> > > > > processed, chained TD:
> > > > >
> > > > > /*
> > > > >  * If we queued two TRBs for a frame and this is the second
> > > > > TRB,
> > > > >  * check if the first TRB needs accounting since it might not
> > > > > have
> > > > >  * raised an interrupt in case of full data received.
> > > > >  */
> > > > >
> > > > > If there was a zero length transfer with such a TD, and the 2.
> > > > > TRB receives the interrupt with len=0, it assumes that the 1.
> > > > > TRB was processed (but it wasn't), and adds the requested 1.
> > > > > TRB length to actlen, passing back that data has been
> > > > > received, which is obviously wrong, resulting in the seen data
> > > > > corruptions.
> > > > >
> > > > > Instead, when the 2. TRB receives the IOC interrupt with
> > > > > len=0, we need to ignore the 1. TRB, considering this was a
> > > > > zero length transfer.
> > > > >
> > > > > With the below diff Laurent is getting a clear video image now
> > > > > with high resolutions, and I could remove all the output
> > > > > errors from ffplay in my case.  I also tested with an
> > > > > uaudio(4) device.
> > > > >
> > > > > Feedback, more testers, OKs?
> > > > >
> > > > > Thanks,
> > > > > Marcus
> > > > >        
> > > >
> > > > What if the first TRB was filled completely, but the second TRB
> > > > transferred 0?  Wouldn't we then need to add 1. TRB?  I think
> > > > your diff wouldn't do that.      
> > >
> > > Right - That was the only case I was thinking about as well which
> > > wouldn't get handled at the moment, but I wasn't sure if it can
> > > be a case at all (according to your comment it can :-)
> > >    
> > > > I think what we actually need is some "processed" flag.  If you
> > > > look on bugs@ or so, there's a diff that tries to fix the same
> > > > issue by adding a "CHAINED" or so flag.  I'm not sure if that's
> > > > the right way.
> > > >
> > > > But, anyway, I think we need a way to say "this TRB got an
> > > > interrupt" or "this TRB was processed".      
> >
> > Sorry, I just did see the "xhci: zero length multi-TRB inbound xfer
> > does not work" mail thread now.
> >
> > Honestly, at the moment I also don't know what's the right way to
> > implement this.  I need to understand the approach from sc.dyings
> > diff a bit better first.  
>
> Ok, I think sc.dyings diff is going in to the right direction but I
> agree with Patrick that we should track the TRB processed status per
> TRB.
>
> How's that as a start (for isoc)?

A slightly re-worked diff which moves the trb_processed=false
initialization to xhci_ring_produce().
I couldn't manage to move the setting of trb_processed=true to
xhci_ring_consume().  Maybe at one point we need to introduce a new
xhci_ring_* function/macro to do this which is getting called by the
according TRB processing functions?


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 11 Jul 2020 21:20:47 -0000
@@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  if (trb0_idx++ == (xp->ring.ntrb - 1))
  trb0_idx = 0;
  }
+ xp->ring.trb_processed[trb_idx] = true;
 
  /*
  * If we queued two TRBs for a frame and this is the second TRB,
@@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (xp->ring.trb_processed[trb0_idx] == false) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
  }
@@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
+    M_DEVBUF, M_NOWAIT);
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc,
  ring->index = 0;
  ring->toggle ^= 1;
  }
+
+ ring->trb_processed[(trb - &ring->trbs[0])] = false;
 
  return (trb);
 }
Index: xhcivar.h
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
retrieving revision 1.11
diff -u -p -u -p -r1.11 xhcivar.h
--- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
+++ xhcivar.h 11 Jul 2020 21:20:47 -0000
@@ -44,6 +44,7 @@ struct xhci_xfer {
 
 struct xhci_ring {
  struct xhci_trb *trbs;
+ bool *trb_processed;
  size_t ntrb;
  struct usbd_dma_info dma;
 

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sat, 11 Jul 2020 23:43:43 +0200
Marcus Glocker <[hidden email]> wrote:

> On Sat, 11 Jul 2020 20:14:03 +0200
> Marcus Glocker <[hidden email]> wrote:
>
> > On Sat, 11 Jul 2020 11:56:38 +0200
> > Marcus Glocker <[hidden email]> wrote:
> >  
> > > On Fri, 10 Jul 2020 14:32:47 +0200
> > > Marcus Glocker <[hidden email]> wrote:
> > >    
> > > > On Fri, 10 Jul 2020 14:19:00 +0200
> > > > Patrick Wildt <[hidden email]> wrote:
> > > >      
> > > > > On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker
> > > > > wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > Laurence Tratt was reporting about corrupted video images
> > > > > > when using uvideo(4) on xhci(4) with MJPEG and higher
> > > > > > resolutions, e.g. when using ffplay:
> > > > > >
> > > > > > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> > > > > > /dev/video1
> > > > > >
> > > > > > When trying to re-produce the issue on my side, the video
> > > > > > images were still OK, but I also could see a lot of errors
> > > > > > returned by ffplay related to corrupted MJPEG data.
> > > > > >
> > > > > > When debugging the error I noticed that the corruption only
> > > > > > happens when TRBs are get chained due to hitting a 64k
> > > > > > boundary, and therefore require to queue up two TRBs.  So,
> > > > > > what happened?
> > > > > >
> > > > > > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> > > > > > processed, chained TD:
> > > > > >
> > > > > > /*
> > > > > >  * If we queued two TRBs for a frame and this is the second
> > > > > > TRB,
> > > > > >  * check if the first TRB needs accounting since it might
> > > > > > not have
> > > > > >  * raised an interrupt in case of full data received.
> > > > > >  */
> > > > > >
> > > > > > If there was a zero length transfer with such a TD, and the
> > > > > > 2. TRB receives the interrupt with len=0, it assumes that
> > > > > > the 1. TRB was processed (but it wasn't), and adds the
> > > > > > requested 1. TRB length to actlen, passing back that data
> > > > > > has been received, which is obviously wrong, resulting in
> > > > > > the seen data corruptions.
> > > > > >
> > > > > > Instead, when the 2. TRB receives the IOC interrupt with
> > > > > > len=0, we need to ignore the 1. TRB, considering this was a
> > > > > > zero length transfer.
> > > > > >
> > > > > > With the below diff Laurent is getting a clear video image
> > > > > > now with high resolutions, and I could remove all the output
> > > > > > errors from ffplay in my case.  I also tested with an
> > > > > > uaudio(4) device.
> > > > > >
> > > > > > Feedback, more testers, OKs?
> > > > > >
> > > > > > Thanks,
> > > > > > Marcus
> > > > > >          
> > > > >
> > > > > What if the first TRB was filled completely, but the second
> > > > > TRB transferred 0?  Wouldn't we then need to add 1. TRB?  I
> > > > > think your diff wouldn't do that.        
> > > >
> > > > Right - That was the only case I was thinking about as well
> > > > which wouldn't get handled at the moment, but I wasn't sure if
> > > > it can be a case at all (according to your comment it can :-)
> > > >      
> > > > > I think what we actually need is some "processed" flag.  If
> > > > > you look on bugs@ or so, there's a diff that tries to fix the
> > > > > same issue by adding a "CHAINED" or so flag.  I'm not sure if
> > > > > that's the right way.
> > > > >
> > > > > But, anyway, I think we need a way to say "this TRB got an
> > > > > interrupt" or "this TRB was processed".        
> > >
> > > Sorry, I just did see the "xhci: zero length multi-TRB inbound
> > > xfer does not work" mail thread now.
> > >
> > > Honestly, at the moment I also don't know what's the right way to
> > > implement this.  I need to understand the approach from sc.dyings
> > > diff a bit better first.    
> >
> > Ok, I think sc.dyings diff is going in to the right direction but I
> > agree with Patrick that we should track the TRB processed status per
> > TRB.
> >
> > How's that as a start (for isoc)?  
>
> A slightly re-worked diff which moves the trb_processed=false
> initialization to xhci_ring_produce().
> I couldn't manage to move the setting of trb_processed=true to
> xhci_ring_consume().  Maybe at one point we need to introduce a new
> xhci_ring_* function/macro to do this which is getting called by the
> according TRB processing functions?

Updated diff including feedback from mpi@ that we don't want to
introduce 'bool' in the kernel.


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 13 Jul 2020 09:49:23 -0000
@@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  if (trb0_idx++ == (xp->ring.ntrb - 1))
  trb0_idx = 0;
  }
+ xp->ring.trb_processed[trb_idx] = 1;
 
  /*
  * If we queued two TRBs for a frame and this is the second TRB,
@@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (xp->ring.trb_processed[trb0_idx] == 0) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
  }
@@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
+    M_DEVBUF, M_NOWAIT);
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc,
  ring->index = 0;
  ring->toggle ^= 1;
  }
+
+ ring->trb_processed[(trb - &ring->trbs[0])] = 0;
 
  return (trb);
 }
Index: xhcivar.h
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
retrieving revision 1.11
diff -u -p -r1.11 xhcivar.h
--- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
+++ xhcivar.h 13 Jul 2020 09:49:23 -0000
@@ -44,6 +44,7 @@ struct xhci_xfer {
 
 struct xhci_ring {
  struct xhci_trb *trbs;
+ uint8_t *trb_processed;
  size_t ntrb;
  struct usbd_dma_info dma;
 
 

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
hi,

The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
I'll investigate this problem.

BTW please check if malloc returns NULL.


On 2020/07/13 09:59, Marcus Glocker wrote:

> On Sat, 11 Jul 2020 23:43:43 +0200
> Marcus Glocker <[hidden email]> wrote:
>
>> On Sat, 11 Jul 2020 20:14:03 +0200
>> Marcus Glocker <[hidden email]> wrote:
>>
>>> On Sat, 11 Jul 2020 11:56:38 +0200
>>> Marcus Glocker <[hidden email]> wrote:
>>>  
>>>> On Fri, 10 Jul 2020 14:32:47 +0200
>>>> Marcus Glocker <[hidden email]> wrote:
>>>>    
>>>>> On Fri, 10 Jul 2020 14:19:00 +0200
>>>>> Patrick Wildt <[hidden email]> wrote:
>>>>>      
>>>>>> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker
>>>>>> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Laurence Tratt was reporting about corrupted video images
>>>>>>> when using uvideo(4) on xhci(4) with MJPEG and higher
>>>>>>> resolutions, e.g. when using ffplay:
>>>>>>>
>>>>>>> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
>>>>>>> /dev/video1
>>>>>>>
>>>>>>> When trying to re-produce the issue on my side, the video
>>>>>>> images were still OK, but I also could see a lot of errors
>>>>>>> returned by ffplay related to corrupted MJPEG data.
>>>>>>>
>>>>>>> When debugging the error I noticed that the corruption only
>>>>>>> happens when TRBs are get chained due to hitting a 64k
>>>>>>> boundary, and therefore require to queue up two TRBs.  So,
>>>>>>> what happened?
>>>>>>>
>>>>>>> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
>>>>>>> processed, chained TD:
>>>>>>>
>>>>>>> /*
>>>>>>>  * If we queued two TRBs for a frame and this is the second
>>>>>>> TRB,
>>>>>>>  * check if the first TRB needs accounting since it might
>>>>>>> not have
>>>>>>>  * raised an interrupt in case of full data received.
>>>>>>>  */
>>>>>>>
>>>>>>> If there was a zero length transfer with such a TD, and the
>>>>>>> 2. TRB receives the interrupt with len=0, it assumes that
>>>>>>> the 1. TRB was processed (but it wasn't), and adds the
>>>>>>> requested 1. TRB length to actlen, passing back that data
>>>>>>> has been received, which is obviously wrong, resulting in
>>>>>>> the seen data corruptions.
>>>>>>>
>>>>>>> Instead, when the 2. TRB receives the IOC interrupt with
>>>>>>> len=0, we need to ignore the 1. TRB, considering this was a
>>>>>>> zero length transfer.
>>>>>>>
>>>>>>> With the below diff Laurent is getting a clear video image
>>>>>>> now with high resolutions, and I could remove all the output
>>>>>>> errors from ffplay in my case.  I also tested with an
>>>>>>> uaudio(4) device.
>>>>>>>
>>>>>>> Feedback, more testers, OKs?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marcus
>>>>>>>          
>>>>>>
>>>>>> What if the first TRB was filled completely, but the second
>>>>>> TRB transferred 0?  Wouldn't we then need to add 1. TRB?  I
>>>>>> think your diff wouldn't do that.        
>>>>>
>>>>> Right - That was the only case I was thinking about as well
>>>>> which wouldn't get handled at the moment, but I wasn't sure if
>>>>> it can be a case at all (according to your comment it can :-)
>>>>>      
>>>>>> I think what we actually need is some "processed" flag.  If
>>>>>> you look on bugs@ or so, there's a diff that tries to fix the
>>>>>> same issue by adding a "CHAINED" or so flag.  I'm not sure if
>>>>>> that's the right way.
>>>>>>
>>>>>> But, anyway, I think we need a way to say "this TRB got an
>>>>>> interrupt" or "this TRB was processed".        
>>>>
>>>> Sorry, I just did see the "xhci: zero length multi-TRB inbound
>>>> xfer does not work" mail thread now.
>>>>
>>>> Honestly, at the moment I also don't know what's the right way to
>>>> implement this.  I need to understand the approach from sc.dyings
>>>> diff a bit better first.    
>>>
>>> Ok, I think sc.dyings diff is going in to the right direction but I
>>> agree with Patrick that we should track the TRB processed status per
>>> TRB.
>>>
>>> How's that as a start (for isoc)?  
>>
>> A slightly re-worked diff which moves the trb_processed=false
>> initialization to xhci_ring_produce().
>> I couldn't manage to move the setting of trb_processed=true to
>> xhci_ring_consume().  Maybe at one point we need to introduce a new
>> xhci_ring_* function/macro to do this which is getting called by the
>> according TRB processing functions?
>
> Updated diff including feedback from mpi@ that we don't want to
> introduce 'bool' in the kernel.
>
>
> Index: xhci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -r1.116 xhci.c
> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> +++ xhci.c 13 Jul 2020 09:49:23 -0000
> @@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   if (trb0_idx++ == (xp->ring.ntrb - 1))
>   trb0_idx = 0;
>   }
> + xp->ring.trb_processed[trb_idx] = 1;
>  
>   /*
>   * If we queued two TRBs for a frame and this is the second TRB,
> @@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   trb0_idx = xp->ring.ntrb - 2;
>   else
>   trb0_idx = trb_idx - 1;
> - if (xfer->frlengths[frame_idx] == 0) {
> + if (xp->ring.trb_processed[trb0_idx] == 0) {
>   xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>      xp->ring.trbs[trb0_idx].trb_status));
>   }
> @@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  
>   ring->ntrb = ntrb;
>  
> + ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
> +    M_DEVBUF, M_NOWAIT);
> +
>   xhci_ring_reset(sc, ring);
>  
>   return (0);
> @@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  void
>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
>  {
> + free(ring->trb_processed, M_DEVBUF,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
>  }
>  
> @@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
>   size = ring->ntrb * sizeof(struct xhci_trb);
>  
>   memset(ring->trbs, 0, size);
> + memset(ring->trb_processed, 0,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>  
>   ring->index = 0;
>   ring->toggle = XHCI_TRB_CYCLE;
> @@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc,
>   ring->index = 0;
>   ring->toggle ^= 1;
>   }
> +
> + ring->trb_processed[(trb - &ring->trbs[0])] = 0;
>  
>   return (trb);
>  }
> Index: xhcivar.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> retrieving revision 1.11
> diff -u -p -r1.11 xhcivar.h
> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> +++ xhcivar.h 13 Jul 2020 09:49:23 -0000
> @@ -44,6 +44,7 @@ struct xhci_xfer {
>  
>  struct xhci_ring {
>   struct xhci_trb *trbs;
> + uint8_t *trb_processed;
>   size_t ntrb;
>   struct usbd_dma_info dma;
>  
>  
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Wed, 15 Jul 2020 21:28:29 +0000
[hidden email] wrote:

> hi,
>
> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
> I'll investigate this problem.

Ok, thanks for checking this - I unfortunately only can test this on a
Intel xhci currently.

> BTW please check if malloc returns NULL.

Sure thing ...

>
>
> On 2020/07/13 09:59, Marcus Glocker wrote:
> > On Sat, 11 Jul 2020 23:43:43 +0200
> > Marcus Glocker <[hidden email]> wrote:
> >  
> >> On Sat, 11 Jul 2020 20:14:03 +0200
> >> Marcus Glocker <[hidden email]> wrote:
> >>  
> >>> On Sat, 11 Jul 2020 11:56:38 +0200
> >>> Marcus Glocker <[hidden email]> wrote:
> >>>    
> >>>> On Fri, 10 Jul 2020 14:32:47 +0200
> >>>> Marcus Glocker <[hidden email]> wrote:
> >>>>      
> >>>>> On Fri, 10 Jul 2020 14:19:00 +0200
> >>>>> Patrick Wildt <[hidden email]> wrote:
> >>>>>        
> >>>>>> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker
> >>>>>> wrote:  
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> Laurence Tratt was reporting about corrupted video images
> >>>>>>> when using uvideo(4) on xhci(4) with MJPEG and higher
> >>>>>>> resolutions, e.g. when using ffplay:
> >>>>>>>
> >>>>>>> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> >>>>>>> /dev/video1
> >>>>>>>
> >>>>>>> When trying to re-produce the issue on my side, the video
> >>>>>>> images were still OK, but I also could see a lot of errors
> >>>>>>> returned by ffplay related to corrupted MJPEG data.
> >>>>>>>
> >>>>>>> When debugging the error I noticed that the corruption only
> >>>>>>> happens when TRBs are get chained due to hitting a 64k
> >>>>>>> boundary, and therefore require to queue up two TRBs.  So,
> >>>>>>> what happened?
> >>>>>>>
> >>>>>>> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> >>>>>>> processed, chained TD:
> >>>>>>>
> >>>>>>> /*
> >>>>>>>  * If we queued two TRBs for a frame and this is the second
> >>>>>>> TRB,
> >>>>>>>  * check if the first TRB needs accounting since it might
> >>>>>>> not have
> >>>>>>>  * raised an interrupt in case of full data received.
> >>>>>>>  */
> >>>>>>>
> >>>>>>> If there was a zero length transfer with such a TD, and the
> >>>>>>> 2. TRB receives the interrupt with len=0, it assumes that
> >>>>>>> the 1. TRB was processed (but it wasn't), and adds the
> >>>>>>> requested 1. TRB length to actlen, passing back that data
> >>>>>>> has been received, which is obviously wrong, resulting in
> >>>>>>> the seen data corruptions.
> >>>>>>>
> >>>>>>> Instead, when the 2. TRB receives the IOC interrupt with
> >>>>>>> len=0, we need to ignore the 1. TRB, considering this was a
> >>>>>>> zero length transfer.
> >>>>>>>
> >>>>>>> With the below diff Laurent is getting a clear video image
> >>>>>>> now with high resolutions, and I could remove all the output
> >>>>>>> errors from ffplay in my case.  I also tested with an
> >>>>>>> uaudio(4) device.
> >>>>>>>
> >>>>>>> Feedback, more testers, OKs?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Marcus
> >>>>>>>            
> >>>>>>
> >>>>>> What if the first TRB was filled completely, but the second
> >>>>>> TRB transferred 0?  Wouldn't we then need to add 1. TRB?  I
> >>>>>> think your diff wouldn't do that.          
> >>>>>
> >>>>> Right - That was the only case I was thinking about as well
> >>>>> which wouldn't get handled at the moment, but I wasn't sure if
> >>>>> it can be a case at all (according to your comment it can :-)
> >>>>>        
> >>>>>> I think what we actually need is some "processed" flag.  If
> >>>>>> you look on bugs@ or so, there's a diff that tries to fix the
> >>>>>> same issue by adding a "CHAINED" or so flag.  I'm not sure if
> >>>>>> that's the right way.
> >>>>>>
> >>>>>> But, anyway, I think we need a way to say "this TRB got an
> >>>>>> interrupt" or "this TRB was processed".          
> >>>>
> >>>> Sorry, I just did see the "xhci: zero length multi-TRB inbound
> >>>> xfer does not work" mail thread now.
> >>>>
> >>>> Honestly, at the moment I also don't know what's the right way to
> >>>> implement this.  I need to understand the approach from sc.dyings
> >>>> diff a bit better first.      
> >>>
> >>> Ok, I think sc.dyings diff is going in to the right direction but
> >>> I agree with Patrick that we should track the TRB processed
> >>> status per TRB.
> >>>
> >>> How's that as a start (for isoc)?    
> >>
> >> A slightly re-worked diff which moves the trb_processed=false
> >> initialization to xhci_ring_produce().
> >> I couldn't manage to move the setting of trb_processed=true to
> >> xhci_ring_consume().  Maybe at one point we need to introduce a new
> >> xhci_ring_* function/macro to do this which is getting called by
> >> the according TRB processing functions?  
> >
> > Updated diff including feedback from mpi@ that we don't want to
> > introduce 'bool' in the kernel.
> >
> >
> > Index: xhci.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > retrieving revision 1.116
> > diff -u -p -r1.116 xhci.c
> > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> > +++ xhci.c 13 Jul 2020 09:49:23 -0000
> > @@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >   if (trb0_idx++ == (xp->ring.ntrb - 1))
> >   trb0_idx = 0;
> >   }
> > + xp->ring.trb_processed[trb_idx] = 1;
> >  
> >   /*
> >   * If we queued two TRBs for a frame and this is the
> > second TRB, @@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct
> > usbd_xfer *x trb0_idx = xp->ring.ntrb - 2;
> >   else
> >   trb0_idx = trb_idx - 1;
> > - if (xfer->frlengths[frame_idx] == 0) {
> > + if (xp->ring.trb_processed[trb0_idx] == 0) {
> >   xfer->frlengths[frame_idx] =
> > XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> >   }
> > @@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  
> >   ring->ntrb = ntrb;
> >  
> > + ring->trb_processed = malloc(ring->ntrb *
> > sizeof(*ring->trb_processed),
> > +    M_DEVBUF, M_NOWAIT);
> > +
> >   xhci_ring_reset(sc, ring);
> >  
> >   return (0);
> > @@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  void
> >  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
> >  {
> > + free(ring->trb_processed, M_DEVBUF,
> > +    ring->ntrb * sizeof(*ring->trb_processed));
> >   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
> >  }
> >  
> > @@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
> >   size = ring->ntrb * sizeof(struct xhci_trb);
> >  
> >   memset(ring->trbs, 0, size);
> > + memset(ring->trb_processed, 0,
> > +    ring->ntrb * sizeof(*ring->trb_processed));
> >  
> >   ring->index = 0;
> >   ring->toggle = XHCI_TRB_CYCLE;
> > @@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc,
> >   ring->index = 0;
> >   ring->toggle ^= 1;
> >   }
> > +
> > + ring->trb_processed[(trb - &ring->trbs[0])] = 0;
> >  
> >   return (trb);
> >  }
> > Index: xhcivar.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> > retrieving revision 1.11
> > diff -u -p -r1.11 xhcivar.h
> > --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> > +++ xhcivar.h 13 Jul 2020 09:49:23 -0000
> > @@ -44,6 +44,7 @@ struct xhci_xfer {
> >  
> >  struct xhci_ring {
> >   struct xhci_trb *trbs;
> > + uint8_t *trb_processed;
> >   size_t ntrb;
> >   struct usbd_dma_info dma;
> >  
> >  
> >  
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
In reply to this post by sc.dying
hi,

On 2020/07/15 21:28, [hidden email] wrote:
> hi,
>
> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
> I'll investigate this problem.

Bolton xHCI sometimes raises following events.
(I added printf Completion Code.)

#246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
#247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072

It seems this behavior violates the spec. 4.10.1.1.
When the remain of 1. TRB is 0, CC should be SUCCESS, not SHORT_PKT,
and HC should not raise interrupt.
The problem is how many length 2. TRB transferred.
If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to frlengths.
If SHORT_PKT, we should keep frlengths as is.
Actually with the latter assumption I can see video smoothly w/o errors.

I changed your patch to skip adding to frlengths and actlen
if previous TRB is processed AND remain == 0.


--- sys/dev/usb/xhci.c.orig Sun Apr  5 10:12:37 2020
+++ sys/dev/usb/xhci.c Fri Jul 17 23:30:30 2020
@@ -931,7 +931,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
 {
  struct usbd_xfer *skipxfer;
  struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
- int trb0_idx, frame_idx = 0;
+ int trb0_idx, frame_idx = 0, prev_trb_processed = 0;
 
  KASSERT(xx->index >= 0);
  trb0_idx =
@@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
  if (trb0_idx++ == (xp->ring.ntrb - 1))
  trb0_idx = 0;
  }
+ xp->ring.trb_processed[trb_idx] = 1;
 
  /*
  * If we queued two TRBs for a frame and this is the second TRB,
@@ -958,15 +959,19 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ prev_trb_processed = xp->ring.trb_processed[trb0_idx];
+ if (prev_trb_processed == 0) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
  }
  }
 
- xfer->frlengths[frame_idx] +=
-    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
- xfer->actlen += xfer->frlengths[frame_idx];
+ if (!(prev_trb_processed != 0 && remain == 0)) {
+ xfer->frlengths[frame_idx] +=
+    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
+    remain;
+ xfer->actlen += xfer->frlengths[frame_idx];
+ }
 
  if (xx->index != trb_idx)
  return (1);
@@ -1696,6 +1701,9 @@ xhci_ring_alloc(struct xhci_softc *sc, struct xhci_rin
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
+    M_DEVBUF, M_NOWAIT);
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1704,6 +1712,8 @@ xhci_ring_alloc(struct xhci_softc *sc, struct xhci_rin
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1715,6 +1725,8 @@ xhci_ring_reset(struct xhci_softc *sc, struct xhci_rin
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1799,6 +1811,8 @@ xhci_ring_produce(struct xhci_softc *sc, struct xhci_r
  ring->index = 0;
  ring->toggle ^= 1;
  }
+
+ ring->trb_processed[(trb - &ring->trbs[0])] = 0;
 
  return (trb);
 }

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
Hey,

On Sat, 18 Jul 2020 00:14:32 +0000
[hidden email] wrote:

> hi,
>
> On 2020/07/15 21:28, [hidden email] wrote:
> > hi,
> >
> > The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
> > I'll investigate this problem.  
>
> Bolton xHCI sometimes raises following events.
> (I added printf Completion Code.)
>
> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>
> It seems this behavior violates the spec. 4.10.1.1.
> When the remain of 1. TRB is 0, CC should be SUCCESS, not SHORT_PKT,
> and HC should not raise interrupt.
> The problem is how many length 2. TRB transferred.
> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
> frlengths. If SHORT_PKT, we should keep frlengths as is.
> Actually with the latter assumption I can see video smoothly w/o
> errors.
>
> I changed your patch to skip adding to frlengths and actlen
> if previous TRB is processed AND remain == 0.

Interesting!  Thanks for tracing this down.
So in general we can say that for a chained TD, if the first TRB is
SHORT, we can simply skip the second TRB.

To make the code a bit more understandable, and use the new
trb_processed flag to control this, I made another diff which keeps
track whether the TRB was not processed, processed, or processed short.

It also includes the malloc NULL check and replaces malloc by
mallocarray as suggested by tb@.

Does this diff also work for you on your AMD xhci?


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 18 Jul 2020 14:20:28 -0000
@@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
 int xhci_event_xfer_generic(struct xhci_softc *, struct usbd_xfer *,
     struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t, uint8_t);
 int xhci_event_xfer_isoc(struct usbd_xfer *, struct xhci_pipe *,
-    uint32_t, int);
+    uint32_t, int, uint8_t);
 void xhci_event_command(struct xhci_softc *, uint64_t);
 void xhci_event_port_change(struct xhci_softc *, uint64_t, uint32_t);
 int xhci_pipe_init(struct xhci_softc *, struct usbd_pipe *);
@@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc *sc, u
  return;
  break;
  case UE_ISOCHRONOUS:
- if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx))
+ if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx, code))
  return;
  break;
  default:
@@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
 
 int
 xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
-    uint32_t remain, int trb_idx)
+    uint32_t remain, int trb_idx, uint8_t code)
 {
  struct usbd_xfer *skipxfer;
  struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
- int trb0_idx, frame_idx = 0;
+ int trb0_idx, frame_idx = 0, skip_trb = 0;
 
  KASSERT(xx->index >= 0);
+
+ switch (code) {
+ case XHCI_CODE_SHORT_XFER:
+ xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_SHORT;
+ break;
+ default:
+ xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_YES;
+ break;
+ }
+
  trb0_idx =
     ((xx->index + xp->ring.ntrb) - xx->ntrb) % (xp->ring.ntrb - 1);
 
@@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (xp->ring.trb_processed[trb0_idx] == TRB_PROCESSED_NO) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
+ } else if (xp->ring.trb_processed[trb0_idx] ==
+    TRB_PROCESSED_SHORT) {
+ skip_trb = 1;
  }
  }
 
- xfer->frlengths[frame_idx] +=
-    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
- xfer->actlen += xfer->frlengths[frame_idx];
+ if (!skip_trb) {
+ xfer->frlengths[frame_idx] +=
+    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
+    remain;
+ xfer->actlen += xfer->frlengths[frame_idx];
+ }
 
  if (xx->index != trb_idx)
  return (1);
@@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed =
+    mallocarray(ring->ntrb, sizeof(*ring->trb_processed), M_DEVBUF,
+    M_NOWAIT);
+ if (ring->trb_processed == NULL) {
+ printf("%s: unable to allocate ring trb_processed array\n",
+    DEVNAME(sc));
+ return (ENOMEM);
+ }
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
  ring->index = 0;
  ring->toggle ^= 1;
  }
+
+ ring->trb_processed[(trb - &ring->trbs[0])] = TRB_PROCESSED_NO;
 
  return (trb);
 }
Index: xhcivar.h
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
retrieving revision 1.11
diff -u -p -u -p -r1.11 xhcivar.h
--- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
+++ xhcivar.h 18 Jul 2020 14:20:28 -0000
@@ -44,6 +44,12 @@ struct xhci_xfer {
 
 struct xhci_ring {
  struct xhci_trb *trbs;
+enum {
+ TRB_PROCESSED_NO,
+ TRB_PROCESSED_YES,
+ TRB_PROCESSED_SHORT
+};
+ uint8_t *trb_processed;
  size_t ntrb;
  struct usbd_dma_info dma;
 

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
hi,

It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
and ASM1042 (10421b21).
I simply play with ffplay -f v4l2 /dev/video0 to test.
At this moment it does not work on VL805, but I have no idea.
I'll investigate furthermore...

BTW I think xhci_ring_alloc should free ring dma buffer
if trb_processed[] allocation fails.


On 2020/07/18 14:47, Marcus Glocker wrote:

> Hey,
>
> On Sat, 18 Jul 2020 00:14:32 +0000
> [hidden email] wrote:
>
>> hi,
>>
>> On 2020/07/15 21:28, [hidden email] wrote:
>>> hi,
>>>
>>> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
>>> I'll investigate this problem.  
>>
>> Bolton xHCI sometimes raises following events.
>> (I added printf Completion Code.)
>>
>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>>
>> It seems this behavior violates the spec. 4.10.1.1.
>> When the remain of 1. TRB is 0, CC should be SUCCESS, not SHORT_PKT,
>> and HC should not raise interrupt.
>> The problem is how many length 2. TRB transferred.
>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
>> frlengths. If SHORT_PKT, we should keep frlengths as is.
>> Actually with the latter assumption I can see video smoothly w/o
>> errors.
>>
>> I changed your patch to skip adding to frlengths and actlen
>> if previous TRB is processed AND remain == 0.
>
> Interesting!  Thanks for tracing this down.
> So in general we can say that for a chained TD, if the first TRB is
> SHORT, we can simply skip the second TRB.
>
> To make the code a bit more understandable, and use the new
> trb_processed flag to control this, I made another diff which keeps
> track whether the TRB was not processed, processed, or processed short.
>
> It also includes the malloc NULL check and replaces malloc by
> mallocarray as suggested by tb@.
>
> Does this diff also work for you on your AMD xhci?
>
>
> Index: xhci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -u -p -r1.116 xhci.c
> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> +++ xhci.c 18 Jul 2020 14:20:28 -0000
> @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
>  int xhci_event_xfer_generic(struct xhci_softc *, struct usbd_xfer *,
>      struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t, uint8_t);
>  int xhci_event_xfer_isoc(struct usbd_xfer *, struct xhci_pipe *,
> -    uint32_t, int);
> +    uint32_t, int, uint8_t);
>  void xhci_event_command(struct xhci_softc *, uint64_t);
>  void xhci_event_port_change(struct xhci_softc *, uint64_t, uint32_t);
>  int xhci_pipe_init(struct xhci_softc *, struct usbd_pipe *);
> @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc *sc, u
>   return;
>   break;
>   case UE_ISOCHRONOUS:
> - if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx))
> + if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx, code))
>   return;
>   break;
>   default:
> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>  
>  int
>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
> -    uint32_t remain, int trb_idx)
> +    uint32_t remain, int trb_idx, uint8_t code)
>  {
>   struct usbd_xfer *skipxfer;
>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> - int trb0_idx, frame_idx = 0;
> + int trb0_idx, frame_idx = 0, skip_trb = 0;
>  
>   KASSERT(xx->index >= 0);
> +
> + switch (code) {
> + case XHCI_CODE_SHORT_XFER:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_SHORT;
> + break;
> + default:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_YES;
> + break;
> + }
> +
>   trb0_idx =
>      ((xx->index + xp->ring.ntrb) - xx->ntrb) % (xp->ring.ntrb - 1);
>  
> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   trb0_idx = xp->ring.ntrb - 2;
>   else
>   trb0_idx = trb_idx - 1;
> - if (xfer->frlengths[frame_idx] == 0) {
> + if (xp->ring.trb_processed[trb0_idx] == TRB_PROCESSED_NO) {
>   xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>      xp->ring.trbs[trb0_idx].trb_status));
> + } else if (xp->ring.trb_processed[trb0_idx] ==
> +    TRB_PROCESSED_SHORT) {
> + skip_trb = 1;
>   }
>   }
>  
> - xfer->frlengths[frame_idx] +=
> -    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> - xfer->actlen += xfer->frlengths[frame_idx];
> + if (!skip_trb) {
> + xfer->frlengths[frame_idx] +=
> +    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> +    remain;
> + xfer->actlen += xfer->frlengths[frame_idx];
> + }
>  
>   if (xx->index != trb_idx)
>   return (1);
> @@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  
>   ring->ntrb = ntrb;
>  
> + ring->trb_processed =
> +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed), M_DEVBUF,
> +    M_NOWAIT);
> + if (ring->trb_processed == NULL) {
> + printf("%s: unable to allocate ring trb_processed array\n",
> +    DEVNAME(sc));
> + return (ENOMEM);
> + }
> +
>   xhci_ring_reset(sc, ring);
>  
>   return (0);
> @@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  void
>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
>  {
> + free(ring->trb_processed, M_DEVBUF,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
>  }
>  
> @@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
>   size = ring->ntrb * sizeof(struct xhci_trb);
>  
>   memset(ring->trbs, 0, size);
> + memset(ring->trb_processed, 0,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>  
>   ring->index = 0;
>   ring->toggle = XHCI_TRB_CYCLE;
> @@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
>   ring->index = 0;
>   ring->toggle ^= 1;
>   }
> +
> + ring->trb_processed[(trb - &ring->trbs[0])] = TRB_PROCESSED_NO;
>  
>   return (trb);
>  }
> Index: xhcivar.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> retrieving revision 1.11
> diff -u -p -u -p -r1.11 xhcivar.h
> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> +++ xhcivar.h 18 Jul 2020 14:20:28 -0000
> @@ -44,6 +44,12 @@ struct xhci_xfer {
>  
>  struct xhci_ring {
>   struct xhci_trb *trbs;
> +enum {
> + TRB_PROCESSED_NO,
> + TRB_PROCESSED_YES,
> + TRB_PROCESSED_SHORT
> +};
> + uint8_t *trb_processed;
>   size_t ntrb;
>   struct usbd_dma_info dma;
>  
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sun, 19 Jul 2020 02:25:30 +0000
[hidden email] wrote:

> hi,
>
> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
> and ASM1042 (10421b21).
> I simply play with ffplay -f v4l2 /dev/video0 to test.

If your cam supports MJPEG it's good to add '-input_format mjpeg' with
higher resolutions like 1280x720, because that will generated varying
image sizes, which hit the 64k memory boundary more often, and thus
generate potentially more chained TDs.

> At this moment it does not work on VL805, but I have no idea.
> I'll investigate furthermore...

Oh dear :-) - Ok.
 
> BTW I think xhci_ring_alloc should free ring dma buffer
> if trb_processed[] allocation fails.

Ah right, I'll add the usbd_dma_contig_free() there - Thanks.

>
> On 2020/07/18 14:47, Marcus Glocker wrote:
> > Hey,
> >
> > On Sat, 18 Jul 2020 00:14:32 +0000
> > [hidden email] wrote:
> >  
> >> hi,
> >>
> >> On 2020/07/15 21:28, [hidden email] wrote:  
> >>> hi,
> >>>
> >>> The patch works well on Intel PCH xhci, but not on AMD Bolton
> >>> xHCI. I'll investigate this problem.    
> >>
> >> Bolton xHCI sometimes raises following events.
> >> (I added printf Completion Code.)
> >>
> >> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
> >> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
> >>
> >> It seems this behavior violates the spec. 4.10.1.1.
> >> When the remain of 1. TRB is 0, CC should be SUCCESS, not
> >> SHORT_PKT, and HC should not raise interrupt.
> >> The problem is how many length 2. TRB transferred.
> >> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
> >> frlengths. If SHORT_PKT, we should keep frlengths as is.
> >> Actually with the latter assumption I can see video smoothly w/o
> >> errors.
> >>
> >> I changed your patch to skip adding to frlengths and actlen
> >> if previous TRB is processed AND remain == 0.  
> >
> > Interesting!  Thanks for tracing this down.
> > So in general we can say that for a chained TD, if the first TRB is
> > SHORT, we can simply skip the second TRB.
> >
> > To make the code a bit more understandable, and use the new
> > trb_processed flag to control this, I made another diff which keeps
> > track whether the TRB was not processed, processed, or processed
> > short.
> >
> > It also includes the malloc NULL check and replaces malloc by
> > mallocarray as suggested by tb@.
> >
> > Does this diff also work for you on your AMD xhci?
> >
> >
> > Index: xhci.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > retrieving revision 1.116
> > diff -u -p -u -p -r1.116 xhci.c
> > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> > +++ xhci.c 18 Jul 2020 14:20:28 -0000
> > @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
> >  int xhci_event_xfer_generic(struct xhci_softc *, struct
> > usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
> > uint8_t); int xhci_event_xfer_isoc(struct usbd_xfer *,
> > struct xhci_pipe *,
> > -    uint32_t, int);
> > +    uint32_t, int, uint8_t);
> >  void xhci_event_command(struct xhci_softc *, uint64_t);
> >  void xhci_event_port_change(struct xhci_softc *, uint64_t,
> > uint32_t); int xhci_pipe_init(struct xhci_softc *, struct
> > usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
> > *sc, u return;
> >   break;
> >   case UE_ISOCHRONOUS:
> > - if (xhci_event_xfer_isoc(xfer, xp, remain,
> > trb_idx))
> > + if (xhci_event_xfer_isoc(xfer, xp, remain,
> > trb_idx, code)) return;
> >   break;
> >   default:
> > @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
> >  
> >  int
> >  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
> > -    uint32_t remain, int trb_idx)
> > +    uint32_t remain, int trb_idx, uint8_t code)
> >  {
> >   struct usbd_xfer *skipxfer;
> >   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> > - int trb0_idx, frame_idx = 0;
> > + int trb0_idx, frame_idx = 0, skip_trb = 0;
> >  
> >   KASSERT(xx->index >= 0);
> > +
> > + switch (code) {
> > + case XHCI_CODE_SHORT_XFER:
> > + xp->ring.trb_processed[trb_idx] =
> > TRB_PROCESSED_SHORT;
> > + break;
> > + default:
> > + xp->ring.trb_processed[trb_idx] =
> > TRB_PROCESSED_YES;
> > + break;
> > + }
> > +
> >   trb0_idx =
> >      ((xx->index + xp->ring.ntrb) - xx->ntrb) %
> > (xp->ring.ntrb - 1);
> > @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >   trb0_idx = xp->ring.ntrb - 2;
> >   else
> >   trb0_idx = trb_idx - 1;
> > - if (xfer->frlengths[frame_idx] == 0) {
> > + if (xp->ring.trb_processed[trb0_idx] ==
> > TRB_PROCESSED_NO) { xfer->frlengths[frame_idx] =
> > XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> > + } else if (xp->ring.trb_processed[trb0_idx] ==
> > +    TRB_PROCESSED_SHORT) {
> > + skip_trb = 1;
> >   }
> >   }
> >  
> > - xfer->frlengths[frame_idx] +=
> > -
> > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> > - xfer->actlen += xfer->frlengths[frame_idx];
> > + if (!skip_trb) {
> > + xfer->frlengths[frame_idx] +=
> > +
> > XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> > +    remain;
> > + xfer->actlen += xfer->frlengths[frame_idx];
> > + }
> >  
> >   if (xx->index != trb_idx)
> >   return (1);
> > @@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  
> >   ring->ntrb = ntrb;
> >  
> > + ring->trb_processed =
> > +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed),
> > M_DEVBUF,
> > +    M_NOWAIT);
> > + if (ring->trb_processed == NULL) {
> > + printf("%s: unable to allocate ring trb_processed
> > array\n",
> > +    DEVNAME(sc));
> > + return (ENOMEM);
> > + }
> > +
> >   xhci_ring_reset(sc, ring);
> >  
> >   return (0);
> > @@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  void
> >  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
> >  {
> > + free(ring->trb_processed, M_DEVBUF,
> > +    ring->ntrb * sizeof(*ring->trb_processed));
> >   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
> >  }
> >  
> > @@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
> >   size = ring->ntrb * sizeof(struct xhci_trb);
> >  
> >   memset(ring->trbs, 0, size);
> > + memset(ring->trb_processed, 0,
> > +    ring->ntrb * sizeof(*ring->trb_processed));
> >  
> >   ring->index = 0;
> >   ring->toggle = XHCI_TRB_CYCLE;
> > @@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
> >   ring->index = 0;
> >   ring->toggle ^= 1;
> >   }
> > +
> > + ring->trb_processed[(trb - &ring->trbs[0])] =
> > TRB_PROCESSED_NO;
> >   return (trb);
> >  }
> > Index: xhcivar.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> > retrieving revision 1.11
> > diff -u -p -u -p -r1.11 xhcivar.h
> > --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> > +++ xhcivar.h 18 Jul 2020 14:20:28 -0000
> > @@ -44,6 +44,12 @@ struct xhci_xfer {
> >  
> >  struct xhci_ring {
> >   struct xhci_trb *trbs;
> > +enum {
> > + TRB_PROCESSED_NO,
> > + TRB_PROCESSED_YES,
> > + TRB_PROCESSED_SHORT
> > +};
> > + uint8_t *trb_processed;
> >   size_t ntrb;
> >   struct usbd_dma_info dma;
> >  
> >  

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
On 2020/07/19 11:25, Marcus Glocker wrote:

> On Sun, 19 Jul 2020 02:25:30 +0000
> [hidden email] wrote:
>
>> hi,
>>
>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
>> and ASM1042 (10421b21).
>> I simply play with ffplay -f v4l2 /dev/video0 to test.
>
> If your cam supports MJPEG it's good to add '-input_format mjpeg' with
> higher resolutions like 1280x720, because that will generated varying
> image sizes, which hit the 64k memory boundary more often, and thus
> generate potentially more chained TDs.

Thank you for useful information.
My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1) maxburst
x 40 frames = 122880 bytes/xfer is enough to observe TD fragmentation.


>> At this moment it does not work on VL805, but I have no idea.
>> I'll investigate furthermore...
>
> Oh dear :-) - Ok.
>  
>> BTW I think xhci_ring_alloc should free ring dma buffer
>> if trb_processed[] allocation fails.
>
> Ah right, I'll add the usbd_dma_contig_free() there - Thanks.
>
>>
>> On 2020/07/18 14:47, Marcus Glocker wrote:
>>> Hey,
>>>
>>> On Sat, 18 Jul 2020 00:14:32 +0000
>>> [hidden email] wrote:
>>>  
>>>> hi,
>>>>
>>>> On 2020/07/15 21:28, [hidden email] wrote:  
>>>>> hi,
>>>>>
>>>>> The patch works well on Intel PCH xhci, but not on AMD Bolton
>>>>> xHCI. I'll investigate this problem.    
>>>>
>>>> Bolton xHCI sometimes raises following events.
>>>> (I added printf Completion Code.)
>>>>
>>>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
>>>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>>>>
>>>> It seems this behavior violates the spec. 4.10.1.1.
>>>> When the remain of 1. TRB is 0, CC should be SUCCESS, not
>>>> SHORT_PKT, and HC should not raise interrupt.
>>>> The problem is how many length 2. TRB transferred.
>>>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
>>>> frlengths. If SHORT_PKT, we should keep frlengths as is.
>>>> Actually with the latter assumption I can see video smoothly w/o
>>>> errors.
>>>>
>>>> I changed your patch to skip adding to frlengths and actlen
>>>> if previous TRB is processed AND remain == 0.  
>>>
>>> Interesting!  Thanks for tracing this down.
>>> So in general we can say that for a chained TD, if the first TRB is
>>> SHORT, we can simply skip the second TRB.
>>>
>>> To make the code a bit more understandable, and use the new
>>> trb_processed flag to control this, I made another diff which keeps
>>> track whether the TRB was not processed, processed, or processed
>>> short.
>>>
>>> It also includes the malloc NULL check and replaces malloc by
>>> mallocarray as suggested by tb@.
>>>
>>> Does this diff also work for you on your AMD xhci?
>>>
>>>
>>> Index: xhci.c
>>> ===================================================================
>>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
>>> retrieving revision 1.116
>>> diff -u -p -u -p -r1.116 xhci.c
>>> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
>>> +++ xhci.c 18 Jul 2020 14:20:28 -0000
>>> @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
>>>  int xhci_event_xfer_generic(struct xhci_softc *, struct
>>> usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
>>> uint8_t); int xhci_event_xfer_isoc(struct usbd_xfer *,
>>> struct xhci_pipe *,
>>> -    uint32_t, int);
>>> +    uint32_t, int, uint8_t);
>>>  void xhci_event_command(struct xhci_softc *, uint64_t);
>>>  void xhci_event_port_change(struct xhci_softc *, uint64_t,
>>> uint32_t); int xhci_pipe_init(struct xhci_softc *, struct
>>> usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
>>> *sc, u return;
>>>   break;
>>>   case UE_ISOCHRONOUS:
>>> - if (xhci_event_xfer_isoc(xfer, xp, remain,
>>> trb_idx))
>>> + if (xhci_event_xfer_isoc(xfer, xp, remain,
>>> trb_idx, code)) return;
>>>   break;
>>>   default:
>>> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>>>  
>>>  int
>>>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
>>> -    uint32_t remain, int trb_idx)
>>> +    uint32_t remain, int trb_idx, uint8_t code)
>>>  {
>>>   struct usbd_xfer *skipxfer;
>>>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
>>> - int trb0_idx, frame_idx = 0;
>>> + int trb0_idx, frame_idx = 0, skip_trb = 0;
>>>  
>>>   KASSERT(xx->index >= 0);
>>> +
>>> + switch (code) {
>>> + case XHCI_CODE_SHORT_XFER:
>>> + xp->ring.trb_processed[trb_idx] =
>>> TRB_PROCESSED_SHORT;
>>> + break;
>>> + default:
>>> + xp->ring.trb_processed[trb_idx] =
>>> TRB_PROCESSED_YES;
>>> + break;
>>> + }
>>> +
>>>   trb0_idx =
>>>      ((xx->index + xp->ring.ntrb) - xx->ntrb) %
>>> (xp->ring.ntrb - 1);
>>> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>>>   trb0_idx = xp->ring.ntrb - 2;
>>>   else
>>>   trb0_idx = trb_idx - 1;
>>> - if (xfer->frlengths[frame_idx] == 0) {
>>> + if (xp->ring.trb_processed[trb0_idx] ==
>>> TRB_PROCESSED_NO) { xfer->frlengths[frame_idx] =
>>> XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
>>> + } else if (xp->ring.trb_processed[trb0_idx] ==
>>> +    TRB_PROCESSED_SHORT) {
>>> + skip_trb = 1;
>>>   }
>>>   }
>>>  
>>> - xfer->frlengths[frame_idx] +=
>>> -
>>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
>>> - xfer->actlen += xfer->frlengths[frame_idx];
>>> + if (!skip_trb) {
>>> + xfer->frlengths[frame_idx] +=
>>> +
>>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
>>> +    remain;
>>> + xfer->actlen += xfer->frlengths[frame_idx];
>>> + }
>>>  
>>>   if (xx->index != trb_idx)
>>>   return (1);
>>> @@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>>>  
>>>   ring->ntrb = ntrb;
>>>  
>>> + ring->trb_processed =
>>> +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed),
>>> M_DEVBUF,
>>> +    M_NOWAIT);
>>> + if (ring->trb_processed == NULL) {
>>> + printf("%s: unable to allocate ring trb_processed
>>> array\n",
>>> +    DEVNAME(sc));
>>> + return (ENOMEM);
>>> + }
>>> +
>>>   xhci_ring_reset(sc, ring);
>>>  
>>>   return (0);
>>> @@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>>>  void
>>>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
>>>  {
>>> + free(ring->trb_processed, M_DEVBUF,
>>> +    ring->ntrb * sizeof(*ring->trb_processed));
>>>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
>>>  }
>>>  
>>> @@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
>>>   size = ring->ntrb * sizeof(struct xhci_trb);
>>>  
>>>   memset(ring->trbs, 0, size);
>>> + memset(ring->trb_processed, 0,
>>> +    ring->ntrb * sizeof(*ring->trb_processed));
>>>  
>>>   ring->index = 0;
>>>   ring->toggle = XHCI_TRB_CYCLE;
>>> @@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
>>>   ring->index = 0;
>>>   ring->toggle ^= 1;
>>>   }
>>> +
>>> + ring->trb_processed[(trb - &ring->trbs[0])] =
>>> TRB_PROCESSED_NO;
>>>   return (trb);
>>>  }
>>> Index: xhcivar.h
>>> ===================================================================
>>> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
>>> retrieving revision 1.11
>>> diff -u -p -u -p -r1.11 xhcivar.h
>>> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
>>> +++ xhcivar.h 18 Jul 2020 14:20:28 -0000
>>> @@ -44,6 +44,12 @@ struct xhci_xfer {
>>>  
>>>  struct xhci_ring {
>>>   struct xhci_trb *trbs;
>>> +enum {
>>> + TRB_PROCESSED_NO,
>>> + TRB_PROCESSED_YES,
>>> + TRB_PROCESSED_SHORT
>>> +};
>>> + uint8_t *trb_processed;
>>>   size_t ntrb;
>>>   struct usbd_dma_info dma;
>>>  
>>>  

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email] wrote:

> On 2020/07/19 11:25, Marcus Glocker wrote:
> > On Sun, 19 Jul 2020 02:25:30 +0000
> > [hidden email] wrote:
> >
> >> hi,
> >>
> >> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
> >> and ASM1042 (10421b21).
> >> I simply play with ffplay -f v4l2 /dev/video0 to test.
> >
> > If your cam supports MJPEG it's good to add '-input_format mjpeg' with
> > higher resolutions like 1280x720, because that will generated varying
> > image sizes, which hit the 64k memory boundary more often, and thus
> > generate potentially more chained TDs.
>
> Thank you for useful information.
> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1) maxburst
> x 40 frames = 122880 bytes/xfer is enough to observe TD fragmentation.
>
>
> >> At this moment it does not work on VL805, but I have no idea.
> >> I'll investigate furthermore...

Did you already had a chance to figure out something regarding the
issue you faced on your VL805 controller?

I'm running the diff here since then on the Intel xHCI controller and
couldn't re-produce any errors using different uvideo(4) and uaudio(4)
devices.

> > Oh dear :-) - Ok.
> >  
> >> BTW I think xhci_ring_alloc should free ring dma buffer
> >> if trb_processed[] allocation fails.
> >
> > Ah right, I'll add the usbd_dma_contig_free() there - Thanks.
> >
> >>
> >> On 2020/07/18 14:47, Marcus Glocker wrote:
> >>> Hey,
> >>>
> >>> On Sat, 18 Jul 2020 00:14:32 +0000
> >>> [hidden email] wrote:
> >>>  
> >>>> hi,
> >>>>
> >>>> On 2020/07/15 21:28, [hidden email] wrote:  
> >>>>> hi,
> >>>>>
> >>>>> The patch works well on Intel PCH xhci, but not on AMD Bolton
> >>>>> xHCI. I'll investigate this problem.    
> >>>>
> >>>> Bolton xHCI sometimes raises following events.
> >>>> (I added printf Completion Code.)
> >>>>
> >>>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
> >>>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
> >>>>
> >>>> It seems this behavior violates the spec. 4.10.1.1.
> >>>> When the remain of 1. TRB is 0, CC should be SUCCESS, not
> >>>> SHORT_PKT, and HC should not raise interrupt.
> >>>> The problem is how many length 2. TRB transferred.
> >>>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
> >>>> frlengths. If SHORT_PKT, we should keep frlengths as is.
> >>>> Actually with the latter assumption I can see video smoothly w/o
> >>>> errors.
> >>>>
> >>>> I changed your patch to skip adding to frlengths and actlen
> >>>> if previous TRB is processed AND remain == 0.  
> >>>
> >>> Interesting!  Thanks for tracing this down.
> >>> So in general we can say that for a chained TD, if the first TRB is
> >>> SHORT, we can simply skip the second TRB.
> >>>
> >>> To make the code a bit more understandable, and use the new
> >>> trb_processed flag to control this, I made another diff which keeps
> >>> track whether the TRB was not processed, processed, or processed
> >>> short.
> >>>
> >>> It also includes the malloc NULL check and replaces malloc by
> >>> mallocarray as suggested by tb@.
> >>>
> >>> Does this diff also work for you on your AMD xhci?
> >>>
> >>>
> >>> Index: xhci.c
> >>> ===================================================================
> >>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> >>> retrieving revision 1.116
> >>> diff -u -p -u -p -r1.116 xhci.c
> >>> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> >>> +++ xhci.c 18 Jul 2020 14:20:28 -0000
> >>> @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
> >>>  int xhci_event_xfer_generic(struct xhci_softc *, struct
> >>> usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
> >>> uint8_t); int xhci_event_xfer_isoc(struct usbd_xfer *,
> >>> struct xhci_pipe *,
> >>> -    uint32_t, int);
> >>> +    uint32_t, int, uint8_t);
> >>>  void xhci_event_command(struct xhci_softc *, uint64_t);
> >>>  void xhci_event_port_change(struct xhci_softc *, uint64_t,
> >>> uint32_t); int xhci_pipe_init(struct xhci_softc *, struct
> >>> usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
> >>> *sc, u return;
> >>>   break;
> >>>   case UE_ISOCHRONOUS:
> >>> - if (xhci_event_xfer_isoc(xfer, xp, remain,
> >>> trb_idx))
> >>> + if (xhci_event_xfer_isoc(xfer, xp, remain,
> >>> trb_idx, code)) return;
> >>>   break;
> >>>   default:
> >>> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
> >>>  
> >>>  int
> >>>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
> >>> -    uint32_t remain, int trb_idx)
> >>> +    uint32_t remain, int trb_idx, uint8_t code)
> >>>  {
> >>>   struct usbd_xfer *skipxfer;
> >>>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> >>> - int trb0_idx, frame_idx = 0;
> >>> + int trb0_idx, frame_idx = 0, skip_trb = 0;
> >>>  
> >>>   KASSERT(xx->index >= 0);
> >>> +
> >>> + switch (code) {
> >>> + case XHCI_CODE_SHORT_XFER:
> >>> + xp->ring.trb_processed[trb_idx] =
> >>> TRB_PROCESSED_SHORT;
> >>> + break;
> >>> + default:
> >>> + xp->ring.trb_processed[trb_idx] =
> >>> TRB_PROCESSED_YES;
> >>> + break;
> >>> + }
> >>> +
> >>>   trb0_idx =
> >>>      ((xx->index + xp->ring.ntrb) - xx->ntrb) %
> >>> (xp->ring.ntrb - 1);
> >>> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >>>   trb0_idx = xp->ring.ntrb - 2;
> >>>   else
> >>>   trb0_idx = trb_idx - 1;
> >>> - if (xfer->frlengths[frame_idx] == 0) {
> >>> + if (xp->ring.trb_processed[trb0_idx] ==
> >>> TRB_PROCESSED_NO) { xfer->frlengths[frame_idx] =
> >>> XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> >>> + } else if (xp->ring.trb_processed[trb0_idx] ==
> >>> +    TRB_PROCESSED_SHORT) {
> >>> + skip_trb = 1;
> >>>   }
> >>>   }
> >>>  
> >>> - xfer->frlengths[frame_idx] +=
> >>> -
> >>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> >>> - xfer->actlen += xfer->frlengths[frame_idx];
> >>> + if (!skip_trb) {
> >>> + xfer->frlengths[frame_idx] +=
> >>> +
> >>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> >>> +    remain;
> >>> + xfer->actlen += xfer->frlengths[frame_idx];
> >>> + }
> >>>  
> >>>   if (xx->index != trb_idx)
> >>>   return (1);
> >>> @@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >>>  
> >>>   ring->ntrb = ntrb;
> >>>  
> >>> + ring->trb_processed =
> >>> +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed),
> >>> M_DEVBUF,
> >>> +    M_NOWAIT);
> >>> + if (ring->trb_processed == NULL) {
> >>> + printf("%s: unable to allocate ring trb_processed
> >>> array\n",
> >>> +    DEVNAME(sc));
> >>> + return (ENOMEM);
> >>> + }
> >>> +
> >>>   xhci_ring_reset(sc, ring);
> >>>  
> >>>   return (0);
> >>> @@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >>>  void
> >>>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
> >>>  {
> >>> + free(ring->trb_processed, M_DEVBUF,
> >>> +    ring->ntrb * sizeof(*ring->trb_processed));
> >>>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
> >>>  }
> >>>  
> >>> @@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
> >>>   size = ring->ntrb * sizeof(struct xhci_trb);
> >>>  
> >>>   memset(ring->trbs, 0, size);
> >>> + memset(ring->trb_processed, 0,
> >>> +    ring->ntrb * sizeof(*ring->trb_processed));
> >>>  
> >>>   ring->index = 0;
> >>>   ring->toggle = XHCI_TRB_CYCLE;
> >>> @@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
> >>>   ring->index = 0;
> >>>   ring->toggle ^= 1;
> >>>   }
> >>> +
> >>> + ring->trb_processed[(trb - &ring->trbs[0])] =
> >>> TRB_PROCESSED_NO;
> >>>   return (trb);
> >>>  }
> >>> Index: xhcivar.h
> >>> ===================================================================
> >>> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> >>> retrieving revision 1.11
> >>> diff -u -p -u -p -r1.11 xhcivar.h
> >>> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> >>> +++ xhcivar.h 18 Jul 2020 14:20:28 -0000
> >>> @@ -44,6 +44,12 @@ struct xhci_xfer {
> >>>  
> >>>  struct xhci_ring {
> >>>   struct xhci_trb *trbs;
> >>> +enum {
> >>> + TRB_PROCESSED_NO,
> >>> + TRB_PROCESSED_YES,
> >>> + TRB_PROCESSED_SHORT
> >>> +};
> >>> + uint8_t *trb_processed;
> >>>   size_t ntrb;
> >>>   struct usbd_dma_info dma;
> >>>  
> >>>  
>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
On 2020/07/25 18:10, Marcus Glocker wrote:

> On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email] wrote:
>
>> On 2020/07/19 11:25, Marcus Glocker wrote:
>>> On Sun, 19 Jul 2020 02:25:30 +0000
>>> [hidden email] wrote:
>>>
>>>> hi,
>>>>
>>>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
>>>> and ASM1042 (10421b21).
>>>> I simply play with ffplay -f v4l2 /dev/video0 to test.
>>>
>>> If your cam supports MJPEG it's good to add '-input_format mjpeg' with
>>> higher resolutions like 1280x720, because that will generated varying
>>> image sizes, which hit the 64k memory boundary more often, and thus
>>> generate potentially more chained TDs.
>>
>> Thank you for useful information.
>> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1) maxburst
>> x 40 frames = 122880 bytes/xfer is enough to observe TD fragmentation.
>>
>>
>>>> At this moment it does not work on VL805, but I have no idea.
>>>> I'll investigate furthermore...
>
> Did you already had a chance to figure out something regarding the
> issue you faced on your VL805 controller?
>
> I'm running the diff here since then on the Intel xHCI controller and
> couldn't re-produce any errors using different uvideo(4) and uaudio(4)
> devices.
>

No, yet -- all I know about this problem is VL805 genegates
many MISSED_SRV Transfer Event for Isoch-IN pipe.

xhci0: slot 3 missed srv with 123 TRB
 :

Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
MISSED_SRV. When I disable splitting TD, it works well.
I added printf paddr in the event TRB but each paddr of MISSED_SRV is 0,
that does not meet 4.10.3.2.
Parameters in this endpoint context are

xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0

looks sane.


>>> Oh dear :-) - Ok.
>>>  
>>>> BTW I think xhci_ring_alloc should free ring dma buffer
>>>> if trb_processed[] allocation fails.
>>>
>>> Ah right, I'll add the usbd_dma_contig_free() there - Thanks.
>>>
>>>>
>>>> On 2020/07/18 14:47, Marcus Glocker wrote:
>>>>> Hey,
>>>>>
>>>>> On Sat, 18 Jul 2020 00:14:32 +0000
>>>>> [hidden email] wrote:
>>>>>  
>>>>>> hi,
>>>>>>
>>>>>> On 2020/07/15 21:28, [hidden email] wrote:  
>>>>>>> hi,
>>>>>>>
>>>>>>> The patch works well on Intel PCH xhci, but not on AMD Bolton
>>>>>>> xHCI. I'll investigate this problem.    
>>>>>>
>>>>>> Bolton xHCI sometimes raises following events.
>>>>>> (I added printf Completion Code.)
>>>>>>
>>>>>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
>>>>>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>>>>>>
>>>>>> It seems this behavior violates the spec. 4.10.1.1.
>>>>>> When the remain of 1. TRB is 0, CC should be SUCCESS, not
>>>>>> SHORT_PKT, and HC should not raise interrupt.
>>>>>> The problem is how many length 2. TRB transferred.
>>>>>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
>>>>>> frlengths. If SHORT_PKT, we should keep frlengths as is.
>>>>>> Actually with the latter assumption I can see video smoothly w/o
>>>>>> errors.
>>>>>>
>>>>>> I changed your patch to skip adding to frlengths and actlen
>>>>>> if previous TRB is processed AND remain == 0.  
>>>>>
>>>>> Interesting!  Thanks for tracing this down.
>>>>> So in general we can say that for a chained TD, if the first TRB is
>>>>> SHORT, we can simply skip the second TRB.
>>>>>
>>>>> To make the code a bit more understandable, and use the new
>>>>> trb_processed flag to control this, I made another diff which keeps
>>>>> track whether the TRB was not processed, processed, or processed
>>>>> short.
>>>>>
>>>>> It also includes the malloc NULL check and replaces malloc by
>>>>> mallocarray as suggested by tb@.
>>>>>
>>>>> Does this diff also work for you on your AMD xhci?
>>>>>
>>>>>
>>>>> Index: xhci.c
>>>>> ===================================================================
>>>>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
>>>>> retrieving revision 1.116
>>>>> diff -u -p -u -p -r1.116 xhci.c
>>>>> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
>>>>> +++ xhci.c 18 Jul 2020 14:20:28 -0000
>>>>> @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
>>>>>  int xhci_event_xfer_generic(struct xhci_softc *, struct
>>>>> usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
>>>>> uint8_t); int xhci_event_xfer_isoc(struct usbd_xfer *,
>>>>> struct xhci_pipe *,
>>>>> -    uint32_t, int);
>>>>> +    uint32_t, int, uint8_t);
>>>>>  void xhci_event_command(struct xhci_softc *, uint64_t);
>>>>>  void xhci_event_port_change(struct xhci_softc *, uint64_t,
>>>>> uint32_t); int xhci_pipe_init(struct xhci_softc *, struct
>>>>> usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
>>>>> *sc, u return;
>>>>>   break;
>>>>>   case UE_ISOCHRONOUS:
>>>>> - if (xhci_event_xfer_isoc(xfer, xp, remain,
>>>>> trb_idx))
>>>>> + if (xhci_event_xfer_isoc(xfer, xp, remain,
>>>>> trb_idx, code)) return;
>>>>>   break;
>>>>>   default:
>>>>> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>>>>>  
>>>>>  int
>>>>>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
>>>>> -    uint32_t remain, int trb_idx)
>>>>> +    uint32_t remain, int trb_idx, uint8_t code)
>>>>>  {
>>>>>   struct usbd_xfer *skipxfer;
>>>>>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
>>>>> - int trb0_idx, frame_idx = 0;
>>>>> + int trb0_idx, frame_idx = 0, skip_trb = 0;
>>>>>  
>>>>>   KASSERT(xx->index >= 0);
>>>>> +
>>>>> + switch (code) {
>>>>> + case XHCI_CODE_SHORT_XFER:
>>>>> + xp->ring.trb_processed[trb_idx] =
>>>>> TRB_PROCESSED_SHORT;
>>>>> + break;
>>>>> + default:
>>>>> + xp->ring.trb_processed[trb_idx] =
>>>>> TRB_PROCESSED_YES;
>>>>> + break;
>>>>> + }
>>>>> +
>>>>>   trb0_idx =
>>>>>      ((xx->index + xp->ring.ntrb) - xx->ntrb) %
>>>>> (xp->ring.ntrb - 1);
>>>>> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>>>>>   trb0_idx = xp->ring.ntrb - 2;
>>>>>   else
>>>>>   trb0_idx = trb_idx - 1;
>>>>> - if (xfer->frlengths[frame_idx] == 0) {
>>>>> + if (xp->ring.trb_processed[trb0_idx] ==
>>>>> TRB_PROCESSED_NO) { xfer->frlengths[frame_idx] =
>>>>> XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
>>>>> + } else if (xp->ring.trb_processed[trb0_idx] ==
>>>>> +    TRB_PROCESSED_SHORT) {
>>>>> + skip_trb = 1;
>>>>>   }
>>>>>   }
>>>>>  
>>>>> - xfer->frlengths[frame_idx] +=
>>>>> -
>>>>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
>>>>> - xfer->actlen += xfer->frlengths[frame_idx];
>>>>> + if (!skip_trb) {
>>>>> + xfer->frlengths[frame_idx] +=
>>>>> +
>>>>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
>>>>> +    remain;
>>>>> + xfer->actlen += xfer->frlengths[frame_idx];
>>>>> + }
>>>>>  
>>>>>   if (xx->index != trb_idx)
>>>>>   return (1);
>>>>> @@ -1702,6 +1718,15 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>>>>>  
>>>>>   ring->ntrb = ntrb;
>>>>>  
>>>>> + ring->trb_processed =
>>>>> +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed),
>>>>> M_DEVBUF,
>>>>> +    M_NOWAIT);
>>>>> + if (ring->trb_processed == NULL) {
>>>>> + printf("%s: unable to allocate ring trb_processed
>>>>> array\n",
>>>>> +    DEVNAME(sc));
>>>>> + return (ENOMEM);
>>>>> + }
>>>>> +
>>>>>   xhci_ring_reset(sc, ring);
>>>>>  
>>>>>   return (0);
>>>>> @@ -1710,6 +1735,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>>>>>  void
>>>>>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
>>>>>  {
>>>>> + free(ring->trb_processed, M_DEVBUF,
>>>>> +    ring->ntrb * sizeof(*ring->trb_processed));
>>>>>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
>>>>>  }
>>>>>  
>>>>> @@ -1721,6 +1748,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
>>>>>   size = ring->ntrb * sizeof(struct xhci_trb);
>>>>>  
>>>>>   memset(ring->trbs, 0, size);
>>>>> + memset(ring->trb_processed, 0,
>>>>> +    ring->ntrb * sizeof(*ring->trb_processed));
>>>>>  
>>>>>   ring->index = 0;
>>>>>   ring->toggle = XHCI_TRB_CYCLE;
>>>>> @@ -1805,6 +1834,8 @@ xhci_ring_produce(struct xhci_softc *sc,
>>>>>   ring->index = 0;
>>>>>   ring->toggle ^= 1;
>>>>>   }
>>>>> +
>>>>> + ring->trb_processed[(trb - &ring->trbs[0])] =
>>>>> TRB_PROCESSED_NO;
>>>>>   return (trb);
>>>>>  }
>>>>> Index: xhcivar.h
>>>>> ===================================================================
>>>>> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
>>>>> retrieving revision 1.11
>>>>> diff -u -p -u -p -r1.11 xhcivar.h
>>>>> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
>>>>> +++ xhcivar.h 18 Jul 2020 14:20:28 -0000
>>>>> @@ -44,6 +44,12 @@ struct xhci_xfer {
>>>>>  
>>>>>  struct xhci_ring {
>>>>>   struct xhci_trb *trbs;
>>>>> +enum {
>>>>> + TRB_PROCESSED_NO,
>>>>> + TRB_PROCESSED_YES,
>>>>> + TRB_PROCESSED_SHORT
>>>>> +};
>>>>> + uint8_t *trb_processed;
>>>>>   size_t ntrb;
>>>>>   struct usbd_dma_info dma;
>>>>>  
>>>>>  
>>

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sat, 25 Jul 2020 20:31:44 +0000
[hidden email] wrote:

> On 2020/07/25 18:10, Marcus Glocker wrote:
> > On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email] wrote:
> >  
> >> On 2020/07/19 11:25, Marcus Glocker wrote:  
> >>> On Sun, 19 Jul 2020 02:25:30 +0000
> >>> [hidden email] wrote:
> >>>  
> >>>> hi,
> >>>>
> >>>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
> >>>> and ASM1042 (10421b21).
> >>>> I simply play with ffplay -f v4l2 /dev/video0 to test.  
> >>>
> >>> If your cam supports MJPEG it's good to add '-input_format mjpeg'
> >>> with higher resolutions like 1280x720, because that will
> >>> generated varying image sizes, which hit the 64k memory boundary
> >>> more often, and thus generate potentially more chained TDs.  
> >>
> >> Thank you for useful information.
> >> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1)
> >> maxburst x 40 frames = 122880 bytes/xfer is enough to observe TD
> >> fragmentation.
> >>
> >>  
> >>>> At this moment it does not work on VL805, but I have no idea.
> >>>> I'll investigate furthermore...  
> >
> > Did you already had a chance to figure out something regarding the
> > issue you faced on your VL805 controller?
> >
> > I'm running the diff here since then on the Intel xHCI controller
> > and couldn't re-produce any errors using different uvideo(4) and
> > uaudio(4) devices.
> >  
>
> No, yet -- all I know about this problem is VL805 genegates
> many MISSED_SRV Transfer Event for Isoch-IN pipe.
>
> xhci0: slot 3 missed srv with 123 TRB
>  :
>
> Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
> MISSED_SRV. When I disable splitting TD, it works well.
> I added printf paddr in the event TRB but each paddr of MISSED_SRV is
> 0, that does not meet 4.10.3.2.
> Parameters in this endpoint context are
>
> xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0
>
> looks sane.

Hmm, I see.

I currently have also no idea what exactly is causing the missed service
events.  I was reading a little bit yesterday about the VL805 and could
find some statements where people say it's not fully compliant with the
xHCI specs, and in Linux it took some cooperation with the vendor to
make it work.

One thing I still wanted to ask you to understand whether the problem
on your VL805 is only related with my last diff;  Are the multi-trb
transfers working fine with your last diff on the VL805?

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

sc.dying
On 2020/07/26 10:54, Marcus Glocker wrote:

> On Sat, 25 Jul 2020 20:31:44 +0000
> [hidden email] wrote:
>
>> On 2020/07/25 18:10, Marcus Glocker wrote:
>>> On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email] wrote:
>>>  
>>>> On 2020/07/19 11:25, Marcus Glocker wrote:  
>>>>> On Sun, 19 Jul 2020 02:25:30 +0000
>>>>> [hidden email] wrote:
>>>>>  
>>>>>> hi,
>>>>>>
>>>>>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
>>>>>> and ASM1042 (10421b21).
>>>>>> I simply play with ffplay -f v4l2 /dev/video0 to test.  
>>>>>
>>>>> If your cam supports MJPEG it's good to add '-input_format mjpeg'
>>>>> with higher resolutions like 1280x720, because that will
>>>>> generated varying image sizes, which hit the 64k memory boundary
>>>>> more often, and thus generate potentially more chained TDs.  
>>>>
>>>> Thank you for useful information.
>>>> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1)
>>>> maxburst x 40 frames = 122880 bytes/xfer is enough to observe TD
>>>> fragmentation.
>>>>
>>>>  
>>>>>> At this moment it does not work on VL805, but I have no idea.
>>>>>> I'll investigate furthermore...  
>>>
>>> Did you already had a chance to figure out something regarding the
>>> issue you faced on your VL805 controller?
>>>
>>> I'm running the diff here since then on the Intel xHCI controller
>>> and couldn't re-produce any errors using different uvideo(4) and
>>> uaudio(4) devices.
>>>  
>>
>> No, yet -- all I know about this problem is VL805 genegates
>> many MISSED_SRV Transfer Event for Isoch-IN pipe.
>>
>> xhci0: slot 3 missed srv with 123 TRB
>>  :
>>
>> Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
>> MISSED_SRV. When I disable splitting TD, it works well.
>> I added printf paddr in the event TRB but each paddr of MISSED_SRV is
>> 0, that does not meet 4.10.3.2.
>> Parameters in this endpoint context are
>>
>> xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0
>>
>> looks sane.
>
> Hmm, I see.
>
> I currently have also no idea what exactly is causing the missed service
> events.  I was reading a little bit yesterday about the VL805 and could
> find some statements where people say it's not fully compliant with the
> xHCI specs, and in Linux it took some cooperation with the vendor to
> make it work.
>
> One thing I still wanted to ask you to understand whether the problem
> on your VL805 is only related with my last diff;  Are the multi-trb
> transfers working fine with your last diff on the VL805?

On VL805 ffplay plays the movie sometimes smoothly, sometimes laggy.
The multi-TRB transfer itself works on VL805 with your patch.
Not all splitted TD fail to transfer. Successful splitted transfer
works as intended.
I think MISSED_SRV is caused by other reason, maybe isochronous
scheduling problem.
Thus, IMO your patch can be committed.

VL805 also has same problem that AMD Bolton has.
It may generate the 1. TRB event w/ cc = SHORT_PKT and
remain = requested length (that is, transferred length = 0),
but the 2. TRB w/ cc = SUCCESS and remain = 0.
remain of 2. TRB should be given length, and CC should be SHORT_PKT.
Your patch fixes this problem.

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker
On Sun, 26 Jul 2020 13:27:34 +0000
[hidden email] wrote:

> On 2020/07/26 10:54, Marcus Glocker wrote:
> > On Sat, 25 Jul 2020 20:31:44 +0000
> > [hidden email] wrote:
> >  
> >> On 2020/07/25 18:10, Marcus Glocker wrote:  
> >>> On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email]
> >>> wrote:
> >>>> On 2020/07/19 11:25, Marcus Glocker wrote:    
> >>>>> On Sun, 19 Jul 2020 02:25:30 +0000
> >>>>> [hidden email] wrote:
> >>>>>    
> >>>>>> hi,
> >>>>>>
> >>>>>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
> >>>>>> and ASM1042 (10421b21).
> >>>>>> I simply play with ffplay -f v4l2 /dev/video0 to test.    
> >>>>>
> >>>>> If your cam supports MJPEG it's good to add '-input_format
> >>>>> mjpeg' with higher resolutions like 1280x720, because that will
> >>>>> generated varying image sizes, which hit the 64k memory boundary
> >>>>> more often, and thus generate potentially more chained TDs.    
> >>>>
> >>>> Thank you for useful information.
> >>>> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1)
> >>>> maxburst x 40 frames = 122880 bytes/xfer is enough to observe TD
> >>>> fragmentation.
> >>>>
> >>>>    
> >>>>>> At this moment it does not work on VL805, but I have no idea.
> >>>>>> I'll investigate furthermore...    
> >>>
> >>> Did you already had a chance to figure out something regarding the
> >>> issue you faced on your VL805 controller?
> >>>
> >>> I'm running the diff here since then on the Intel xHCI controller
> >>> and couldn't re-produce any errors using different uvideo(4) and
> >>> uaudio(4) devices.
> >>>    
> >>
> >> No, yet -- all I know about this problem is VL805 genegates
> >> many MISSED_SRV Transfer Event for Isoch-IN pipe.
> >>
> >> xhci0: slot 3 missed srv with 123 TRB
> >>  :
> >>
> >> Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
> >> MISSED_SRV. When I disable splitting TD, it works well.
> >> I added printf paddr in the event TRB but each paddr of MISSED_SRV
> >> is 0, that does not meet 4.10.3.2.
> >> Parameters in this endpoint context are
> >>
> >> xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0
> >>
> >> looks sane.  
> >
> > Hmm, I see.
> >
> > I currently have also no idea what exactly is causing the missed
> > service events.  I was reading a little bit yesterday about the
> > VL805 and could find some statements where people say it's not
> > fully compliant with the xHCI specs, and in Linux it took some
> > cooperation with the vendor to make it work.
> >
> > One thing I still wanted to ask you to understand whether the
> > problem on your VL805 is only related with my last diff;  Are the
> > multi-trb transfers working fine with your last diff on the VL805?  
>
> On VL805 ffplay plays the movie sometimes smoothly, sometimes laggy.
> The multi-TRB transfer itself works on VL805 with your patch.
> Not all splitted TD fail to transfer. Successful splitted transfer
> works as intended.
> I think MISSED_SRV is caused by other reason, maybe isochronous
> scheduling problem.
> Thus, IMO your patch can be committed.
>
> VL805 also has same problem that AMD Bolton has.
> It may generate the 1. TRB event w/ cc = SHORT_PKT and
> remain = requested length (that is, transferred length = 0),
> but the 2. TRB w/ cc = SUCCESS and remain = 0.
> remain of 2. TRB should be given length, and CC should be SHORT_PKT.
> Your patch fixes this problem.

OK, that's what I wanted to understand.
I also have the impression that the MISSED_SRV issue on the VL805 is
related to another problem which we should trace separately from the
multi-trb problem.  Thanks for that useful feedback.

Attached the latest version of my patch including all the inputs
received (mostly related to malloc/free).

Patrick, Martin, would you be fine to OK that?


Index: xhci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.116
diff -u -p -u -p -r1.116 xhci.c
--- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
+++ xhci.c 19 Jul 2020 06:51:58 -0000
@@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
 int xhci_event_xfer_generic(struct xhci_softc *, struct usbd_xfer *,
     struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t, uint8_t);
 int xhci_event_xfer_isoc(struct usbd_xfer *, struct xhci_pipe *,
-    uint32_t, int);
+    uint32_t, int, uint8_t);
 void xhci_event_command(struct xhci_softc *, uint64_t);
 void xhci_event_port_change(struct xhci_softc *, uint64_t, uint32_t);
 int xhci_pipe_init(struct xhci_softc *, struct usbd_pipe *);
@@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc *sc, u
  return;
  break;
  case UE_ISOCHRONOUS:
- if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx))
+ if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx, code))
  return;
  break;
  default:
@@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
 
 int
 xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
-    uint32_t remain, int trb_idx)
+    uint32_t remain, int trb_idx, uint8_t code)
 {
  struct usbd_xfer *skipxfer;
  struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
- int trb0_idx, frame_idx = 0;
+ int trb0_idx, frame_idx = 0, skip_trb = 0;
 
  KASSERT(xx->index >= 0);
+
+ switch (code) {
+ case XHCI_CODE_SHORT_XFER:
+ xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_SHORT;
+ break;
+ default:
+ xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_YES;
+ break;
+ }
+
  trb0_idx =
     ((xx->index + xp->ring.ntrb) - xx->ntrb) % (xp->ring.ntrb - 1);
 
@@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
  trb0_idx = xp->ring.ntrb - 2;
  else
  trb0_idx = trb_idx - 1;
- if (xfer->frlengths[frame_idx] == 0) {
+ if (xp->ring.trb_processed[trb0_idx] == TRB_PROCESSED_NO) {
  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
     xp->ring.trbs[trb0_idx].trb_status));
+ } else if (xp->ring.trb_processed[trb0_idx] ==
+    TRB_PROCESSED_SHORT) {
+ skip_trb = 1;
  }
  }
 
- xfer->frlengths[frame_idx] +=
-    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
- xfer->actlen += xfer->frlengths[frame_idx];
+ if (!skip_trb) {
+ xfer->frlengths[frame_idx] +=
+    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
+    remain;
+ xfer->actlen += xfer->frlengths[frame_idx];
+ }
 
  if (xx->index != trb_idx)
  return (1);
@@ -1702,6 +1718,16 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 
  ring->ntrb = ntrb;
 
+ ring->trb_processed =
+    mallocarray(ring->ntrb, sizeof(*ring->trb_processed), M_DEVBUF,
+    M_NOWAIT);
+ if (ring->trb_processed == NULL) {
+ printf("%s: unable to allocate ring trb_processed array\n",
+    DEVNAME(sc));
+ usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
+ return (ENOMEM);
+ }
+
  xhci_ring_reset(sc, ring);
 
  return (0);
@@ -1710,6 +1736,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+ free(ring->trb_processed, M_DEVBUF,
+    ring->ntrb * sizeof(*ring->trb_processed));
  usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
 }
 
@@ -1721,6 +1749,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
  size = ring->ntrb * sizeof(struct xhci_trb);
 
  memset(ring->trbs, 0, size);
+ memset(ring->trb_processed, 0,
+    ring->ntrb * sizeof(*ring->trb_processed));
 
  ring->index = 0;
  ring->toggle = XHCI_TRB_CYCLE;
@@ -1805,6 +1835,8 @@ xhci_ring_produce(struct xhci_softc *sc,
  ring->index = 0;
  ring->toggle ^= 1;
  }
+
+ ring->trb_processed[(trb - &ring->trbs[0])] = TRB_PROCESSED_NO;
 
  return (trb);
 }
Index: xhcivar.h
===================================================================
RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
retrieving revision 1.11
diff -u -p -u -p -r1.11 xhcivar.h
--- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
+++ xhcivar.h 19 Jul 2020 06:51:58 -0000
@@ -44,6 +44,12 @@ struct xhci_xfer {
 
 struct xhci_ring {
  struct xhci_trb *trbs;
+enum {
+ TRB_PROCESSED_NO,
+ TRB_PROCESSED_YES,
+ TRB_PROCESSED_SHORT
+};
+ uint8_t *trb_processed;
  size_t ntrb;
  struct usbd_dma_info dma;
 

Reply | Threaded
Open this post in threaded view
|

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Martin Pieuchot
On 26/07/20(Sun) 16:23, Marcus Glocker wrote:

> On Sun, 26 Jul 2020 13:27:34 +0000
> [hidden email] wrote:
>
> > On 2020/07/26 10:54, Marcus Glocker wrote:
> > > On Sat, 25 Jul 2020 20:31:44 +0000
> > > [hidden email] wrote:
> > >  
> > >> On 2020/07/25 18:10, Marcus Glocker wrote:  
> > >>> On Sun, Jul 19, 2020 at 02:12:21PM +0000, [hidden email]
> > >>> wrote:
> > >>>> On 2020/07/19 11:25, Marcus Glocker wrote:    
> > >>>>> On Sun, 19 Jul 2020 02:25:30 +0000
> > >>>>> [hidden email] wrote:
> > >>>>>    
> > >>>>>> hi,
> > >>>>>>
> > >>>>>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
> > >>>>>> and ASM1042 (10421b21).
> > >>>>>> I simply play with ffplay -f v4l2 /dev/video0 to test.    
> > >>>>>
> > >>>>> If your cam supports MJPEG it's good to add '-input_format
> > >>>>> mjpeg' with higher resolutions like 1280x720, because that will
> > >>>>> generated varying image sizes, which hit the 64k memory boundary
> > >>>>> more often, and thus generate potentially more chained TDs.    
> > >>>>
> > >>>> Thank you for useful information.
> > >>>> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1)
> > >>>> maxburst x 40 frames = 122880 bytes/xfer is enough to observe TD
> > >>>> fragmentation.
> > >>>>
> > >>>>    
> > >>>>>> At this moment it does not work on VL805, but I have no idea.
> > >>>>>> I'll investigate furthermore...    
> > >>>
> > >>> Did you already had a chance to figure out something regarding the
> > >>> issue you faced on your VL805 controller?
> > >>>
> > >>> I'm running the diff here since then on the Intel xHCI controller
> > >>> and couldn't re-produce any errors using different uvideo(4) and
> > >>> uaudio(4) devices.
> > >>>    
> > >>
> > >> No, yet -- all I know about this problem is VL805 genegates
> > >> many MISSED_SRV Transfer Event for Isoch-IN pipe.
> > >>
> > >> xhci0: slot 3 missed srv with 123 TRB
> > >>  :
> > >>
> > >> Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
> > >> MISSED_SRV. When I disable splitting TD, it works well.
> > >> I added printf paddr in the event TRB but each paddr of MISSED_SRV
> > >> is 0, that does not meet 4.10.3.2.
> > >> Parameters in this endpoint context are
> > >>
> > >> xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0
> > >>
> > >> looks sane.  
> > >
> > > Hmm, I see.
> > >
> > > I currently have also no idea what exactly is causing the missed
> > > service events.  I was reading a little bit yesterday about the
> > > VL805 and could find some statements where people say it's not
> > > fully compliant with the xHCI specs, and in Linux it took some
> > > cooperation with the vendor to make it work.
> > >
> > > One thing I still wanted to ask you to understand whether the
> > > problem on your VL805 is only related with my last diff;  Are the
> > > multi-trb transfers working fine with your last diff on the VL805?  
> >
> > On VL805 ffplay plays the movie sometimes smoothly, sometimes laggy.
> > The multi-TRB transfer itself works on VL805 with your patch.
> > Not all splitted TD fail to transfer. Successful splitted transfer
> > works as intended.
> > I think MISSED_SRV is caused by other reason, maybe isochronous
> > scheduling problem.
> > Thus, IMO your patch can be committed.
> >
> > VL805 also has same problem that AMD Bolton has.
> > It may generate the 1. TRB event w/ cc = SHORT_PKT and
> > remain = requested length (that is, transferred length = 0),
> > but the 2. TRB w/ cc = SUCCESS and remain = 0.
> > remain of 2. TRB should be given length, and CC should be SHORT_PKT.
> > Your patch fixes this problem.
>
> OK, that's what I wanted to understand.
> I also have the impression that the MISSED_SRV issue on the VL805 is
> related to another problem which we should trace separately from the
> multi-trb problem.  Thanks for that useful feedback.
>
> Attached the latest version of my patch including all the inputs
> received (mostly related to malloc/free).
>
> Patrick, Martin, would you be fine to OK that?

The logic looks fine.  I'd suggest you move the trb_processed array to
the xhci_pipe structure.  Command and Event rings do not need it, right?
This should also allow you to get rid of the malloc/free by always using
XHCI_MAX_XFER elements.

> Index: xhci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -u -p -r1.116 xhci.c
> --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116
> +++ xhci.c 19 Jul 2020 06:51:58 -0000
> @@ -82,7 +82,7 @@ void xhci_event_xfer(struct xhci_softc *
>  int xhci_event_xfer_generic(struct xhci_softc *, struct usbd_xfer *,
>      struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t, uint8_t);
>  int xhci_event_xfer_isoc(struct usbd_xfer *, struct xhci_pipe *,
> -    uint32_t, int);
> +    uint32_t, int, uint8_t);
>  void xhci_event_command(struct xhci_softc *, uint64_t);
>  void xhci_event_port_change(struct xhci_softc *, uint64_t, uint32_t);
>  int xhci_pipe_init(struct xhci_softc *, struct usbd_pipe *);
> @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc *sc, u
>   return;
>   break;
>   case UE_ISOCHRONOUS:
> - if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx))
> + if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx, code))
>   return;
>   break;
>   default:
> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>  
>  int
>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
> -    uint32_t remain, int trb_idx)
> +    uint32_t remain, int trb_idx, uint8_t code)
>  {
>   struct usbd_xfer *skipxfer;
>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> - int trb0_idx, frame_idx = 0;
> + int trb0_idx, frame_idx = 0, skip_trb = 0;
>  
>   KASSERT(xx->index >= 0);
> +
> + switch (code) {
> + case XHCI_CODE_SHORT_XFER:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_SHORT;
> + break;
> + default:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_YES;
> + break;
> + }
> +
>   trb0_idx =
>      ((xx->index + xp->ring.ntrb) - xx->ntrb) % (xp->ring.ntrb - 1);
>  
> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   trb0_idx = xp->ring.ntrb - 2;
>   else
>   trb0_idx = trb_idx - 1;
> - if (xfer->frlengths[frame_idx] == 0) {
> + if (xp->ring.trb_processed[trb0_idx] == TRB_PROCESSED_NO) {
>   xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>      xp->ring.trbs[trb0_idx].trb_status));
> + } else if (xp->ring.trb_processed[trb0_idx] ==
> +    TRB_PROCESSED_SHORT) {
> + skip_trb = 1;
>   }
>   }
>  
> - xfer->frlengths[frame_idx] +=
> -    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> - xfer->actlen += xfer->frlengths[frame_idx];
> + if (!skip_trb) {
> + xfer->frlengths[frame_idx] +=
> +    XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> +    remain;
> + xfer->actlen += xfer->frlengths[frame_idx];
> + }
>  
>   if (xx->index != trb_idx)
>   return (1);
> @@ -1702,6 +1718,16 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  
>   ring->ntrb = ntrb;
>  
> + ring->trb_processed =
> +    mallocarray(ring->ntrb, sizeof(*ring->trb_processed), M_DEVBUF,
> +    M_NOWAIT);
> + if (ring->trb_processed == NULL) {
> + printf("%s: unable to allocate ring trb_processed array\n",
> +    DEVNAME(sc));
> + usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
> + return (ENOMEM);
> + }
> +
>   xhci_ring_reset(sc, ring);
>  
>   return (0);
> @@ -1710,6 +1736,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
>  void
>  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
>  {
> + free(ring->trb_processed, M_DEVBUF,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>   usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
>  }
>  
> @@ -1721,6 +1749,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
>   size = ring->ntrb * sizeof(struct xhci_trb);
>  
>   memset(ring->trbs, 0, size);
> + memset(ring->trb_processed, 0,
> +    ring->ntrb * sizeof(*ring->trb_processed));
>  
>   ring->index = 0;
>   ring->toggle = XHCI_TRB_CYCLE;
> @@ -1805,6 +1835,8 @@ xhci_ring_produce(struct xhci_softc *sc,
>   ring->index = 0;
>   ring->toggle ^= 1;
>   }
> +
> + ring->trb_processed[(trb - &ring->trbs[0])] = TRB_PROCESSED_NO;
>  
>   return (trb);
>  }
> Index: xhcivar.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> retrieving revision 1.11
> diff -u -p -u -p -r1.11 xhcivar.h
> --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11
> +++ xhcivar.h 19 Jul 2020 06:51:58 -0000
> @@ -44,6 +44,12 @@ struct xhci_xfer {
>  
>  struct xhci_ring {
>   struct xhci_trb *trbs;
> +enum {
> + TRB_PROCESSED_NO,
> + TRB_PROCESSED_YES,
> + TRB_PROCESSED_SHORT
> +};
> + uint8_t *trb_processed;
>   size_t ntrb;
>   struct usbd_dma_info dma;
>  

12