Fix for a possible ahci issue on intel skylake hardware

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Fix for a possible ahci issue on intel skylake hardware

Imre Vadasz-2
For us, this patch fixed ahci getting stuck during high disk-IO load (mainly
writes) from at least 2 threads in parallel.
We can reproduce this on a Skylake notebook with a single SSD disk, but
running openbsd within a custom virtualization solution.

However, I so far failed to reproduce this issue with 5.9 running natively on
the same hardware.
It *might* be reproducable when running virtualized with an emulated ahci
controller, e.g. in qemu/kvm or Virtualbox, because of the long delay between
clearing the AHCI_PREG_IS and the AHCI_REG_IS register (that would be caused
by emulation of the ahci controller happening at each mmio access).

What seems to happen is that a new ahci port interrupt becomes pending
inbetween the ahci_write(sc, AHCI_PREG_IS, is) in ahci_port_intr() and the
ahci_write(sc, AHCI_REG_IS, ack) in ahci_intr() afterwards.
It could be that the global IS register bit only gets updated from
AHCI_PREG_IS at the instant when the interrupt becomes pending (instead of
continuously updating the global IS bit while the interrupt is still pending).
This would explain ahci stalling since no more interrupts would be triggered
for the port after the global AHCI_REG_IS is cleared at the end of
ahci_intr().

The ahci freeze no longer occurs for us, when changing the ahci interrupt
handling to closer match the specification, by clearing the global IS bit for
each port directly after clearing the per-port IS register:

Index: sys/dev/ic/ahci.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/ahci.c,v
retrieving revision 1.24
diff -u -r1.24 ahci.c
--- sys/dev/ic/ahci.c 10 Mar 2016 13:56:14 -0000 1.24
+++ sys/dev/ic/ahci.c 22 Mar 2016 17:06:28 -0000
@@ -1958,7 +1958,9 @@
  if (is & sc->sc_ccc_mask) {
  DPRINTF(AHCI_D_INTR, "%s: command coalescing interrupt\n",
     DEVNAME(sc));
+ ahci_write(sc, AHCI_REG_IS, is & sc->sc_ccc_mask);
  is &= ~sc->sc_ccc_mask;
+ ack &= ~sc->sc_ccc_mask;
  is |= sc->sc_ccc_ports_cur;
  }
 #endif
@@ -1966,14 +1968,20 @@
  /* Process interrupts for each port */
  while (is) {
  port = ffs(is) - 1;
- if (sc->sc_ports[port])
+ if (sc->sc_ports[port]) {
  ahci_port_intr(sc->sc_ports[port],
     AHCI_PREG_CI_ALL_SLOTS);
+ ack &= ~(1 << port);
+ }
  is &= ~(1 << port);
  }
 
- /* Finally, acknowledge global interrupt */
- ahci_write(sc, AHCI_REG_IS, ack);
+ /* Clear remaining bits in the global interrupt status register */
+ if (ack != 0) {
+ DPRINTF(AHCI_D_VERBOSE, "%s: unhandled interrupt status "
+    "bits: 0x%08x\n", DEVNAME(sc), ack);
+ ahci_write(sc, AHCI_REG_IS, ack);
+ }
 
  return (1);
 }
@@ -1994,8 +2002,11 @@
  is = ahci_pread(ap, AHCI_PREG_IS);
 
  /* Ack port interrupt only if checking all command slots. */
- if (ci_mask == AHCI_PREG_CI_ALL_SLOTS)
+ if (ci_mask == AHCI_PREG_CI_ALL_SLOTS) {
  ahci_pwrite(ap, AHCI_PREG_IS, is);
+ /* Clear corresponding bit in global IS register. */
+ ahci_write(sc, AHCI_REG_IS, (1 << ap->ap_port));
+ }
 
  if (is)
  DPRINTF(AHCI_D_INTR, "%s: interrupt: %b\n", PORTNAME(ap),