Virtio 1.0 for the kernel

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Virtio 1.0 for the kernel

Stefan Fritsch
Hi,

the diff below implements the virtio 1.0 standard in the kernel (0.9 keeps
working, of course). It's not ready for commit, yet, but I would like some
input.

For the most part, the changes from 0.9 to 1.0 are not that big, but there
are some notable differences. Since some headers are also used by vmd in
userspace, how those things are handled in the kernel are also relevant to
vmd.

One change is that the number of feature bits is no longer limited to 32.
This means in the long run, defining constants with values that have only
the relevant feature bit set won't work anymore. So eventually, we will
have to change the constants to contain the bit offsets instead. Then we
can add some macros to make handling easier. This is what the attached
diff does. However, in the nearer future 64 bits will be sufficient and
another possibility is to continue as now and define features as uint64_t
values. So, should we make the switch from "#define foo (1<<24)" to
"#define foo 24" now or at some later time, when devices start using more
than 64 bits?

The biggest change in 1.0 is the way the resources are found and accessed
for virtio_pci devices. This means that all the offsets for the generic
configuration registers change between 0.9 and 1.0. One possibility would
be to just define a second set of constants and use that for 1.0. However,
since the register layout is documented as structs in the standard, I have
tried something different and used some macros (CREAD/CWRITE in
virtio_pci.c) and offsetof() magic to determine the register offset and
width for an access. This has the advantage that the widths are
automatically taken care of, whereas in the old style one has to manually
pick the correct bus_dma_read/write_* function. However, I don't like the
0.9 code to use the one style and the 1.0 code to use the other style. I
also don't know how suitable the offsetof hackery would be for vmd. So I
would like input on which way you think is preferable?

The diff does not implement 1.0 for virtio/mmio, yet. I hope that it does
not break 0.9 virtio/mmio, but I have not tested that. So far I did not
have success in running openbsd arm or aarch64 on qemu (openbsd panics).
Any pointers about this?

So far, I have only tested the net, block and rng devices. It's possible
that the other devices need some more minor tweaks, too.

And in case anyone wonders, there is no killer feature in virtio 1.0 that
we really want to have. It's just that we run on hypervisors that don't
support 0.9, and we run better on systems where 1.0-only is the default
(libvirt does that when the newer qemu machine type q35 is used :-( ).

Cheers,
Stefan


diff --git a/share/man/man4/virtio.4 b/share/man/man4/virtio.4
index 5e60e2e0c32..6ee23a7b47d 100644
--- a/share/man/man4/virtio.4
+++ b/share/man/man4/virtio.4
@@ -22,7 +22,7 @@
 .Nd VirtIO support driver
 .Sh SYNOPSIS
 .Cd "virtio* at fdt?"
-.Cd "virtio* at pci?"
+.Cd "virtio* at pci? flags 0x00"
 .Sh DESCRIPTION
 The
 .Nm
@@ -56,7 +56,12 @@ control interface
 The
 .Nm
 driver conforms to the virtio 0.9.5 specification.
-The virtio 1.0 standard is not supported, yet.
+The virtio 1.0 standard is only supported for pci devices.
+.Pp
+By default 0.9 is preferred over 1.0.
+This can be changed by setting the bit 0x4 in the flags.
+.Pp
+Setting the bit 0x8 in the flags disables the 1.0 support completely.
 .Sh SEE ALSO
 .Xr intro 4
 .Sh HISTORY
diff --git a/sys/dev/fdt/virtio_mmio.c b/sys/dev/fdt/virtio_mmio.c
index a48091d7fca..3a74e647c19 100644
--- a/sys/dev/fdt/virtio_mmio.c
+++ b/sys/dev/fdt/virtio_mmio.c
@@ -89,7 +89,7 @@ void virtio_mmio_write_device_config_2(struct virtio_softc *, int, uint16_t);
 void virtio_mmio_write_device_config_4(struct virtio_softc *, int, uint32_t);
 void virtio_mmio_write_device_config_8(struct virtio_softc *, int, uint64_t);
 uint16_t virtio_mmio_read_queue_size(struct virtio_softc *, uint16_t);
-void virtio_mmio_setup_queue(struct virtio_softc *, uint16_t, uint32_t);
+void virtio_mmio_setup_queue(struct virtio_softc *, struct virtqueue *, uint64_t);
 void virtio_mmio_set_status(struct virtio_softc *, int);
 uint32_t virtio_mmio_negotiate_features(struct virtio_softc *, uint32_t,
       const struct virtio_feature_name *);
@@ -151,15 +151,18 @@ virtio_mmio_read_queue_size(struct virtio_softc *vsc, uint16_t idx)
 }
 
 void
-virtio_mmio_setup_queue(struct virtio_softc *vsc, uint16_t idx, uint32_t addr)
+virtio_mmio_setup_queue(struct virtio_softc *vsc, struct virtqueue *vq,
+    uint64_t addr)
 {
  struct virtio_mmio_softc *sc = (struct virtio_mmio_softc *)vsc;
- bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_SEL, idx);
+ bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_SEL,
+    vq->vq_index);
  bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_NUM,
     bus_space_read_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_NUM_MAX));
  bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_ALIGN,
     PAGE_SIZE);
- bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_PFN, addr);
+ bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_PFN,
+    addr / VIRTIO_PAGE_SIZE);
 }
 
 void
@@ -308,8 +311,8 @@ virtio_mmio_negotiate_features(struct virtio_softc *vsc, uint32_t guest_features
  * indirect descriptors can be switched off by setting bit 1 in the
  * driver flags, see config(8)
  */
- if (!(vsc->sc_dev.dv_cfdata->cf_flags & 1) &&
-    !(vsc->sc_child->dv_cfdata->cf_flags & 1)) {
+ if (!(vsc->sc_dev.dv_cfdata->cf_flags & VIRTIO_CF_NO_INDIRECT) &&
+    !(vsc->sc_child->dv_cfdata->cf_flags & VIRTIO_CF_NO_INDIRECT)) {
  guest_features |= VIRTIO_F_RING_INDIRECT_DESC;
  } else {
  printf("RingIndirectDesc disabled by UKC\n");
diff --git a/sys/dev/pci/virtio_pci.c b/sys/dev/pci/virtio_pci.c
index d9c8d144e21..8fdc3e4801c 100644
--- a/sys/dev/pci/virtio_pci.c
+++ b/sys/dev/pci/virtio_pci.c
@@ -35,11 +35,66 @@
 #include <dev/pci/pcidevs.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
+#include <dev/pci/virtio_pcireg.h>
 
 #include <dev/pv/virtioreg.h>
 #include <dev/pv/virtiovar.h>
 #include <dev/pci/virtio_pcireg.h>
 
+#define DNPRINTF(n,x...) \
+    do { if (VIRTIO_DEBUG >= n) printf(x); } while(0)
+
+#define CREAD(sc, memb) \
+ ({ \
+ struct virtio_pci_common_cfg c; \
+ size_t off = offsetof(struct virtio_pci_common_cfg, memb); \
+ size_t size = sizeof(c.memb); \
+ typeof(c.memb) val; \
+ \
+ switch (size) { \
+ case 1: \
+ val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off); \
+ break; \
+ case 2: \
+ val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off); \
+ break; \
+ case 4: \
+ val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off); \
+ break; \
+ case 8: \
+ val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off); \
+ break; \
+ } \
+ DNPRINTF(2, "%s: %d: off %#zx size %#zx read %#llx\n", \
+   __func__, __LINE__, off, size, (unsigned long long)val); \
+ val; \
+ })
+
+#define CWRITE(sc, memb, val) \
+ do { \
+ struct virtio_pci_common_cfg c; \
+ size_t off = offsetof(struct virtio_pci_common_cfg, memb); \
+ size_t size = sizeof(c.memb); \
+ \
+ DNPRINTF(2, "%s: %d: off %#zx size %#zx write %#llx\n", \
+    __func__, __LINE__, off, size, (unsigned long long)val); \
+ switch (size) { \
+ case 1: \
+ bus_space_write_1(sc->sc_iot, sc->sc_ioh, off, val); \
+ break; \
+ case 2: \
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh, off, val); \
+ break; \
+ case 4: \
+ bus_space_write_4(sc->sc_iot, sc->sc_ioh, off, val); \
+ break; \
+ case 8: \
+ bus_space_write_8(sc->sc_iot, sc->sc_ioh, off, val); \
+ break; \
+ } \
+ } while (0)
+
+
 /*
  * XXX: Before being used on big endian arches, the access to config registers
  * XXX: needs to be reviewed/fixed. The non-device specific registers are
@@ -52,9 +107,12 @@ struct virtio_pci_softc;
 
 int virtio_pci_match(struct device *, void *, void *);
 void virtio_pci_attach(struct device *, struct device *, void *);
+int virtio_pci_attach_09(struct virtio_pci_softc *sc, struct pci_attach_args *pa);
+int virtio_pci_attach_10(struct virtio_pci_softc *sc, struct pci_attach_args *pa);
 int virtio_pci_detach(struct device *, int);
 
 void virtio_pci_kick(struct virtio_softc *, uint16_t);
+int virtio_pci_adjust_config_region(struct virtio_pci_softc *);
 uint8_t virtio_pci_read_device_config_1(struct virtio_softc *, int);
 uint16_t virtio_pci_read_device_config_2(struct virtio_softc *, int);
 uint32_t virtio_pci_read_device_config_4(struct virtio_softc *, int);
@@ -64,10 +122,12 @@ void virtio_pci_write_device_config_2(struct virtio_softc *, int, uint16_t);
 void virtio_pci_write_device_config_4(struct virtio_softc *, int, uint32_t);
 void virtio_pci_write_device_config_8(struct virtio_softc *, int, uint64_t);
 uint16_t virtio_pci_read_queue_size(struct virtio_softc *, uint16_t);
-void virtio_pci_setup_queue(struct virtio_softc *, uint16_t, uint32_t);
+void virtio_pci_setup_queue(struct virtio_softc *, struct virtqueue *, uint64_t);
 void virtio_pci_set_status(struct virtio_softc *, int);
-uint32_t virtio_pci_negotiate_features(struct virtio_softc *, uint32_t,
-      const struct virtio_feature_name *);
+int virtio_pci_negotiate_features(struct virtio_softc *, const struct virtio_feature_name *);
+int virtio_pci_negotiate_features_10(struct virtio_softc *, const struct virtio_feature_name *);
+void virtio_pci_set_msix_queue_vector(struct virtio_pci_softc *, uint32_t, uint16_t);
+void virtio_pci_set_msix_config_vector(struct virtio_pci_softc *, uint16_t);
 int virtio_pci_msix_establish(struct virtio_pci_softc *, struct pci_attach_args *, int, int (*)(void *), void *);
 int virtio_pci_setup_msix(struct virtio_pci_softc *, struct pci_attach_args *, int);
 void virtio_pci_free_irqs(struct virtio_pci_softc *);
@@ -77,6 +137,10 @@ int virtio_pci_legacy_intr_mpsafe(void *);
 int virtio_pci_config_intr(void *);
 int virtio_pci_queue_intr(void *);
 int virtio_pci_shared_queue_intr(void *);
+int virtio_pci_find_cap(struct virtio_pci_softc *sc, int cfg_type, void *buf, int buflen);
+#ifdef VIRTIO_DEBUG
+void virtio_pci_dump_caps(struct virtio_pci_softc *sc);
+#endif
 
 enum irq_type {
  IRQ_NO_MSIX,
@@ -87,14 +151,38 @@ enum irq_type {
 struct virtio_pci_softc {
  struct virtio_softc sc_sc;
  pci_chipset_tag_t sc_pc;
+ pcitag_t sc_ptag;
 
  bus_space_tag_t sc_iot;
  bus_space_handle_t sc_ioh;
  bus_size_t sc_iosize;
 
+ bus_space_tag_t sc_bars_iot[4];
+ bus_space_handle_t sc_bars_ioh[4];
+ bus_size_t sc_bars_iosize[4];
+
+ bus_space_tag_t sc_notify_iot;
+ bus_space_handle_t sc_notify_ioh;
+ bus_size_t sc_notify_iosize;
+ unsigned int sc_notify_off_multiplier;
+
+ bus_space_tag_t sc_devcfg_iot;
+ bus_space_handle_t sc_devcfg_ioh;
+ bus_size_t sc_devcfg_iosize;
+ /*
+ * With 0.9, the offset of the devcfg region in the io bar changes
+ * depending on MSI-X being enabled or not.
+ * With 1.0, this field is still used to remember if MSI-X is enabled
+ * or not.
+ */
+ unsigned int sc_devcfg_offset;
+
+ bus_space_tag_t sc_isr_iot;
+ bus_space_handle_t sc_isr_ioh;
+ bus_size_t sc_isr_iosize;
+
  void *sc_ih[MAX_MSIX_VECS];
 
- int sc_config_offset;
  enum irq_type sc_irq_type;
 };
 
@@ -127,20 +215,44 @@ uint16_t
 virtio_pci_read_queue_size(struct virtio_softc *vsc, uint16_t idx)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh, VIRTIO_CONFIG_QUEUE_SELECT,
-    idx);
- return bus_space_read_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_QUEUE_SIZE);
+ uint16_t ret;
+ if (sc->sc_sc.sc_version_1) {
+ CWRITE(sc, queue_select, idx);
+ ret = CREAD(sc, queue_size);
+ } else {
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_SELECT, idx);
+ ret = bus_space_read_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_SIZE);
+ }
+ return ret;
 }
 
 void
-virtio_pci_setup_queue(struct virtio_softc *vsc, uint16_t idx, uint32_t addr)
+virtio_pci_setup_queue(struct virtio_softc *vsc, struct virtqueue *vq,
+    uint64_t addr)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh, VIRTIO_CONFIG_QUEUE_SELECT,
-    idx);
- bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_CONFIG_QUEUE_ADDRESS,
-    addr);
+ if (sc->sc_sc.sc_version_1) {
+ CWRITE(sc, queue_select, vq->vq_index);
+ if (addr == 0) {
+ CWRITE(sc, queue_enable, 0);
+ CWRITE(sc, queue_desc, 0);
+ CWRITE(sc, queue_avail, 0);
+ CWRITE(sc, queue_used, 0);
+ } else {
+ CWRITE(sc, queue_desc, addr);
+ CWRITE(sc, queue_avail, addr + vq->vq_availoffset);
+ CWRITE(sc, queue_used, addr + vq->vq_usedoffset);
+ CWRITE(sc, queue_enable, 1);
+ vq->vq_notify_off = CREAD(sc, queue_notify_off);
+ }
+ } else {
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_SELECT, vq->vq_index);
+ bus_space_write_4(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_ADDRESS, addr / VIRTIO_PAGE_SIZE);
+ }
 
  /*
  * This path is only executed if this function is called after
@@ -150,9 +262,13 @@ virtio_pci_setup_queue(struct virtio_softc *vsc, uint16_t idx, uint32_t addr)
  if (sc->sc_irq_type != IRQ_NO_MSIX) {
  int vec = 1;
  if (sc->sc_irq_type == IRQ_MSIX_PER_VQ)
-       vec += idx;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_MSI_QUEUE_VECTOR, vec);
+       vec += vq->vq_index;
+ if (sc->sc_sc.sc_version_1) {
+ CWRITE(sc, queue_msix_vector, vec);
+ } else {
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_MSI_QUEUE_VECTOR, vec);
+ }
  }
 }
 
@@ -162,11 +278,17 @@ virtio_pci_set_status(struct virtio_softc *vsc, int status)
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
  int old = 0;
 
- if (status != 0)
- old = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
-       VIRTIO_CONFIG_DEVICE_STATUS);
- bus_space_write_1(sc->sc_iot, sc->sc_ioh, VIRTIO_CONFIG_DEVICE_STATUS,
-  status|old);
+ if (sc->sc_sc.sc_version_1) {
+ if (status != 0)
+ old = CREAD(sc, device_status);
+ CWRITE(sc, device_status, status|old);
+ } else {
+ if (status != 0)
+ old = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_DEVICE_STATUS);
+ bus_space_write_1(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_DEVICE_STATUS, status|old);
+ }
 }
 
 int
@@ -175,17 +297,226 @@ virtio_pci_match(struct device *parent, void *match, void *aux)
  struct pci_attach_args *pa;
 
  pa = (struct pci_attach_args *)aux;
- if (PCI_VENDOR(pa->pa_id) == PCI_VENDOR_QUMRANET &&
-    PCI_PRODUCT(pa->pa_id) >= 0x1000 &&
+ if (PCI_VENDOR(pa->pa_id) == PCI_VENDOR_OPENBSD &&
+    PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_OPENBSD_CONTROL)
+ return 1;
+ if (PCI_VENDOR(pa->pa_id) != PCI_VENDOR_QUMRANET)
+ return 0;
+ /* virtio 0.9 */
+ if (PCI_PRODUCT(pa->pa_id) >= 0x1000 &&
     PCI_PRODUCT(pa->pa_id) <= 0x103f &&
     PCI_REVISION(pa->pa_class) == 0)
  return 1;
- if (PCI_VENDOR(pa->pa_id) == PCI_VENDOR_OPENBSD &&
-    PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_OPENBSD_CONTROL)
+ /* virtio 1.0 */
+ if (PCI_PRODUCT(pa->pa_id) >= 0x1040 &&
+    PCI_PRODUCT(pa->pa_id) <= 0x107f &&
+    PCI_REVISION(pa->pa_class) == 1)
  return 1;
  return 0;
 }
 
+#ifdef VIRTIO_DEBUG
+void
+virtio_pci_dump_caps(struct virtio_pci_softc *sc)
+{
+ pci_chipset_tag_t pc = sc->sc_pc;
+ pcitag_t tag = sc->sc_ptag;
+ int offset;
+ union {
+ pcireg_t reg[4];
+ struct virtio_pci_cap vcap;
+ } v;
+
+ if (!pci_get_capability(pc, tag, PCI_CAP_VENDSPEC, &offset, &v.reg[0]))
+ return;
+
+ printf("\n");
+ do {
+ for (int i = 0; i < 4; i++)
+ v.reg[i] = pci_conf_read(pc, tag, offset + i * 4);
+ printf("%s: cfgoff %#x len %#x type %#x bar %#x: off %#x len %#x\n",
+ __func__, offset, v.vcap.cap_len, v.vcap.cfg_type, v.vcap.bar,
+ v.vcap.offset, v.vcap.length);
+ offset = v.vcap.cap_next;
+ } while (offset != 0);
+}
+#endif
+
+int
+virtio_pci_find_cap(struct virtio_pci_softc *sc, int cfg_type, void *buf, int buflen)
+{
+ pci_chipset_tag_t pc = sc->sc_pc;
+ pcitag_t tag = sc->sc_ptag;
+ unsigned int offset, i, len;
+ union {
+ pcireg_t reg[8];
+ struct virtio_pci_cap vcap;
+ } *v = buf;
+
+ if (buflen < sizeof(struct virtio_pci_cap))
+ return ERANGE;
+
+ if (!pci_get_capability(pc, tag, PCI_CAP_VENDSPEC, &offset, &v->reg[0]))
+ return ENOENT;
+
+ do {
+ for (i = 0; i < 4; i++)
+ v->reg[i] = pci_conf_read(pc, tag, offset + i * 4);
+ if (v->vcap.cfg_type == cfg_type)
+ break;
+ offset = v->vcap.cap_next;
+ } while (offset != 0);
+
+ if (offset == 0)
+ return ENOENT;
+
+ if (v->vcap.cap_len > sizeof(struct virtio_pci_cap)) {
+ len = roundup(v->vcap.cap_len, sizeof(pcireg_t));
+ if (len > buflen) {
+ printf("%s: cap too large\n", __func__);
+ return ERANGE;
+ }
+ for (i = 4; i < len / sizeof(pcireg_t);  i++)
+ v->reg[i] = pci_conf_read(pc, tag, offset + i * 4);
+ }
+
+ return 0;
+}
+
+int
+virtio_pci_attach_10(struct virtio_pci_softc *sc, struct pci_attach_args *pa)
+{
+ struct virtio_pci_cap common, isr, device;
+ struct virtio_pci_notify_cap notify;
+ int have_device_cfg = 0;
+ bus_size_t bars[6] = { 0 };
+ int bars_idx[6] = { 0 };
+ struct virtio_pci_cap *caps[] = { &common, &isr, &device, &notify.cap };
+ int i, j = 0, ret = 0;
+
+ if (virtio_pci_find_cap(sc, VIRTIO_PCI_CAP_COMMON_CFG, &common, sizeof(common)) != 0)
+ return ENODEV;
+
+ if (virtio_pci_find_cap(sc, VIRTIO_PCI_CAP_NOTIFY_CFG, &notify, sizeof(notify)) != 0)
+ return ENODEV;
+ if (virtio_pci_find_cap(sc, VIRTIO_PCI_CAP_ISR_CFG, &isr, sizeof(isr)) != 0)
+ return ENODEV;
+ if (virtio_pci_find_cap(sc, VIRTIO_PCI_CAP_DEVICE_CFG, &device, sizeof(device)) != 0)
+ memset(&device, 0, sizeof(device));
+ else
+ have_device_cfg = 1;
+
+ /*
+ * XXX Maybe there are devices that offer the pci caps but not the
+ * XXX VERSION_1 feature bit? Then we should check the feature bit
+ * XXX here and fall back to 0.9 out if not present.
+ */
+
+ /* Figure out which bars we need to map */
+ for (i = 0; i < nitems(caps); i++) {
+ int bar = caps[i]->bar;
+ bus_size_t len = caps[i]->offset + caps[i]->length;
+ if (caps[i]->length == 0)
+ continue;
+ if (bars[bar] < len)
+ bars[bar] = len;
+ }
+
+ for (i = 0; i < nitems(bars); i++) {
+ int reg;
+ pcireg_t type;
+ if (bars[i] == 0)
+ continue;
+ reg = PCI_MAPREG_START + i * 4;
+ type = pci_mapreg_type(sc->sc_pc, sc->sc_ptag, reg);
+ if (pci_mapreg_map(pa, reg, type, 0, &sc->sc_bars_iot[j],
+    &sc->sc_bars_ioh[j], NULL, &sc->sc_bars_iosize[j],
+    bars[i])) {
+ printf("%s: can't map bar %u \n",
+    sc->sc_sc.sc_dev.dv_xname, i);
+ ret = EIO;
+ goto err;
+ }
+ bars_idx[i] = j;
+ j++;
+ }
+
+ i = bars_idx[notify.cap.bar];
+ if (bus_space_subregion(sc->sc_bars_iot[i], sc->sc_bars_ioh[i],
+    notify.cap.offset, notify.cap.length, &sc->sc_notify_ioh) != 0) {
+ printf("%s: can't map notify i/o space\n",
+    sc->sc_sc.sc_dev.dv_xname);
+ ret = EIO;
+ goto err;
+ }
+ sc->sc_notify_iosize = notify.cap.length;
+ sc->sc_notify_iot = sc->sc_bars_iot[i];
+ sc->sc_notify_off_multiplier = notify.notify_off_multiplier;
+
+ if (have_device_cfg) {
+ i = bars_idx[device.bar];
+ if (bus_space_subregion(sc->sc_bars_iot[i], sc->sc_bars_ioh[i],
+    device.offset, device.length, &sc->sc_devcfg_ioh) != 0) {
+ printf("%s: can't map devcfg i/o space\n",
+    sc->sc_sc.sc_dev.dv_xname);
+ ret = EIO;
+ goto err;
+ }
+ sc->sc_devcfg_iosize = device.length;
+ sc->sc_devcfg_iot = sc->sc_bars_iot[i];
+ }
+
+ i = bars_idx[isr.bar];
+ if (bus_space_subregion(sc->sc_bars_iot[i], sc->sc_bars_ioh[i],
+    isr.offset, isr.length, &sc->sc_isr_ioh) != 0) {
+ printf("%s: can't map isr i/o space\n",
+    sc->sc_sc.sc_dev.dv_xname);
+ ret = EIO;
+ goto err;
+ }
+ sc->sc_isr_iosize = isr.length;
+ sc->sc_isr_iot = sc->sc_bars_iot[i];
+
+ i = bars_idx[common.bar];
+ if (bus_space_subregion(sc->sc_bars_iot[i], sc->sc_bars_ioh[i],
+    common.offset, common.length, &sc->sc_ioh) != 0) {
+ printf("%s: can't map common i/o space\n",
+    sc->sc_sc.sc_dev.dv_xname);
+ ret = EIO;
+ goto err;
+ }
+ sc->sc_iosize = common.length;
+ sc->sc_iot = sc->sc_bars_iot[i];
+
+ sc->sc_sc.sc_version_1 = 1;
+ return 0;
+
+err:
+ /* there is no pci_mapreg_unmap() */
+ return ret;
+}
+
+int
+virtio_pci_attach_09(struct virtio_pci_softc *sc, struct pci_attach_args *pa)
+{
+ struct virtio_softc *vsc = &sc->sc_sc;
+ if (pci_mapreg_map(pa, PCI_MAPREG_START, PCI_MAPREG_TYPE_IO, 0,
+    &sc->sc_iot, &sc->sc_ioh, NULL, &sc->sc_iosize, 0)) {
+ printf("%s: can't map i/o space\n", vsc->sc_dev.dv_xname);
+ return EIO;
+ }
+
+ if (bus_space_subregion(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_NOTIFY, 2, &sc->sc_notify_ioh) != 0) {
+ printf("%s: can't map notify i/o space\n",
+    vsc->sc_dev.dv_xname);
+ return EIO;
+ }
+ sc->sc_notify_iosize = 2;
+ sc->sc_notify_iot = sc->sc_iot;
+ return 0;
+}
+
 void
 virtio_pci_attach(struct device *parent, struct device *self, void *aux)
 {
@@ -194,27 +525,28 @@ virtio_pci_attach(struct device *parent, struct device *self, void *aux)
  struct pci_attach_args *pa = (struct pci_attach_args *)aux;
  pci_chipset_tag_t pc = pa->pa_pc;
  pcitag_t tag = pa->pa_tag;
- int revision;
+ int revision, ret = ENODEV;
  pcireg_t id;
  char const *intrstr;
  pci_intr_handle_t ih;
 
  revision = PCI_REVISION(pa->pa_class);
- if (revision != 0) {
+ switch (revision) {
+ case 0:
+ /* subsystem ID shows what I am */
+ id = PCI_PRODUCT(pci_conf_read(pc, tag, PCI_SUBSYS_ID_REG));
+ break;
+ case 1:
+ id = PCI_PRODUCT(pa->pa_id) - 0x1040;
+ break;
+ default:
  printf("unknown revision 0x%02x; giving up\n", revision);
  return;
  }
 
- /* subsystem ID shows what I am */
- id = PCI_PRODUCT(pci_conf_read(pc, tag, PCI_SUBSYS_ID_REG));
-
- printf("\n");
-
- vsc->sc_ops = &virtio_pci_ops;
  sc->sc_pc = pc;
+ sc->sc_ptag = pa->pa_tag;
  vsc->sc_dmat = pa->pa_dmat;
- sc->sc_config_offset = VIRTIO_CONFIG_DEVICE_CONFIG_NOMSI;
- sc->sc_irq_type = IRQ_NO_MSIX;
 
  /*
  * For virtio, ignore normal MSI black/white-listing depending on the
@@ -222,17 +554,34 @@ virtio_pci_attach(struct device *parent, struct device *self, void *aux)
  */
  pa->pa_flags |= PCI_FLAGS_MSI_ENABLED;
 
- if (pci_mapreg_map(pa, PCI_MAPREG_START, PCI_MAPREG_TYPE_IO, 0,
-    &sc->sc_iot, &sc->sc_ioh, NULL, &sc->sc_iosize, 0)) {
- printf("%s: can't map i/o space\n", vsc->sc_dev.dv_xname);
+#ifdef VIRTIO_DEBUG
+ virtio_pci_dump_caps(sc);
+#endif
+
+ vsc->sc_ops = &virtio_pci_ops;
+ if ((vsc->sc_dev.dv_cfdata->cf_flags & VIRTIO_CF_NO_VERSION_1) == 0 &&
+    (revision == 1 ||
+     (vsc->sc_dev.dv_cfdata->cf_flags & VIRTIO_CF_PREFER_VERSION_1))) {
+ ret = virtio_pci_attach_10(sc, pa);
+ }
+ if (ret != 0 && revision == 0) {
+ /* revision 0 means 0.9 only or both 0.9 and 1.0 */
+ ret = virtio_pci_attach_09(sc, pa);
+ }
+ if (ret != 0) {
+ printf(": Cannot attach (%d)\n", ret);
  return;
  }
 
+ sc->sc_devcfg_offset = VIRTIO_CONFIG_DEVICE_CONFIG_NOMSI;
+ sc->sc_irq_type = IRQ_NO_MSIX;
+ if (virtio_pci_adjust_config_region(sc) != 0)
+ return;
+
  virtio_device_reset(vsc);
- virtio_pci_set_status(vsc, VIRTIO_CONFIG_DEVICE_STATUS_ACK);
- virtio_pci_set_status(vsc, VIRTIO_CONFIG_DEVICE_STATUS_DRIVER);
+ virtio_set_status(vsc, VIRTIO_CONFIG_DEVICE_STATUS_ACK);
+ virtio_set_status(vsc, VIRTIO_CONFIG_DEVICE_STATUS_DRIVER);
 
- /* XXX: use softc as aux... */
  vsc->sc_childdevid = id;
  vsc->sc_child = NULL;
  config_found(self, sc, NULL);
@@ -311,44 +660,132 @@ virtio_pci_detach(struct device *self, int flags)
  return 0;
 }
 
+int
+virtio_pci_adjust_config_region(struct virtio_pci_softc *sc)
+{
+ if (sc->sc_sc.sc_version_1)
+ return 0;
+ sc->sc_devcfg_iosize = sc->sc_iosize - sc->sc_devcfg_offset;
+ sc->sc_devcfg_iot = sc->sc_iot;
+ if (bus_space_subregion(sc->sc_iot, sc->sc_ioh, sc->sc_devcfg_offset,
+    sc->sc_devcfg_iosize, &sc->sc_devcfg_ioh) != 0) {
+ printf("%s: can't map config i/o space\n",
+    sc->sc_sc.sc_dev.dv_xname);
+ return 1;
+ }
+ return 0;
+}
+
 /*
  * Feature negotiation.
  * Prints available / negotiated features if guest_feature_names != NULL and
  * VIRTIO_DEBUG is 1
  */
-uint32_t
-virtio_pci_negotiate_features(struct virtio_softc *vsc, uint32_t guest_features,
+int
+virtio_pci_negotiate_features(struct virtio_softc *vsc,
     const struct virtio_feature_name *guest_feature_names)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- uint32_t host, neg;
+ uint64_t host, negotiated;
+
+ vsc->sc_active_features = 0;
 
  /*
- * indirect descriptors can be switched off by setting bit 1 in the
- * driver flags, see config(8)
+ * We enable indirect descriptors by default. They can be switched
+ * off by setting bit 1 in the driver flags, see config(8)
  */
- if (!(vsc->sc_dev.dv_cfdata->cf_flags & 1) &&
-    !(vsc->sc_child->dv_cfdata->cf_flags & 1)) {
- guest_features |= VIRTIO_F_RING_INDIRECT_DESC;
- } else {
- printf("RingIndirectDesc disabled by UKC\n");
+ if (!(vsc->sc_dev.dv_cfdata->cf_flags & VIRTIO_CF_NO_INDIRECT) &&
+    !(vsc->sc_child->dv_cfdata->cf_flags & VIRTIO_CF_NO_INDIRECT)) {
+ vsc->sc_driver_features |= 1ULL << VIRTIO_F_RING_INDIRECT_DESC;
+ } else if (guest_feature_names != NULL) {
+ printf(" RingIndirectDesc disabled by UKC");
  }
+
+ /*
+ * The driver must add VIRTIO_F_RING_EVENT_IDX if it supports it.
+ * If it did, check if it is disabled by bit 2 in the driver flags.
+ */
+ if (vsc->sc_driver_features & (1ULL << VIRTIO_F_RING_EVENT_IDX)) {
+ if ((vsc->sc_dev.dv_cfdata->cf_flags & VIRTIO_CF_NO_EVENT_IDX) ||
+    (vsc->sc_child->dv_cfdata->cf_flags & VIRTIO_CF_NO_EVENT_IDX)) {
+ if (guest_feature_names != NULL)
+ printf(" RingEventIdx disabled by UKC");
+ vsc->sc_driver_features &= ~(1ULL << VIRTIO_F_RING_EVENT_IDX);
+ }
+ }
+
+ if (vsc->sc_version_1) {
+ return virtio_pci_negotiate_features_10(vsc,
+    guest_feature_names);
+ }
+
+ /* virtio 0.9 only */
  host = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
  VIRTIO_CONFIG_DEVICE_FEATURES);
- neg = host & guest_features;
+ negotiated = host & vsc->sc_driver_features;
 #if VIRTIO_DEBUG
  if (guest_feature_names)
- virtio_log_features(host, neg, guest_feature_names);
+ virtio_log_features(host, negotiated, guest_feature_names);
 #endif
  bus_space_write_4(sc->sc_iot, sc->sc_ioh,
-  VIRTIO_CONFIG_GUEST_FEATURES, neg);
- vsc->sc_features = neg;
- if (neg & VIRTIO_F_RING_INDIRECT_DESC)
+  VIRTIO_CONFIG_GUEST_FEATURES, negotiated);
+ vsc->sc_active_features = negotiated;
+ if (negotiated & VIRTIO_F_RING_INDIRECT_DESC)
+ vsc->sc_indirect = 1;
+ else
+ vsc->sc_indirect = 0;
+ return 0;
+}
+
+int
+virtio_pci_negotiate_features_10(struct virtio_softc *vsc,
+    const struct virtio_feature_name *guest_feature_names)
+{
+ struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
+ uint64_t host, negotiated;
+
+ vsc->sc_driver_features |= 1ULL << VIRTIO_F_VERSION_1;
+ /* notify on empty is 0.9 only */
+ vsc->sc_driver_features &= ~(1ULL << VIRTIO_F_NOTIFY_ON_EMPTY);
+ CWRITE(sc, device_feature_select, 0);
+ host = CREAD(sc, device_feature);
+ CWRITE(sc, device_feature_select, 1);
+ host |= (uint64_t)CREAD(sc, device_feature) << 32;
+
+ negotiated = host & vsc->sc_driver_features;
+#if VIRTIO_DEBUG
+ if (guest_feature_names)
+ virtio_log_features(host, negotiated, guest_feature_names);
+#endif
+ CWRITE(sc, driver_feature_select, 0);
+ CWRITE(sc, driver_feature, negotiated & 0xffffffff);
+ CWRITE(sc, driver_feature_select, 1);
+ CWRITE(sc, driver_feature, negotiated >> 32);
+ virtio_pci_set_status(vsc, VIRTIO_CONFIG_DEVICE_STATUS_FEATURES_OK);
+
+ if ((CREAD(sc, device_status) &
+    VIRTIO_CONFIG_DEVICE_STATUS_FEATURES_OK) == 0) {
+#if VIRTIO_DEBUG
+ printf("%s: Feature negotiation failed\n", __func__);
+#endif
+ CWRITE(sc, device_status, VIRTIO_CONFIG_DEVICE_STATUS_FAILED);
+ return ENXIO;
+ }
+ vsc->sc_active_features = negotiated;
+
+ if (negotiated & VIRTIO_F_RING_INDIRECT_DESC)
  vsc->sc_indirect = 1;
  else
  vsc->sc_indirect = 0;
 
- return neg;
+ if ((negotiated & (1ULL << VIRTIO_F_VERSION_1)) == 0) {
+#if VIRTIO_DEBUG
+ printf("%s: Host rejected Version_1\n", __func__);
+#endif
+ CWRITE(sc, device_status, VIRTIO_CONFIG_DEVICE_STATUS_FAILED);
+ return EINVAL;
+ }
+ return 0;
 }
 
 /*
@@ -358,24 +795,21 @@ uint8_t
 virtio_pci_read_device_config_1(struct virtio_softc *vsc, int index)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- return bus_space_read_1(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index);
+ return bus_space_read_1(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index);
 }
 
 uint16_t
 virtio_pci_read_device_config_2(struct virtio_softc *vsc, int index)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- return bus_space_read_2(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index);
+ return bus_space_read_2(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index);
 }
 
 uint32_t
 virtio_pci_read_device_config_4(struct virtio_softc *vsc, int index)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- return bus_space_read_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index);
+ return bus_space_read_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index);
 }
 
 uint64_t
@@ -384,11 +818,10 @@ virtio_pci_read_device_config_8(struct virtio_softc *vsc, int index)
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
  uint64_t r;
 
- r = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index + sizeof(uint32_t));
+ r = bus_space_read_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh,
+    index + sizeof(uint32_t));
  r <<= 32;
- r += bus_space_read_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index);
+ r += bus_space_read_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index);
  return r;
 }
 
@@ -397,8 +830,7 @@ virtio_pci_write_device_config_1(struct virtio_softc *vsc, int index,
     uint8_t value)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_1(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index, value);
+ bus_space_write_1(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index, value);
 }
 
 void
@@ -406,8 +838,7 @@ virtio_pci_write_device_config_2(struct virtio_softc *vsc, int index,
     uint16_t value)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index, value);
+ bus_space_write_2(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index, value);
 }
 
 void
@@ -415,8 +846,7 @@ virtio_pci_write_device_config_4(struct virtio_softc *vsc,
      int index, uint32_t value)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index, value);
+ bus_space_write_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh, index, value);
 }
 
 void
@@ -424,10 +854,10 @@ virtio_pci_write_device_config_8(struct virtio_softc *vsc,
      int index, uint64_t value)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index, value & 0xffffffff);
- bus_space_write_4(sc->sc_iot, sc->sc_ioh,
-    sc->sc_config_offset + index + sizeof(uint32_t), value >> 32);
+ bus_space_write_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh,
+    index, value & 0xffffffff);
+ bus_space_write_4(sc->sc_devcfg_iot, sc->sc_devcfg_ioh,
+    index + sizeof(uint32_t), value >> 32);
 }
 
 int
@@ -454,18 +884,42 @@ virtio_pci_msix_establish(struct virtio_pci_softc *sc,
  return 0;
 }
 
+void
+virtio_pci_set_msix_queue_vector(struct virtio_pci_softc *sc, uint32_t idx, uint16_t vector)
+{
+ if (sc->sc_sc.sc_version_1) {
+ CWRITE(sc, queue_select, idx);
+ CWRITE(sc, queue_msix_vector, vector);
+ } else {
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_QUEUE_SELECT, idx);
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_MSI_QUEUE_VECTOR, vector);
+ }
+}
+
+void
+virtio_pci_set_msix_config_vector(struct virtio_pci_softc *sc, uint16_t vector)
+{
+ if (sc->sc_sc.sc_version_1) {
+ CWRITE(sc, config_msix_vector, vector);
+ } else {
+ bus_space_write_2(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_MSI_CONFIG_VECTOR, vector);
+ }
+}
+
+
 void
 virtio_pci_free_irqs(struct virtio_pci_softc *sc)
 {
  struct virtio_softc *vsc = &sc->sc_sc;
  int i;
 
- if (sc->sc_config_offset == VIRTIO_CONFIG_DEVICE_CONFIG_MSI) {
+ if (sc->sc_devcfg_offset == VIRTIO_CONFIG_DEVICE_CONFIG_MSI) {
  for (i = 0; i < vsc->sc_nvqs; i++) {
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_QUEUE_SELECT, i);
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_MSI_QUEUE_VECTOR, VIRTIO_MSI_NO_VECTOR);
+ virtio_pci_set_msix_queue_vector(sc, i,
+    VIRTIO_MSI_NO_VECTOR);
  }
  }
 
@@ -476,7 +930,8 @@ virtio_pci_free_irqs(struct virtio_pci_softc *sc)
  }
  }
 
- sc->sc_config_offset = VIRTIO_CONFIG_DEVICE_CONFIG_NOMSI;
+ sc->sc_devcfg_offset = VIRTIO_CONFIG_DEVICE_CONFIG_NOMSI;
+ virtio_pci_adjust_config_region(sc);
 }
 
 int
@@ -488,8 +943,9 @@ virtio_pci_setup_msix(struct virtio_pci_softc *sc, struct pci_attach_args *pa,
 
  if (virtio_pci_msix_establish(sc, pa, 0, virtio_pci_config_intr, vsc))
  return 1;
- sc->sc_config_offset = VIRTIO_CONFIG_DEVICE_CONFIG_MSI;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh, VIRTIO_MSI_CONFIG_VECTOR, 0);
+ sc->sc_devcfg_offset = VIRTIO_CONFIG_DEVICE_CONFIG_MSI;
+ virtio_pci_adjust_config_region(sc);
+ virtio_pci_set_msix_config_vector(sc, 0);
 
  if (shared) {
  if (virtio_pci_msix_establish(sc, pa, 1,
@@ -497,22 +953,15 @@ virtio_pci_setup_msix(struct virtio_pci_softc *sc, struct pci_attach_args *pa,
  goto fail;
  }
 
- for (i = 0; i < vsc->sc_nvqs; i++) {
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_QUEUE_SELECT, i);
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_MSI_QUEUE_VECTOR, 1);
- }
+ for (i = 0; i < vsc->sc_nvqs; i++)
+ virtio_pci_set_msix_queue_vector(sc, i, 1);
  } else {
  for (i = 0; i <= vsc->sc_nvqs; i++) {
  if (virtio_pci_msix_establish(sc, pa, i + 1,
     virtio_pci_queue_intr, &vsc->sc_vqs[i])) {
  goto fail;
  }
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_QUEUE_SELECT, i);
- bus_space_write_2(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_MSI_QUEUE_VECTOR, i + 1);
+ virtio_pci_set_msix_queue_vector(sc, i, i + 1);
  }
  }
 
@@ -537,8 +986,13 @@ virtio_pci_legacy_intr(void *arg)
  int isr, r = 0;
 
  /* check and ack the interrupt */
- isr = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_ISR_STATUS);
+
+ if (vsc->sc_version_1) {
+ isr = bus_space_read_1(sc->sc_isr_iot, sc->sc_isr_ioh, 0);
+ } else {
+ isr = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_ISR_STATUS);
+ }
  if (isr == 0)
  return 0;
  KERNEL_LOCK();
@@ -560,8 +1014,12 @@ virtio_pci_legacy_intr_mpsafe(void *arg)
  int isr, r = 0;
 
  /* check and ack the interrupt */
- isr = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
-    VIRTIO_CONFIG_ISR_STATUS);
+ if (vsc->sc_version_1) {
+ isr = bus_space_read_1(sc->sc_isr_iot, sc->sc_isr_ioh, 0);
+ } else {
+ isr = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
+    VIRTIO_CONFIG_ISR_STATUS);
+ }
  if (isr == 0)
  return 0;
  if ((isr & VIRTIO_CONFIG_ISR_CONFIG_CHANGE) &&
@@ -629,6 +1087,10 @@ void
 virtio_pci_kick(struct virtio_softc *vsc, uint16_t idx)
 {
  struct virtio_pci_softc *sc = (struct virtio_pci_softc *)vsc;
- bus_space_write_2(sc->sc_iot, sc->sc_ioh, VIRTIO_CONFIG_QUEUE_NOTIFY,
-    idx);
+ unsigned offset = 0;
+ if (vsc->sc_version_1) {
+ offset = vsc->sc_vqs[idx].vq_notify_off *
+    sc->sc_notify_off_multiplier;
+ }
+ bus_space_write_2(sc->sc_notify_iot, sc->sc_notify_ioh, offset, idx);
 }
diff --git a/sys/dev/pci/virtio_pcireg.h b/sys/dev/pci/virtio_pcireg.h
index dc838f38d59..f47ec9c6e7a 100644
--- a/sys/dev/pci/virtio_pcireg.h
+++ b/sys/dev/pci/virtio_pcireg.h
@@ -35,5 +35,62 @@
 
 #define VIRTIO_MSI_NO_VECTOR 0xffff
 
+/*
+ * Virtio 1.0 specific
+ */
+
+struct virtio_pci_cap {
+ uint8_t cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
+ uint8_t cap_next; /* Generic PCI field: next ptr. */
+ uint8_t cap_len; /* Generic PCI field: capability length */
+ uint8_t cfg_type; /* Identifies the structure. */
+ uint8_t bar; /* Where to find it. */
+ uint8_t padding[3]; /* Pad to full dword. */
+ uint32_t offset; /* Offset within bar. */
+ uint32_t length; /* Length of the structure, in bytes. */
+} __packed;
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG 1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG 3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG 4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG 5
+
+struct virtio_pci_notify_cap {
+ struct virtio_pci_cap cap;
+ uint32_t notify_off_multiplier; /* Multiplier for queue_notify_off. */
+} __packed;
+
+struct virtio_pci_cfg_cap {
+ struct virtio_pci_cap cap;
+ uint8_t pci_cfg_data[4]; /* Data for BAR access. */
+} __packed;
+
+struct virtio_pci_common_cfg {
+ /* About the whole device. */
+ uint32_t device_feature_select; /* read-write */
+ uint32_t device_feature; /* read-only for driver */
+ uint32_t driver_feature_select; /* read-write */
+ uint32_t driver_feature; /* read-write */
+ uint16_t config_msix_vector; /* read-write */
+ uint16_t num_queues; /* read-only for driver */
+ uint8_t device_status; /* read-write */
+ uint8_t config_generation; /* read-only for driver */
+
+ /* About a specific virtqueue. */
+ uint16_t queue_select; /* read-write */
+ uint16_t queue_size; /* read-write, power of 2, or 0. */
+ uint16_t queue_msix_vector; /* read-write */
+ uint16_t queue_enable; /* read-write */
+ uint16_t queue_notify_off; /* read-only for driver */
+ uint64_t queue_desc; /* read-write */
+ uint64_t queue_avail; /* read-write */
+ uint64_t queue_used; /* read-write */
+} __packed;
 
 #endif /* _DEV_PCI_VIRTIO_PCIREG_H_ */
diff --git a/sys/dev/pv/if_vio.c b/sys/dev/pv/if_vio.c
index 28658380507..b004216beeb 100644
--- a/sys/dev/pv/if_vio.c
+++ b/sys/dev/pv/if_vio.c
@@ -68,25 +68,29 @@
 #define VIRTIO_NET_CONFIG_STATUS 6 /* 16bit */
 
 /* Feature bits */
-#define VIRTIO_NET_F_CSUM (1<<0)
-#define VIRTIO_NET_F_GUEST_CSUM (1<<1)
-#define VIRTIO_NET_F_MAC (1<<5)
-#define VIRTIO_NET_F_GSO (1<<6)
-#define VIRTIO_NET_F_GUEST_TSO4 (1<<7)
-#define VIRTIO_NET_F_GUEST_TSO6 (1<<8)
-#define VIRTIO_NET_F_GUEST_ECN (1<<9)
-#define VIRTIO_NET_F_GUEST_UFO (1<<10)
-#define VIRTIO_NET_F_HOST_TSO4 (1<<11)
-#define VIRTIO_NET_F_HOST_TSO6 (1<<12)
-#define VIRTIO_NET_F_HOST_ECN (1<<13)
-#define VIRTIO_NET_F_HOST_UFO (1<<14)
-#define VIRTIO_NET_F_MRG_RXBUF (1<<15)
-#define VIRTIO_NET_F_STATUS (1<<16)
-#define VIRTIO_NET_F_CTRL_VQ (1<<17)
-#define VIRTIO_NET_F_CTRL_RX (1<<18)
-#define VIRTIO_NET_F_CTRL_VLAN (1<<19)
-#define VIRTIO_NET_F_CTRL_RX_EXTRA (1<<20)
-#define VIRTIO_NET_F_GUEST_ANNOUNCE (1<<21)
+#define VIRTIO_NET_F_CSUM 0
+#define VIRTIO_NET_F_GUEST_CSUM 1
+#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2
+#define VIRTIO_NET_F_MTU 3
+#define VIRTIO_NET_F_MAC 5
+#define VIRTIO_NET_F_GSO 6
+#define VIRTIO_NET_F_GUEST_TSO4 7
+#define VIRTIO_NET_F_GUEST_TSO6 8
+#define VIRTIO_NET_F_GUEST_ECN 9
+#define VIRTIO_NET_F_GUEST_UFO 10
+#define VIRTIO_NET_F_HOST_TSO4 11
+#define VIRTIO_NET_F_HOST_TSO6 12
+#define VIRTIO_NET_F_HOST_ECN 13
+#define VIRTIO_NET_F_HOST_UFO 14
+#define VIRTIO_NET_F_MRG_RXBUF 15
+#define VIRTIO_NET_F_STATUS 16
+#define VIRTIO_NET_F_CTRL_VQ 17
+#define VIRTIO_NET_F_CTRL_RX 18
+#define VIRTIO_NET_F_CTRL_VLAN 19
+#define VIRTIO_NET_F_CTRL_RX_EXTRA 20
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21
+#define VIRTIO_NET_F_MQ 22
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 23
 
 /*
  * Config(8) flags. The lowest byte is reserved for generic virtio stuff.
@@ -97,25 +101,29 @@
 
 static const struct virtio_feature_name virtio_net_feature_names[] = {
 #if VIRTIO_DEBUG
- { VIRTIO_NET_F_CSUM, "CSum" },
- { VIRTIO_NET_F_GUEST_CSUM, "GuestCSum" },
- { VIRTIO_NET_F_MAC, "MAC" },
- { VIRTIO_NET_F_GSO, "GSO" },
- { VIRTIO_NET_F_GUEST_TSO4, "GuestTSO4" },
- { VIRTIO_NET_F_GUEST_TSO6, "GuestTSO6" },
- { VIRTIO_NET_F_GUEST_ECN, "GuestECN" },
- { VIRTIO_NET_F_GUEST_UFO, "GuestUFO" },
- { VIRTIO_NET_F_HOST_TSO4, "HostTSO4" },
- { VIRTIO_NET_F_HOST_TSO6, "HostTSO6" },
- { VIRTIO_NET_F_HOST_ECN, "HostECN" },
- { VIRTIO_NET_F_HOST_UFO, "HostUFO" },
- { VIRTIO_NET_F_MRG_RXBUF, "MrgRXBuf" },
- { VIRTIO_NET_F_STATUS, "Status" },
- { VIRTIO_NET_F_CTRL_VQ, "CtrlVQ" },
- { VIRTIO_NET_F_CTRL_RX, "CtrlRX" },
- { VIRTIO_NET_F_CTRL_VLAN, "CtrlVLAN" },
- { VIRTIO_NET_F_CTRL_RX_EXTRA, "CtrlRXExtra" },
- { VIRTIO_NET_F_GUEST_ANNOUNCE, "GuestAnnounce" },
+ { VIRTIO_NET_F_CSUM, "CSum" },
+ { VIRTIO_NET_F_GUEST_CSUM, "GuestCSum" },
+ { VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, "CtrlGuestOffl" },
+ { VIRTIO_NET_F_MTU, "MTU", },
+ { VIRTIO_NET_F_MAC, "MAC" },
+ { VIRTIO_NET_F_GSO, "GSO" },
+ { VIRTIO_NET_F_GUEST_TSO4, "GuestTSO4" },
+ { VIRTIO_NET_F_GUEST_TSO6, "GuestTSO6" },
+ { VIRTIO_NET_F_GUEST_ECN, "GuestECN" },
+ { VIRTIO_NET_F_GUEST_UFO, "GuestUFO" },
+ { VIRTIO_NET_F_HOST_TSO4, "HostTSO4" },
+ { VIRTIO_NET_F_HOST_TSO6, "HostTSO6" },
+ { VIRTIO_NET_F_HOST_ECN, "HostECN" },
+ { VIRTIO_NET_F_HOST_UFO, "HostUFO" },
+ { VIRTIO_NET_F_MRG_RXBUF, "MrgRXBuf" },
+ { VIRTIO_NET_F_STATUS, "Status" },
+ { VIRTIO_NET_F_CTRL_VQ, "CtrlVQ" },
+ { VIRTIO_NET_F_CTRL_RX, "CtrlRX" },
+ { VIRTIO_NET_F_CTRL_VLAN, "CtrlVLAN" },
+ { VIRTIO_NET_F_CTRL_RX_EXTRA, "CtrlRXExtra" },
+ { VIRTIO_NET_F_GUEST_ANNOUNCE, "GuestAnnounce" },
+ { VIRTIO_NET_F_MQ, "MQ" },
+ { VIRTIO_NET_F_CTRL_MAC_ADDR, "CtrlMAC" },
 #endif
  { 0, NULL }
 };
@@ -507,9 +515,17 @@ vio_attach(struct device *parent, struct device *self, void *aux)
 {
  struct vio_softc *sc = (struct vio_softc *)self;
  struct virtio_softc *vsc = (struct virtio_softc *)parent;
- uint32_t features;
  int i;
  struct ifnet *ifp = &sc->sc_ac.ac_if;
+ uint32_t features[] = {
+ VIRTIO_F_RING_EVENT_IDX,
+ VIRTIO_NET_F_MAC,
+ VIRTIO_NET_F_STATUS,
+ VIRTIO_NET_F_CTRL_VQ,
+ VIRTIO_NET_F_CTRL_RX,
+ VIRTIO_NET_F_MRG_RXBUF,
+ VIRTIO_NET_F_CSUM
+ };
 
  if (vsc->sc_child != NULL) {
  printf(": child already attached for %s; something wrong...\n",
@@ -524,22 +540,9 @@ vio_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_vqs = &sc->sc_vq[0];
  vsc->sc_config_change = 0;
 
- features = VIRTIO_NET_F_MAC | VIRTIO_NET_F_STATUS |
-    VIRTIO_NET_F_CTRL_VQ | VIRTIO_NET_F_CTRL_RX |
-    VIRTIO_NET_F_MRG_RXBUF | VIRTIO_NET_F_CSUM;
- /*
- * VIRTIO_F_RING_EVENT_IDX can be switched off by setting bit 2 in the
- * driver flags, see config(8)
- */
- if (!(sc->sc_dev.dv_cfdata->cf_flags & 2) &&
-    !(vsc->sc_dev.dv_cfdata->cf_flags & 2))
- features |= VIRTIO_F_RING_EVENT_IDX;
- else
- printf(": RingEventIdx disabled by UKC");
-
- features = virtio_negotiate_features(vsc, features,
-    virtio_net_feature_names);
- if (features & VIRTIO_NET_F_MAC) {
+ virtio_set_features(vsc, features, nitems(features));
+ virtio_negotiate_features(vsc, virtio_net_feature_names);
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_MAC)) {
  vio_get_lladr(&sc->sc_ac, vsc);
  } else {
  ether_fakeaddr(ifp);
@@ -547,13 +550,16 @@ vio_attach(struct device *parent, struct device *self, void *aux)
  }
  printf(": address %s\n", ether_sprintf(sc->sc_ac.ac_enaddr));
 
- if (features & VIRTIO_NET_F_MRG_RXBUF) {
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_MRG_RXBUF) ||
+    vsc->sc_version_1) {
  sc->sc_hdr_size = sizeof(struct virtio_net_hdr);
- ifp->if_hardmtu = 16000; /* arbitrary limit */
  } else {
  sc->sc_hdr_size = offsetof(struct virtio_net_hdr, num_buffers);
- ifp->if_hardmtu = MCLBYTES - sc->sc_hdr_size - ETHER_HDR_LEN;
  }
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_MRG_RXBUF))
+ ifp->if_hardmtu = 16000; /* arbitrary limit */
+ else
+ ifp->if_hardmtu = MCLBYTES - sc->sc_hdr_size - ETHER_HDR_LEN;
 
  if (virtio_alloc_vq(vsc, &sc->sc_vq[VQRX], 0, MCLBYTES, 2, "rx") != 0)
  goto err;
@@ -567,12 +573,12 @@ vio_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_nvqs = 2;
  sc->sc_vq[VQTX].vq_done = vio_tx_intr;
  virtio_start_vq_intr(vsc, &sc->sc_vq[VQRX]);
- if (features & VIRTIO_F_RING_EVENT_IDX)
+ if (virtio_has_feature(vsc, VIRTIO_F_RING_EVENT_IDX))
  virtio_postpone_intr_far(&sc->sc_vq[VQTX]);
  else
  virtio_stop_vq_intr(vsc, &sc->sc_vq[VQTX]);
- if ((features & VIRTIO_NET_F_CTRL_VQ)
-    && (features & VIRTIO_NET_F_CTRL_RX)) {
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_CTRL_VQ)
+    && virtio_has_feature(vsc, VIRTIO_NET_F_CTRL_RX)) {
  if (virtio_alloc_vq(vsc, &sc->sc_vq[VQCTL], 2, NBPG, 1,
     "control") == 0) {
  sc->sc_vq[VQCTL].vq_done = vio_ctrleof;
@@ -590,7 +596,7 @@ vio_attach(struct device *parent, struct device *self, void *aux)
  ifp->if_start = vio_start;
  ifp->if_ioctl = vio_ioctl;
  ifp->if_capabilities = IFCAP_VLAN_MTU;
- if (features & VIRTIO_NET_F_CSUM)
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_CSUM))
  ifp->if_capabilities |= IFCAP_CSUM_TCPv4|IFCAP_CSUM_UDPv4;
  IFQ_SET_MAXLEN(&ifp->if_snd, vsc->sc_vqs[1].vq_num - 1);
  ifmedia_init(&sc->sc_media, 0, vio_media_change, vio_media_status);
@@ -621,7 +627,7 @@ vio_link_state(struct ifnet *ifp)
  struct virtio_softc *vsc = sc->sc_virtio;
  int link_state = LINK_STATE_FULL_DUPLEX;
 
- if (vsc->sc_features & VIRTIO_NET_F_STATUS) {
+ if (virtio_has_feature(vsc, VIRTIO_NET_F_STATUS)) {
  int status = virtio_read_device_config_2(vsc,
     VIRTIO_NET_CONFIG_STATUS);
  if (!(status & VIRTIO_NET_S_LINK_UP))
@@ -698,7 +704,6 @@ vio_stop(struct ifnet *ifp, int disable)
  vio_rx_drain(sc);
 
  virtio_reinit_start(vsc);
- virtio_negotiate_features(vsc, vsc->sc_features, NULL);
  virtio_start_vq_intr(vsc, &sc->sc_vq[VQRX]);
  virtio_stop_vq_intr(vsc, &sc->sc_vq[VQTX]);
  if (vsc->sc_nvqs >= 3)
@@ -807,7 +812,7 @@ again:
  }
  if (ifq_is_oactive(&ifp->if_snd)) {
  int r;
- if (vsc->sc_features & VIRTIO_F_RING_EVENT_IDX)
+ if (virtio_has_feature(vsc, VIRTIO_F_RING_EVENT_IDX))
  r = virtio_postpone_intr_smart(&sc->sc_vq[VQTX]);
  else
  r = virtio_start_vq_intr(vsc, &sc->sc_vq[VQTX]);
@@ -1061,7 +1066,7 @@ again:
  if (r) {
  vio_populate_rx_mbufs(sc);
  /* set used event index to the next slot */
- if (vsc->sc_features & VIRTIO_F_RING_EVENT_IDX) {
+ if (virtio_has_feature(vsc, VIRTIO_F_RING_EVENT_IDX)) {
  if (virtio_start_vq_intr(vq->vq_owner, vq))
  goto again;
  }
diff --git a/sys/dev/pv/vioblk.c b/sys/dev/pv/vioblk.c
index 061043e8796..c17264e1be4 100644
--- a/sys/dev/pv/vioblk.c
+++ b/sys/dev/pv/vioblk.c
@@ -168,7 +168,13 @@ vioblk_attach(struct device *parent, struct device *self, void *aux)
  struct vioblk_softc *sc = (struct vioblk_softc *)self;
  struct virtio_softc *vsc = (struct virtio_softc *)parent;
  struct scsibus_attach_args saa;
- uint32_t features;
+ uint32_t features[] = {
+ VIRTIO_F_NOTIFY_ON_EMPTY,
+ VIRTIO_BLK_F_RO,
+ VIRTIO_BLK_F_SIZE_MAX,
+ VIRTIO_BLK_F_SEG_MAX,
+ VIRTIO_BLK_F_FLUSH
+ };
  int qsize;
 
  vsc->sc_vqs = &sc->sc_vq[0];
@@ -180,14 +186,10 @@ vioblk_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_ipl = IPL_BIO;
  sc->sc_virtio = vsc;
 
-        features = virtio_negotiate_features(vsc,
-    (VIRTIO_BLK_F_RO       | VIRTIO_F_NOTIFY_ON_EMPTY |
-     VIRTIO_BLK_F_SIZE_MAX | VIRTIO_BLK_F_SEG_MAX |
-     VIRTIO_BLK_F_FLUSH),
-    vioblk_feature_names);
+ virtio_set_features(vsc, features, nitems(features));
+ virtio_negotiate_features(vsc, vioblk_feature_names);
 
-
- if (features & VIRTIO_BLK_F_SIZE_MAX) {
+ if (virtio_has_feature(vsc, VIRTIO_BLK_F_SIZE_MAX)) {
  uint32_t size_max = virtio_read_device_config_4(vsc,
     VIRTIO_BLK_CONFIG_SIZE_MAX);
  if (size_max < PAGE_SIZE) {
@@ -196,7 +198,7 @@ vioblk_attach(struct device *parent, struct device *self, void *aux)
  }
  }
 
- if (features & VIRTIO_BLK_F_SEG_MAX) {
+ if (virtio_has_feature(vsc, VIRTIO_BLK_F_SEG_MAX)) {
  uint32_t seg_max = virtio_read_device_config_4(vsc,
     VIRTIO_BLK_CONFIG_SEG_MAX);
  if (seg_max < SEG_MAX) {
@@ -217,7 +219,7 @@ vioblk_attach(struct device *parent, struct device *self, void *aux)
  qsize = sc->sc_vq[0].vq_num;
  sc->sc_vq[0].vq_done = vioblk_vq_done;
 
- if (features & VIRTIO_F_NOTIFY_ON_EMPTY) {
+ if (virtio_has_feature(vsc, VIRTIO_F_NOTIFY_ON_EMPTY)) {
  virtio_stop_vq_intr(vsc, &sc->sc_vq[0]);
  sc->sc_notify_on_empty = 1;
  }
@@ -249,7 +251,7 @@ vioblk_attach(struct device *parent, struct device *self, void *aux)
  sc->sc_link.luns = 1;
  sc->sc_link.adapter_target = 2;
  DNPRINTF(1, "%s: qsize: %d\n", __func__, qsize);
- if (features & VIRTIO_BLK_F_RO)
+ if (virtio_has_feature(vsc, VIRTIO_BLK_F_RO))
  sc->sc_link.flags |= SDEV_READONLY;
 
  bzero(&saa, sizeof(saa));
@@ -427,7 +429,7 @@ vioblk_scsi_cmd(struct scsi_xfer *xs)
  break;
 
  case SYNCHRONIZE_CACHE:
- if ((vsc->sc_features & VIRTIO_BLK_F_FLUSH) == 0) {
+ if (!virtio_has_feature(vsc, VIRTIO_BLK_F_FLUSH)) {
  vioblk_scsi_done(xs, XS_NOERROR);
  return;
  }
@@ -550,13 +552,10 @@ vioblk_scsi_cmd(struct scsi_xfer *xs)
  delay(1000);
  } while(--timeout > 0);
  if (timeout <= 0) {
- uint32_t features;
  printf("%s: SCSI_POLL timed out\n", __func__);
  vioblk_reset(sc);
  virtio_reinit_start(vsc);
- features = virtio_negotiate_features(vsc, vsc->sc_features,
-    NULL);
- KASSERT(features == vsc->sc_features);
+ virtio_reinit_end(vsc);
  }
  splx(s);
  return;
diff --git a/sys/dev/pv/vioblkreg.h b/sys/dev/pv/vioblkreg.h
index 754f2bb9793..c950a8876af 100644
--- a/sys/dev/pv/vioblkreg.h
+++ b/sys/dev/pv/vioblkreg.h
@@ -40,15 +40,15 @@
 #define VIRTIO_BLK_CONFIG_BLK_SIZE 20 /* 32bit */
 
 /* Feature bits */
-#define VIRTIO_BLK_F_BARRIER (1<<0)
-#define VIRTIO_BLK_F_SIZE_MAX (1<<1)
-#define VIRTIO_BLK_F_SEG_MAX (1<<2)
-#define VIRTIO_BLK_F_GEOMETRY (1<<4)
-#define VIRTIO_BLK_F_RO (1<<5)
-#define VIRTIO_BLK_F_BLK_SIZE (1<<6)
-#define VIRTIO_BLK_F_SCSI (1<<7)
-#define VIRTIO_BLK_F_FLUSH (1<<9)
-#define VIRTIO_BLK_F_TOPOLOGY (1<<10)
+#define VIRTIO_BLK_F_BARRIER 0
+#define VIRTIO_BLK_F_SIZE_MAX 1
+#define VIRTIO_BLK_F_SEG_MAX 2
+#define VIRTIO_BLK_F_GEOMETRY 4
+#define VIRTIO_BLK_F_RO 5
+#define VIRTIO_BLK_F_BLK_SIZE 6
+#define VIRTIO_BLK_F_SCSI 7
+#define VIRTIO_BLK_F_FLUSH 9
+#define VIRTIO_BLK_F_TOPOLOGY 10
 
 /* Command */
 #define VIRTIO_BLK_T_IN 0
diff --git a/sys/dev/pv/viocon.c b/sys/dev/pv/viocon.c
index cdd547ac33c..f070606b3e0 100644
--- a/sys/dev/pv/viocon.c
+++ b/sys/dev/pv/viocon.c
@@ -30,9 +30,9 @@
 
 
 /* features */
-#define VIRTIO_CONSOLE_F_SIZE (1<<0)
-#define VIRTIO_CONSOLE_F_MULTIPORT (1<<1)
-#define VIRTIO_CONSOLE_F_EMERG_WRITE (1<<2)
+#define VIRTIO_CONSOLE_F_SIZE 0
+#define VIRTIO_CONSOLE_F_MULTIPORT 1
+#define VIRTIO_CONSOLE_F_EMERG_WRITE 2
 
 /* config space */
 #define VIRTIO_CONSOLE_COLS 0 /* 16 bits */
@@ -175,6 +175,7 @@ viocon_attach(struct device *parent, struct device *self, void *aux)
  struct viocon_softc *sc = (struct viocon_softc *)self;
  struct virtio_softc *vsc = (struct virtio_softc *)parent;
  int maxports = 1;
+ uint32_t features[] = { VIRTIO_CONSOLE_F_SIZE };
 
  if (vsc->sc_child)
  panic("already attached to something else");
@@ -193,8 +194,8 @@ viocon_attach(struct device *parent, struct device *self, void *aux)
  goto err;
  }
 
- virtio_negotiate_features(vsc, VIRTIO_CONSOLE_F_SIZE,
-    viocon_feature_names);
+ virtio_set_features(vsc, features, nitems(features));
+ virtio_negotiate_features(vsc, viocon_feature_names);
 
  printf("\n");
  DPRINTF("%s: softc: %p\n", __func__, sc);
@@ -277,7 +278,7 @@ viocon_port_create(struct viocon_softc *sc, int portidx)
  */
  vp->vp_tx_buf = vp->vp_rx_buf + vp->vp_rx->vq_num * BUFSIZE;
 
- if (vsc->sc_features & VIRTIO_CONSOLE_F_SIZE) {
+ if (virtio_has_feature(vsc, VIRTIO_CONSOLE_F_SIZE)) {
  vp->vp_cols = virtio_read_device_config_2(vsc,
     VIRTIO_CONSOLE_COLS);
  vp->vp_rows = virtio_read_device_config_2(vsc,
diff --git a/sys/dev/pv/viomb.c b/sys/dev/pv/viomb.c
index 98c595dc2e2..fec4b23316f 100644
--- a/sys/dev/pv/viomb.c
+++ b/sys/dev/pv/viomb.c
@@ -68,8 +68,8 @@
 #define VIRTIO_BALLOON_CONFIG_ACTUAL 4 /* 32bit */
 
 /* Feature bits */
-#define VIRTIO_BALLOON_F_MUST_TELL_HOST (1<<0)
-#define VIRTIO_BALLOON_F_STATS_VQ (1<<1)
+#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0
+#define VIRTIO_BALLOON_F_STATS_VQ 1
 
 static const struct virtio_feature_name viomb_feature_names[] = {
 #if VIRTIO_DEBUG
@@ -136,7 +136,6 @@ viomb_attach(struct device *parent, struct device *self, void *aux)
 {
  struct viomb_softc *sc = (struct viomb_softc *)self;
  struct virtio_softc *vsc = (struct virtio_softc *)parent;
- u_int32_t features;
  int i;
 
  if (vsc->sc_child != NULL) {
@@ -159,10 +158,11 @@ viomb_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_ipl = IPL_BIO;
  vsc->sc_config_change = viomb_config_change;
 
- /* negotiate features */
- features = VIRTIO_F_RING_INDIRECT_DESC;
- features = virtio_negotiate_features(vsc, features,
-     viomb_feature_names);
+ /*
+ * XXX VIRTIO_BALLOON_F_MUST_TELL_HOST is checked in some places but
+ * XXX has never been negotiated.
+ */
+ virtio_negotiate_features(vsc, viomb_feature_names);
 
  if ((virtio_alloc_vq(vsc, &sc->sc_vq[VQ_INFLATE], VQ_INFLATE,
      sizeof(u_int32_t) * PGS_PER_REQ, 1, "inflate") != 0))
@@ -368,7 +368,7 @@ viomb_deflate(struct viomb_softc *sc)
  virtio_enqueue_p(vq, slot, b->bl_dmamap, 0,
  sizeof(u_int32_t) * nvpages, VRING_READ);
 
- if (!(vsc->sc_features & VIRTIO_BALLOON_F_MUST_TELL_HOST))
+ if (!virtio_has_feature(vsc, VIRTIO_BALLOON_F_MUST_TELL_HOST))
  uvm_pglistfree(&b->bl_pglist);
  virtio_enqueue_commit(vsc, vq, slot, VRING_NOTIFY);
  return;
@@ -466,7 +466,7 @@ viomb_deflate_intr(struct virtqueue *vq)
  sizeof(u_int32_t) * nvpages,
  BUS_DMASYNC_POSTWRITE);
 
- if (vsc->sc_features & VIRTIO_BALLOON_F_MUST_TELL_HOST)
+ if (virtio_has_feature(vsc, VIRTIO_BALLOON_F_MUST_TELL_HOST))
  uvm_pglistfree(&b->bl_pglist);
 
  VIOMBDEBUG(sc, "updating sc->sc_actual from %u to %llu\n",
diff --git a/sys/dev/pv/viornd.c b/sys/dev/pv/viornd.c
index 9d1e8c60938..0886d052310 100644
--- a/sys/dev/pv/viornd.c
+++ b/sys/dev/pv/viornd.c
@@ -96,7 +96,7 @@ viornd_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_ipl = IPL_NET;
  sc->sc_virtio = vsc;
 
- virtio_negotiate_features(vsc, 0, NULL);
+ virtio_negotiate_features(vsc, NULL);
 
  if (sc->sc_dev.dv_cfdata->cf_flags & VIORND_ONESHOT) {
  sc->sc_interval = 0;
diff --git a/sys/dev/pv/vioscsi.c b/sys/dev/pv/vioscsi.c
index a7ccd8913b3..ad18ec151ec 100644
--- a/sys/dev/pv/vioscsi.c
+++ b/sys/dev/pv/vioscsi.c
@@ -126,7 +126,7 @@ vioscsi_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_vqs = sc->sc_vqs;
  vsc->sc_nvqs = nitems(sc->sc_vqs);
 
- virtio_negotiate_features(vsc, 0, NULL);
+ virtio_negotiate_features(vsc, NULL);
  uint32_t cmd_per_lun = virtio_read_device_config_4(vsc,
     VIRTIO_SCSI_CONFIG_CMD_PER_LUN);
  uint32_t seg_max = virtio_read_device_config_4(vsc,
diff --git a/sys/dev/pv/vioscsireg.h b/sys/dev/pv/vioscsireg.h
index 4034309cef5..72280bc502b 100644
--- a/sys/dev/pv/vioscsireg.h
+++ b/sys/dev/pv/vioscsireg.h
@@ -28,8 +28,8 @@
 #define VIRTIO_SCSI_CONFIG_MAX_LUN 32 /* 32bit */
 
 /* Feature bits */
-#define VIRTIO_SCSI_F_INOUT (1<<0)
-#define VIRTIO_SCSI_F_HOTPLUG (1<<1)
+#define VIRTIO_SCSI_F_INOUT 0
+#define VIRTIO_SCSI_F_HOTPLUG 1
 
 /* Response status values */
 #define VIRTIO_SCSI_S_OK 0
diff --git a/sys/dev/pv/virtio.c b/sys/dev/pv/virtio.c
index e6cd1ace5aa..e9032b3f479 100644
--- a/sys/dev/pv/virtio.c
+++ b/sys/dev/pv/virtio.c
@@ -80,32 +80,33 @@ static const struct virtio_feature_name transport_feature_names[] = {
  { VIRTIO_F_RING_INDIRECT_DESC, "RingIndirectDesc"},
  { VIRTIO_F_RING_EVENT_IDX, "RingEventIdx"},
  { VIRTIO_F_BAD_FEATURE, "BadFeature"},
+ { VIRTIO_F_VERSION_1, "Version1"},
  { 0, NULL}
 };
 
 void
-virtio_log_features(uint32_t host, uint32_t neg,
+virtio_log_features(uint64_t host, uint64_t neg,
     const struct virtio_feature_name *guest_feature_names)
 {
  const struct virtio_feature_name *namep;
  int i;
  char c;
- uint32_t bit;
+ uint64_t bit;
 
- for (i = 0; i < 32; i++) {
- if (i == 30) {
+ for (i = 0; i < VIRTIO_MAX_FEATURES; i++) {
+ if (i == VIRTIO_F_BAD_FEATURE) {
  /*
  * VIRTIO_F_BAD_FEATURE is only used for
  * checking correct negotiation
  */
  continue;
  }
- bit = 1 << i;
+ bit = 1ULL << i;
  if ((host&bit) == 0)
  continue;
- namep = (i < 24) ? guest_feature_names :
+ namep = (i < 24 || i > 37) ? guest_feature_names :
     transport_feature_names;
- while (namep->bit && namep->bit != bit)
+ while (namep->name && namep->bit != i)
  namep++;
  c = (neg&bit) ? '+' : '-';
  if (namep->name)
@@ -125,15 +126,15 @@ virtio_log_features(uint32_t host, uint32_t neg,
  * <dequeue finished requests>; // virtio_dequeue() still can be called
  * <revoke pending requests in the vqs if any>;
  * virtio_reinit_start(sc);     // dequeue prohibitted
- * newfeatures = virtio_negotiate_features(sc, requestedfeatures);
  * <some other initialization>;
  * virtio_reinit_end(sc);     // device activated; enqueue allowed
- * Once attached, feature negotiation can only be allowed after virtio_reset.
+ * Once attached, features are assumed to not change again.
  */
 void
 virtio_reset(struct virtio_softc *sc)
 {
  virtio_device_reset(sc);
+ sc->sc_active_features = 0;
 }
 
 void
@@ -143,6 +144,7 @@ virtio_reinit_start(struct virtio_softc *sc)
 
  virtio_set_status(sc, VIRTIO_CONFIG_DEVICE_STATUS_ACK);
  virtio_set_status(sc, VIRTIO_CONFIG_DEVICE_STATUS_DRIVER);
+ virtio_negotiate_features(sc, NULL);
  for (i = 0; i < sc->sc_nvqs; i++) {
  int n;
  struct virtqueue *vq = &sc->sc_vqs[i];
@@ -154,8 +156,7 @@ virtio_reinit_start(struct virtio_softc *sc)
     sc->sc_dev.dv_xname, vq->vq_index);
  }
  virtio_init_vq(sc, vq, 1);
- virtio_setup_queue(sc, vq->vq_index,
-    vq->vq_dmamap->dm_segs[0].ds_addr / VIRTIO_PAGE_SIZE);
+ virtio_setup_queue(sc, vq, vq->vq_dmamap->dm_segs[0].ds_addr);
  }
 }
 
@@ -165,6 +166,18 @@ virtio_reinit_end(struct virtio_softc *sc)
  virtio_set_status(sc, VIRTIO_CONFIG_DEVICE_STATUS_DRIVER_OK);
 }
 
+void
+virtio_set_features(struct virtio_softc *vsc, uint32_t *guest_features,
+    int features_count)
+{
+ unsigned int i;
+
+ vsc->sc_driver_features = 0;
+ for (i = 0; i< features_count; i++) {
+ vsc->sc_driver_features |= 1ULL << guest_features[i];
+ }
+}
+
 /*
  * dmamap sync operations for a virtqueue.
  */
@@ -300,7 +313,7 @@ virtio_alloc_vq(struct virtio_softc *sc, struct virtqueue *vq, int index,
  if (((vq_size - 1) & vq_size) != 0)
  panic("vq_size not power of two: %d", vq_size);
 
- hdrlen = (sc->sc_features & VIRTIO_F_RING_EVENT_IDX) ? 3 : 2;
+ hdrlen = virtio_has_feature(sc, VIRTIO_F_RING_EVENT_IDX) ? 3 : 2;
 
  /* allocsize1: descriptor table + avail ring + pad */
  allocsize1 = VIRTQUEUE_ALIGN(sizeof(struct vring_desc) * vq_size
@@ -345,9 +358,6 @@ virtio_alloc_vq(struct virtio_softc *sc, struct virtqueue *vq, int index,
  goto err;
  }
 
- virtio_setup_queue(sc, index,
-    vq->vq_dmamap->dm_segs[0].ds_addr / VIRTIO_PAGE_SIZE);
-
  /* remember addresses and offsets for later use */
  vq->vq_owner = sc;
  vq->vq_num = vq_size;
@@ -367,11 +377,13 @@ virtio_alloc_vq(struct virtio_softc *sc, struct virtqueue *vq, int index,
  }
  vq->vq_bytesize = allocsize;
  vq->vq_maxnsegs = maxnsegs;
+ virtio_setup_queue(sc, vq, vq->vq_dmamap->dm_segs[0].ds_addr);
 
  /* free slot management */
  vq->vq_entries = mallocarray(vq_size, sizeof(struct vq_entry),
     M_DEVBUF, M_NOWAIT | M_ZERO);
  if (vq->vq_entries == NULL) {
+ printf("%s: ENOMEM: vq_size %d\n", __func__, vq_size);
  r = ENOMEM;
  goto err;
  }
@@ -388,7 +400,7 @@ virtio_alloc_vq(struct virtio_softc *sc, struct virtqueue *vq, int index,
  return 0;
 
 err:
- virtio_setup_queue(sc, index, 0);
+ virtio_setup_queue(sc, vq, 0);
  if (vq->vq_dmamap)
  bus_dmamap_destroy(sc->sc_dmat, vq->vq_dmamap);
  if (vq->vq_vaddr)
@@ -418,7 +430,7 @@ virtio_free_vq(struct virtio_softc *sc, struct virtqueue *vq)
  }
 
  /* tell device that there's no virtqueue any longer */
- virtio_setup_queue(sc, vq->vq_index, 0);
+ virtio_setup_queue(sc, vq, 0);
 
  free(vq->vq_entries, M_DEVBUF, 0);
  bus_dmamap_unload(sc->sc_dmat, vq->vq_dmamap);
@@ -679,7 +691,7 @@ virtio_enqueue_commit(struct virtio_softc *sc, struct virtqueue *vq, int slot,
 
 notify:
  if (notifynow) {
- if (vq->vq_owner->sc_features & VIRTIO_F_RING_EVENT_IDX) {
+ if (virtio_has_feature(vq->vq_owner, VIRTIO_F_RING_EVENT_IDX)) {
  uint16_t o = vq->vq_avail->idx;
  uint16_t n = vq->vq_avail_idx;
  uint16_t t;
@@ -876,7 +888,7 @@ virtio_postpone_intr_far(struct virtqueue *vq)
 void
 virtio_stop_vq_intr(struct virtio_softc *sc, struct virtqueue *vq)
 {
- if ((sc->sc_features & VIRTIO_F_RING_EVENT_IDX)) {
+ if (virtio_has_feature(sc, VIRTIO_F_RING_EVENT_IDX)) {
  /*
  * No way to disable the interrupt completely with
  * RingEventIdx. Instead advance used_event by half
@@ -900,7 +912,7 @@ virtio_start_vq_intr(struct virtio_softc *sc, struct virtqueue *vq)
  * interrupts is done through setting the latest
  * consumed index in the used_event field
  */
- if (sc->sc_features & VIRTIO_F_RING_EVENT_IDX)
+ if (virtio_has_feature(sc, VIRTIO_F_RING_EVENT_IDX))
  VQ_USED_EVENT(vq) = vq->vq_used_idx;
  else
  vq->vq_avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
diff --git a/sys/dev/pv/virtioreg.h b/sys/dev/pv/virtioreg.h
index 2cc4af2b32d..0ec98ea6ad8 100644
--- a/sys/dev/pv/virtioreg.h
+++ b/sys/dev/pv/virtioreg.h
@@ -83,11 +83,11 @@
 #define PCI_PRODUCT_VIRTIO_VMMCI 65535 /* private id */
 
 /* device-independent feature bits */
-#define  VIRTIO_F_NOTIFY_ON_EMPTY (1<<24)
-#define  VIRTIO_F_RING_INDIRECT_DESC (1<<28)
-#define  VIRTIO_F_RING_EVENT_IDX (1<<29)
-#define  VIRTIO_F_BAD_FEATURE (1<<30)
-#define  VIRTIO_F_VERSION_1 (1<<32)
+#define  VIRTIO_F_NOTIFY_ON_EMPTY 24
+#define  VIRTIO_F_RING_INDIRECT_DESC 28
+#define  VIRTIO_F_RING_EVENT_IDX 29
+#define  VIRTIO_F_BAD_FEATURE 30
+#define  VIRTIO_F_VERSION_1 32
 
 /* device status bits */
 #define  VIRTIO_CONFIG_DEVICE_STATUS_RESET 0
diff --git a/sys/dev/pv/virtiovar.h b/sys/dev/pv/virtiovar.h
index 0c353a8f177..f84afdb1b82 100644
--- a/sys/dev/pv/virtiovar.h
+++ b/sys/dev/pv/virtiovar.h
@@ -76,9 +76,18 @@
 #include <dev/pv/virtioreg.h>
 
 #ifndef VIRTIO_DEBUG
-#define VIRTIO_DEBUG 0
+#define VIRTIO_DEBUG 1
 #endif
 
+/* supported number of feature bits */
+#define VIRTIO_MAX_FEATURES 64
+
+/* flags for config(8) */
+#define VIRTIO_CF_NO_INDIRECT 1
+#define VIRTIO_CF_NO_EVENT_IDX 2
+#define VIRTIO_CF_PREFER_VERSION_1 4
+#define VIRTIO_CF_NO_VERSION_1 8
+
 struct vq_entry {
  SLIST_ENTRY(vq_entry) qe_list; /* free list */
  uint16_t qe_index; /* index in vq_desc array */
@@ -127,6 +136,8 @@ struct virtqueue {
 
  /* interrupt handler */
  int (*vq_done)(struct virtqueue*);
+ /* 1.x only: offset for notify address calculation */
+ uint32_t vq_notify_off;
 };
 
 struct virtio_feature_name {
@@ -145,9 +156,9 @@ struct virtio_ops {
  void (*write_dev_cfg_4)(struct virtio_softc *, int, uint32_t);
  void (*write_dev_cfg_8)(struct virtio_softc *, int, uint64_t);
  uint16_t (*read_queue_size)(struct virtio_softc *, uint16_t);
- void (*setup_queue)(struct virtio_softc *, uint16_t, uint32_t);
+ void (*setup_queue)(struct virtio_softc *, struct virtqueue *, uint64_t);
  void (*set_status)(struct virtio_softc *, int);
- uint32_t (*neg_features)(struct virtio_softc *, uint32_t, const struct virtio_feature_name *);
+ int (*neg_features)(struct virtio_softc *, const struct virtio_feature_name *);
  int (*poll_intr)(void *);
 };
 
@@ -160,8 +171,10 @@ struct virtio_softc {
 
  int sc_ipl; /* set by child */
 
- uint32_t sc_features;
+ uint64_t sc_driver_features;
+ uint64_t sc_active_features;
  int sc_indirect;
+ int sc_version_1;
 
  int sc_nvqs; /* set by child */
  struct virtqueue *sc_vqs; /* set by child */
@@ -185,16 +198,25 @@ struct virtio_softc {
 #define virtio_write_device_config_8(sc, o, v) (sc)->sc_ops->write_dev_cfg_8(sc, o, v)
 #define virtio_read_queue_size(sc, i) (sc)->sc_ops->read_queue_size(sc, i)
 #define virtio_setup_queue(sc, i, v) (sc)->sc_ops->setup_queue(sc, i, v)
-#define virtio_negotiate_features(sc, f, n) (sc)->sc_ops->neg_features(sc, f, n)
+#define virtio_negotiate_features(sc, n) (sc)->sc_ops->neg_features(sc, n)
 #define virtio_poll_intr(sc) (sc)->sc_ops->poll_intr(sc)
 
 /* only for transport drivers */
 #define virtio_set_status(sc, i) (sc)->sc_ops->set_status(sc, i)
 #define virtio_device_reset(sc) virtio_set_status((sc), 0)
 
+static inline int
+virtio_has_feature(struct virtio_softc *sc, unsigned int fbit)
+{
+ if (sc->sc_active_features & (1ULL << fbit))
+ return 1;
+ return 0;
+}
+
 int virtio_alloc_vq(struct virtio_softc*, struct virtqueue*, int, int, int,
     const char*);
 int virtio_free_vq(struct virtio_softc*, struct virtqueue*);
+void virtio_set_features(struct virtio_softc *vsc, uint32_t *, int);
 void virtio_reset(struct virtio_softc *);
 void virtio_reinit_start(struct virtio_softc *);
 void virtio_reinit_end(struct virtio_softc *);
@@ -220,7 +242,7 @@ int virtio_start_vq_intr(struct virtio_softc *, struct virtqueue *);
 
 const char *virtio_device_string(int);
 #if VIRTIO_DEBUG
-void virtio_log_features(uint32_t, uint32_t, const struct virtio_feature_name *);
+void virtio_log_features(uint64_t, uint64_t, const struct virtio_feature_name *);
 void virtio_vq_dump(struct virtqueue *vq);
 #endif
 int virtio_nused(struct virtqueue *vq);
diff --git a/sys/dev/pv/vmmci.c b/sys/dev/pv/vmmci.c
index ac44e542885..c1011dff21c 100644
--- a/sys/dev/pv/vmmci.c
+++ b/sys/dev/pv/vmmci.c
@@ -94,7 +94,11 @@ vmmci_attach(struct device *parent, struct device *self, void *aux)
 {
  struct vmmci_softc *sc = (struct vmmci_softc *)self;
  struct virtio_softc *vsc = (struct virtio_softc *)parent;
- uint32_t features;
+ uint32_t features[] = {
+ VMMCI_F_TIMESYNC,
+ VMMCI_F_ACK,
+ VMMCI_F_SYNCRTC
+ };
 
  if (vsc->sc_child != NULL)
  panic("already attached to something else");
@@ -105,10 +109,10 @@ vmmci_attach(struct device *parent, struct device *self, void *aux)
  vsc->sc_ipl = IPL_NET;
  sc->sc_virtio = vsc;
 
- features = VMMCI_F_TIMESYNC | VMMCI_F_ACK | VMMCI_F_SYNCRTC;
- features = virtio_negotiate_features(vsc, features, NULL);
+ virtio_set_features(vsc, features, nitems(features));
+ virtio_negotiate_features(vsc, NULL);
 
- if (features & VMMCI_F_TIMESYNC) {
+ if (virtio_has_feature(vsc, VMMCI_F_TIMESYNC)) {
  strlcpy(sc->sc_sensordev.xname, sc->sc_dev.dv_xname,
     sizeof(sc->sc_sensordev.xname));
  sc->sc_sensor.type = SENSOR_TIMEDELTA;
@@ -128,7 +132,7 @@ vmmci_activate(struct device *self, int act)
  struct vmmci_softc *sc = (struct vmmci_softc *)self;
  struct virtio_softc *vsc = sc->sc_virtio;
 
- if ((vsc->sc_features & VMMCI_F_ACK) == 0)
+ if (virtio_has_feature(vsc, VMMCI_F_ACK) == 0)
  return (0);
 
  switch (act) {
@@ -183,8 +187,7 @@ vmmci_config_change(struct virtio_softc *vsc)
  break;
  }
 
- if ((cmd != VMMCI_NONE) &&
-    (vsc->sc_features & VMMCI_F_ACK))
+ if ((cmd != VMMCI_NONE) && virtio_has_feature(vsc, VMMCI_F_ACK))
  virtio_write_device_config_4(vsc, VMMCI_CONFIG_COMMAND, cmd);
 
  return (1);

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Jonathan Gray-11
On Thu, Jan 10, 2019 at 11:38:38PM +0100, Stefan Fritsch wrote:

> Hi,
>
> the diff below implements the virtio 1.0 standard in the kernel (0.9 keeps
> working, of course). It's not ready for commit, yet, but I would like some
> input.
>
> For the most part, the changes from 0.9 to 1.0 are not that big, but there
> are some notable differences. Since some headers are also used by vmd in
> userspace, how those things are handled in the kernel are also relevant to
> vmd.
>
> One change is that the number of feature bits is no longer limited to 32.
> This means in the long run, defining constants with values that have only
> the relevant feature bit set won't work anymore. So eventually, we will
> have to change the constants to contain the bit offsets instead. Then we
> can add some macros to make handling easier. This is what the attached
> diff does. However, in the nearer future 64 bits will be sufficient and
> another possibility is to continue as now and define features as uint64_t
> values. So, should we make the switch from "#define foo (1<<24)" to
> "#define foo 24" now or at some later time, when devices start using more
> than 64 bits?
>
> The biggest change in 1.0 is the way the resources are found and accessed
> for virtio_pci devices. This means that all the offsets for the generic
> configuration registers change between 0.9 and 1.0. One possibility would
> be to just define a second set of constants and use that for 1.0. However,
> since the register layout is documented as structs in the standard, I have
> tried something different and used some macros (CREAD/CWRITE in
> virtio_pci.c) and offsetof() magic to determine the register offset and
> width for an access. This has the advantage that the widths are
> automatically taken care of, whereas in the old style one has to manually
> pick the correct bus_dma_read/write_* function. However, I don't like the
> 0.9 code to use the one style and the 1.0 code to use the other style. I
> also don't know how suitable the offsetof hackery would be for vmd. So I
> would like input on which way you think is preferable?
>
> The diff does not implement 1.0 for virtio/mmio, yet. I hope that it does
> not break 0.9 virtio/mmio, but I have not tested that. So far I did not
> have success in running openbsd arm or aarch64 on qemu (openbsd panics).
> Any pointers about this?

Assuming a qcow2 image created by something along the lines of
'qemu-img create -f qcow2 root.qcow2 2g' miniroot64.fs from an arm64
snapshot and u-boot for qemu_arm64 from the u-boot-aarch64 package:

doas sh -c "qemu-system-aarch64 -runas $USER \
        -M virt -serial stdio -m 1024 -cpu cortex-a57 \
        -bios /usr/local/share/u-boot/qemu_arm64/u-boot.bin \
        -device virtio-rng-device -netdev tap,id=net0 \
        -device virtio-net-device,netdev=net0 \
        -drive file=miniroot64.fs,if=none,id=drive0,format=raw \
        -drive file=root.qcow2,if=none,id=drive1,format=qcow2 \
        -device ich9-ahci,id=ahci \
        -device ide-drive,drive=drive0,bus=ahci.0 \
        -device ide-drive,drive=drive1,bus=ahci.1"

arm is similiar with no need to specify cpu.

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Theo de Raadt-2
In reply to this post by Stefan Fritsch
arm64 also uses this subsystem, and as a result this diff breaks
all those kernels.

You ask how to run arm64.... Uhm, you didn't even try to compile it.

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Stefan Fritsch
On Thu, 10 Jan 2019, Theo de Raadt wrote:

> arm64 also uses this subsystem, and as a result this diff breaks
> all those kernels.

The diff also breaks vmd. Even if it compiles it will still be broken. As
I wrote, it's not ready for commit yet.

>
> You ask how to run arm64.... Uhm, you didn't even try to compile it.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Stefan Fritsch
In reply to this post by Jonathan Gray-11
On Fri, 11 Jan 2019, Jonathan Gray wrote:

> Assuming a qcow2 image created by something along the lines of
> 'qemu-img create -f qcow2 root.qcow2 2g' miniroot64.fs from an arm64
> snapshot and u-boot for qemu_arm64 from the u-boot-aarch64 package:
>
> doas sh -c "qemu-system-aarch64 -runas $USER \
> -M virt -serial stdio -m 1024 -cpu cortex-a57 \
> -bios /usr/local/share/u-boot/qemu_arm64/u-boot.bin \
> -device virtio-rng-device -netdev tap,id=net0 \
> -device virtio-net-device,netdev=net0 \
> -drive file=miniroot64.fs,if=none,id=drive0,format=raw \
> -drive file=root.qcow2,if=none,id=drive1,format=qcow2 \
> -device ich9-ahci,id=ahci \
> -device ide-drive,drive=drive0,bus=ahci.0 \
> -device ide-drive,drive=drive1,bus=ahci.1"

Thanks, that works. I had tried with some efi firmware before and not with
u-boot.

Cheers,
Stefan

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Stefan Fritsch
In reply to this post by Stefan Fritsch
Hi Carlos,

>
> I'm getting ramped back up from being AWOL for several months, so
> apologies in advance for the delay of going through your diff (and the
> virtio header re-order one that you fixed).  It'll be over the weekend
> at the earliest before I can review/test it out.

There is no hurry, take your time. I will split the diff up into smaller
chunks, so you may wait for that with a detailed review. But opinions
about the offsetof hackery are welcome before that.

Cheers,
Stefan

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Ted Unangst-6
In reply to this post by Stefan Fritsch
Stefan Fritsch wrote:

> +#define CREAD(sc, memb) \
> + ({ \
> + struct virtio_pci_common_cfg c; \
> + size_t off = offsetof(struct virtio_pci_common_cfg, memb); \
> + size_t size = sizeof(c.memb); \
> + typeof(c.memb) val; \
> + \
> + switch (size) { \
> + case 1: \
> + val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off); \
> + break; \
> + case 2: \
> + val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off); \
> + break; \
> + case 4: \
> + val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off); \
> + break; \
> + case 8: \
> + val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off); \
> + break; \
> + } \
> + DNPRINTF(2, "%s: %d: off %#zx size %#zx read %#llx\n", \
> +   __func__, __LINE__, off, size, (unsigned long long)val); \
> + val; \
> + })

I'm not a big fan of statement expressions. I thought we purged most of them
from the tree in fact. I would say just use an inline. And no typeof.

(There's only one typeof in sys/ outside of drm, and I added it, but no more
please.)

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Stefan Fritsch
On Fri, 11 Jan 2019, Ted Unangst wrote:

> Stefan Fritsch wrote:
> > +#define CREAD(sc, memb) \
> > + ({ \
> > + struct virtio_pci_common_cfg c; \
> > + size_t off = offsetof(struct virtio_pci_common_cfg, memb); \
> > + size_t size = sizeof(c.memb); \
> > + typeof(c.memb) val; \
> > + \
> > + switch (size) { \
> > + case 1: \
> > + val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off); \
> > + break; \
> > + case 2: \
> > + val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off); \
> > + break; \
> > + case 4: \
> > + val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off); \
> > + break; \
> > + case 8: \
> > + val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off); \
> > + break; \
> > + } \
> > + DNPRINTF(2, "%s: %d: off %#zx size %#zx read %#llx\n", \
> > +   __func__, __LINE__, off, size, (unsigned long long)val); \
> > + val; \
> > + })
>
> I'm not a big fan of statement expressions. I thought we purged most of them
> from the tree in fact. I would say just use an inline. And no typeof.
>
> (There's only one typeof in sys/ outside of drm, and I added it, but no more
> please.)
>

A simple inline function does not work because memb is the name of a
member of a struct. But combined with a macro it would work. Like this:


/* only used for sizeof, not actually allocated */
extern struct virtio_pci_common_cfg ccfg;

static inline
uint64_t _cread(struct virtio_pci_softc *sc, unsigned off, unsigned size)
{
        uint64_t val;
        switch (size) {
        case 1:
                val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off);
                break;
        case 2:
                val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off);
                break;
        case 4:
                val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off);
                break;
        case 8:
                val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off);
                break;
        }
        return val;
}

#define CREAD(sc, memb)  _cread(sc, \
    offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))


The compiler should optimize this to the same code as the complicated
macro above. You think this variant is acceptable?

Cheers,
Stefan

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Ted Unangst-6
Stefan Fritsch wrote:
> On Fri, 11 Jan 2019, Ted Unangst wrote:
> A simple inline function does not work because memb is the name of a
> member of a struct. But combined with a macro it would work. Like this:

yeah, that looks pretty normal.

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

William Ahern-2
In reply to this post by Stefan Fritsch
On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
<snip>
> /* only used for sizeof, not actually allocated */
> extern struct virtio_pci_common_cfg ccfg;
<snip>
> #define CREAD(sc, memb)  _cread(sc, \
>     offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
>
> The compiler should optimize this to the same code as the complicated
> macro above. You think this variant is acceptable?

Maybe I'm missing something, but these are the more idiomatic constructs

  sizeof ((struct virtio_pci_common_cfg *)0)->memb
  sizeof ((struct virtio_pci_common_cfg){ 0 }).memb

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Jonathan Gray-11
On Fri, Jan 11, 2019 at 03:21:11PM -0800, William Ahern wrote:

> On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
> <snip>
> > /* only used for sizeof, not actually allocated */
> > extern struct virtio_pci_common_cfg ccfg;
> <snip>
> > #define CREAD(sc, memb)  _cread(sc, \
> >     offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
> >
> > The compiler should optimize this to the same code as the complicated
> > macro above. You think this variant is acceptable?
>
> Maybe I'm missing something, but these are the more idiomatic constructs
>
>   sizeof ((struct virtio_pci_common_cfg *)0)->memb
>   sizeof ((struct virtio_pci_common_cfg){ 0 }).memb
>

No, expanding the offsetof macro and avoiding __builtin_offsetof misses
the point of it.

Reply | Threaded
Open this post in threaded view
|

Re: Virtio 1.0 for the kernel

Stefan Fritsch
On Saturday, 12 January 2019 01:49:09 CET Jonathan Gray wrote:

> On Fri, Jan 11, 2019 at 03:21:11PM -0800, William Ahern wrote:
> > On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
> > <snip>
> >
> > > /* only used for sizeof, not actually allocated */
> > > extern struct virtio_pci_common_cfg ccfg;
> >
> > <snip>
> >
> > > #define CREAD(sc, memb)  _cread(sc, \
> > >
> > >     offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
> > >
> > > The compiler should optimize this to the same code as the complicated
> > > macro above. You think this variant is acceptable?
> >
> > Maybe I'm missing something, but these are the more idiomatic constructs
> >
> >   sizeof ((struct virtio_pci_common_cfg *)0)->memb
> >   sizeof ((struct virtio_pci_common_cfg){ 0 }).memb
>
> No, expanding the offsetof macro and avoiding __builtin_offsetof misses
> the point of it.

It allows to get rid of the "extern struct virtio_pci_common_cfg ccfg;"
though. That's an improvement.

Stefan