null deref in xen_intr_barrier()

null deref in xen_intr_barrier()

toc-2
>Synopsis: null deref in xen_intr_barrier()
>Category: kernel
>Environment:
        System      : OpenBSD 6.7
        Details     : OpenBSD 6.7-current (GENERIC) #291: Fri Jun 26 01:56:51 MDT 2020

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:

        I have a system that's running as a guest under Xen; recent snapshots
panic while bringing up the xnf(4) interface. This can happen in the ramdisk
kernel during a sysupgrade, or in a GENERIC kernel running netstart.

starting network
uvm_fault(0xfffffd810ed64110, 0x28, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      intr_barrier+0x6:       movq    0x28(%rdi),%rdi
ddb> bt
intr_barrier(0) at intr_barrier+0x6
xen_intr_barrier(8) at xen_intr_barrier+0x1f
xnf_stop(ffff800000193000) at xnf_stop+0x4c
xnf_ioctl(ffff800000193120,8020690c,ffff80000063b000) at xnf_ioctl+0xd2
in_ifinit(ffff800000193120,ffff80000063b000,ffff8000225ce400,1) at in_ifinit+0xf3
in_ioctl_change_ifaddr(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl_change_ifaddr+0x376
in_ioctl(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl+0x103
ifioctl(fffffd81028cb648,8040691a,ffff8000225ce3f0,ffff8000225d0870) at ifioctl+0x98e
sys_ioctl(ffff8000225d0870,ffff8000225ce500,ffff8000225ce560) at sys_ioctl+0x2cb
syscall(ffff8000225ce5d0) at syscall+0x315
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffe8c50, count: -11
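
The fault address 0x28 in the uvm_fault line matches the faulting instruction: intr_barrier was called with 0 in %rdi and then loaded from 0x28(%rdi). A minimal user-space sketch of that arithmetic follows. The struct layout is hypothetical (it is not the kernel's struct intrhand); it only places a pointer member at offset 0x28 to show how a NULL base turns into a fault at 0x28.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct intrhand: 40 bytes of
 * padding put the pointer member at offset 0x28.
 */
struct fake_intrhand {
	uint64_t	 pad[5];	/* 5 * 8 = 40 = 0x28 bytes */
	void		*ih_cpu;	/* member loaded by the faulting movq */
};

/* Address a load of ih->ih_cpu touches for a given base pointer value. */
static uintptr_t
field_load_addr(uintptr_t base)
{
	return base + offsetof(struct fake_intrhand, ih_cpu);
}
```

With a base of 0, field_load_addr() yields 0x28, the same address reported by uvm_fault above.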

In the definition of xen_intr_barrier() in dev/pv/xen.c, we find:

  /*
   * XXX This will need to be revised once intr_barrier starts
   * using its argument.
   */
  intr_barrier(NULL);

intr_barrier(9) started using its argument as of this commit:

  revision 1.53
  date: 2020/06/16 23:35:10;  author: dlg;  state: Exp;  lines: +4 -3;
  commitid: tVYPReymzTMuPlpA;
  make intr_barrier run sched_barrier on the cpu the interrupt pinned to.

  intr_barrier passed NULL to sched_barrier before this, which ends
  up being the primary cpu. that's been mostly right until this point,
  but is set to change.
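
To illustrate why that commit turns the NULL argument into a crash, here is a user-space sketch with mock types. All mock_* names and the struct definitions are stand-ins invented for illustration; the "old" and "new" bodies are inferred from the commit message and the faulting instruction, not copied from the kernel source.

```c
#include <stddef.h>

struct cpu_info { int ci_id; };			/* stand-in type */
struct intrhand { struct cpu_info *ih_cpu; };	/* stand-in type */

static struct cpu_info primary_cpu = { 0 };
static struct cpu_info *last_barrier_cpu;	/* records the barrier target */

/* Mock: NULL selects the primary cpu, as the commit message describes. */
static void
mock_sched_barrier(struct cpu_info *ci)
{
	if (ci == NULL)
		ci = &primary_cpu;
	last_barrier_cpu = ci;
}

/* Old behaviour per the commit message: the cookie is ignored. */
static void
mock_intr_barrier_old(void *cookie)
{
	(void)cookie;
	mock_sched_barrier(NULL);
}

/*
 * New behaviour (assumed shape): the cookie is dereferenced to find
 * the cpu. The NULL check exists only in this mock so that it can
 * report the problem; the panic above suggests the kernel faults here.
 */
static int
mock_intr_barrier_new(void *cookie)
{
	struct intrhand *ih = cookie;

	if (ih == NULL)
		return -1;
	mock_sched_barrier(ih->ih_cpu);
	return 0;
}
```

Calling mock_intr_barrier_old(NULL) is harmless, while mock_intr_barrier_new(NULL) reports the case that panics in the trace.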


>How-To-Repeat:
        Boot a snapshot from 2020/06/17 or later on a Xen domU with an xnf(4) network interface.
>Fix:
        Patching xen_intr_barrier() to do what intr_barrier(NULL) used to do
eliminates the panic:

Index: xen.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/xen.c,v
retrieving revision 1.96
diff -u -p -r1.96 xen.c
--- xen.c       29 May 2020 04:42:25 -0000      1.96
+++ xen.c       26 Jun 2020 13:28:10 -0000
@@ -736,11 +736,7 @@ xen_intr_barrier(xen_intr_handle_t xih)
        struct xen_softc *sc = xen_sc;
        struct xen_intsrc *xi;

-       /*
-        * XXX This will need to be revised once intr_barrier starts
-        * using its argument.
-        */
-       intr_barrier(NULL);
+       sched_barrier(NULL);

        if ((xi = xen_intsrc_acquire(sc, (evtchn_port_t)xih)) != NULL) {
                taskq_barrier(xi->xi_taskq);
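
The control flow of the patched function, as far as the diff context shows it, can be sketched in user space like this. Every mock_* name, the event log, and the single known port are assumptions made for illustration; whatever the real function does after taskq_barrier() lies outside the diff and is not modelled.

```c
#include <stddef.h>

enum { EV_SCHED = 1, EV_TASKQ = 2 };
static int events[4];		/* order in which barriers were issued */
static int nevents;

struct mock_intsrc { int xi_port; };	/* stand-in for struct xen_intsrc */
static struct mock_intsrc known_src = { 8 };	/* port 8, as in the trace */

static void
mock_log(int ev)
{
	if (nevents < 4)
		events[nevents++] = ev;
}

static void mock_sched_barrier(void *ci) { (void)ci; mock_log(EV_SCHED); }
static void mock_taskq_barrier(struct mock_intsrc *xi) { (void)xi; mock_log(EV_TASKQ); }

/* Mock lookup: only the one known port resolves to a source. */
static struct mock_intsrc *
mock_intsrc_acquire(int port)
{
	return port == known_src.xi_port ? &known_src : NULL;
}

/* Mirrors the patched hunk: scheduler barrier first, then the taskq. */
static void
mock_xen_intr_barrier(int port)
{
	struct mock_intsrc *xi;

	mock_sched_barrier(NULL);	/* no interrupt cookie is needed */
	if ((xi = mock_intsrc_acquire(port)) != NULL)
		mock_taskq_barrier(xi);
	/* remainder of the real function is not visible in the diff */
}
```

For a known port the log holds the scheduler barrier followed by the taskq barrier; for an unknown port only the scheduler barrier is recorded.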



dmesg:
OpenBSD 6.7-current (GENERIC) #5: Fri Jun 26 22:10:59 KST 2020
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 4269797376 (4071MB)
avail mem = 4125454336 (3934MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xfc001000 (11 entries)
bios0: vendor Xen version "4.11.3-pre" date 11/29/2019
bios0: Xen HVM domU
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET WAET SSDT SSDT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 11, 48 pins, remapped
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz, 6720.74 MHz, 06-3e-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 318MHz
acpihpet0 at acpi0: 62500000 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpipci0 at acpi0 PCI0
extent `acpipci0 pcibus' (0x0 - 0xff), flags=0
extent `pciio' (0x0 - 0xffffffff), flags=0
     0x10000 - 0xffffffff
extent `pcimem' (0x0 - 0xffffffffffffffff), flags=0
     0x0 - 0xefffffff
     0xfc000000 - 0xffffffff
     0x40000000000 - 0xffffffffffffffff
acpicmos0 at acpi0
"ACPI0007" at acpi0 not configured
cpu0: using VERW MDS workaround (except on vmm entry)
pvbus0 at mainbus0: Xen 4.11
xen0 at pvbus0: features 0x2705, 32 grant table frames, event channel 1
xbf0 at xen0 backend 0 channel 5: disk
scsibus1 at xbf0: 1 targets
sd0 at scsibus1 targ 0 lun 0: <Xen, qdisk hda 768, 0000>
sd0: 32768MB, 512 bytes/sector, 67108864 sectors
xbf1 at xen0 backend 0 channel 6: disk
scsibus2 at xbf1: 1 targets
sd1 at scsibus2 targ 0 lun 0: <Xen, qdisk hdb 832, 0000>
sd1: 4096MB, 512 bytes/sector, 8388608 sectors
xbf2 at xen0 backend 0 channel 7: disk
scsibus3 at xbf2: 1 targets
sd2 at scsibus3 targ 0 lun 0: <Xen, qdisk hdc 5632, 0000>
sd2: 8192MB, 512 bytes/sector, 16777216 sectors
"vkbd" at xen0: device/vkbd/0 not configured
xnf0 at xen0 backend 0 channel 8: address e0:76:63:68:35:00
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: SMBus disabled
xspd0 at pci0 dev 2 function 0 "XenSource Platform Device" rev 0x01
vga1 at pci0 dev 3 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay at vga1 not configured
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0 mux 1
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
root on sd0a (6de8462c6e963bb1.a) swap on sd0b dump on sd0b
fd0 at fdc0 drive 1: density unknown

usbdevs:
usbdevs: no USB controllers found

Re: null deref in xen_intr_barrier()

Demi M. Obenour
On 2020-06-26 09:55, [hidden email] wrote:

>> Synopsis: null deref in xen_intr_barrier()
>> Category: kernel
>> Environment:
> System      : OpenBSD 6.7
> Details     : OpenBSD 6.7-current (GENERIC) #291: Fri Jun 26 01:56:51 MDT 2020
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
>> Description:
>
> I have a system that's running as a guest under Xen; recent snapshots
> panic while bringing up the xnf(4) if. This can happen in the ramdisk kernel during
> a sysupgrade, or in a GENERIC kernel running netstart.
>
> starting network
> uvm_fault(0xfffffd810ed64110, 0x28, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      intr_barrier+0x6:       movq    0x28(%rdi),%rdi
> ddb> bt
> intr_barrier(0) at intr_barrier+0x6
> xen_intr_barrier(8) at xen_intr_barrier+0x1f
> xnf_stop(ffff800000193000) at xnf_stop+0x4c
> xnf_ioctl(ffff800000193120,8020690c,ffff80000063b000) at xnf_ioctl+0xd2
> in_ifinit(ffff800000193120,ffff80000063b000,ffff8000225ce400,1) at in_ifinit+0xf3
> in_ioctl_change_ifaddr(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl_change_ifaddr+0x376
> in_ioctl(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl+0x103
> ifioctl(fffffd81028cb648,8040691a,ffff8000225ce3f0,ffff8000225d0870) at ifioctl+0x98e
> sys_ioctl(ffff8000225d0870,ffff8000225ce500,ffff8000225ce560) at sys_ioctl+0x2cb
> syscall(ffff8000225ce5d0) at syscall+0x315
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ffffe8c50, count: -11
>
> In the definition of xen_intr_barrier() in dev/pv/xen.c, we find:
>
>   /*
>    * XXX This will need to be revised once intr_barrier starts
>    * using its argument.
>    */
>   intr_barrier(NULL);
>
> intr_barrier(9) started using its argument as of this commit:
>
>   revision 1.53
>   date: 2020/06/16 23:35:10;  author: dlg;  state: Exp;  lines: +4 -3;
>   commitid: tVYPReymzTMuPlpA;
>   make intr_barrier run sched_barrier on the cpu the interrupt pinned to.
>
>   intr_barrier passed NULL to sched_barrier before this, which ends
>   up being the primary cpu. that's been mostly right until this point,
>   but is set to change.
>
>
>> How-To-Repeat:
> Boot a snapshot from 6/17 or later on a Xen domU with an xnf network interface.
>> Fix:
> Patching xen_intr_barrier() to do what intr_barrier(NULL) used to do
> eliminates the panic:
>
> Index: xen.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/xen.c,v
> retrieving revision 1.96
> diff -u -p -r1.96 xen.c
> --- xen.c       29 May 2020 04:42:25 -0000      1.96
> +++ xen.c       26 Jun 2020 13:28:10 -0000
> @@ -736,11 +736,7 @@ xen_intr_barrier(xen_intr_handle_t xih)
>         struct xen_softc *sc = xen_sc;
>         struct xen_intsrc *xi;
>
> -       /*
> -        * XXX This will need to be revised once intr_barrier starts
> -        * using its argument.
> -        */
> -       intr_barrier(NULL);
> +       sched_barrier(NULL);
>
>         if ((xi = xen_intsrc_acquire(sc, (evtchn_port_t)xih)) != NULL) {
>                 taskq_barrier(xi->xi_taskq);
THANK YOU!!!  That explains my panics when using OpenBSD-CURRENT as
a Qubes guest, but I wasn’t able to get a backtrace or process list
(except by screenshot), so I didn’t report the bug.

Sincerely,

Demi


Re: null deref in xen_intr_barrier()

Jonathan Gray-11
In reply to this post by toc-2
On Fri, Jun 26, 2020 at 01:55:28PM +0000, [hidden email] wrote:

> >Synopsis: null deref in xen_intr_barrier()
> >Category: kernel
> >Environment:
> System      : OpenBSD 6.7
> Details     : OpenBSD 6.7-current (GENERIC) #291: Fri Jun 26 01:56:51 MDT 2020
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
>
> I have a system that's running as a guest under Xen; recent snapshots
> panic while bringing up the xnf(4) if. This can happen in the ramdisk kernel during
> a sysupgrade, or in a GENERIC kernel running netstart.
>
> starting network
> uvm_fault(0xfffffd810ed64110, 0x28, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      intr_barrier+0x6:       movq    0x28(%rdi),%rdi
> ddb> bt
> intr_barrier(0) at intr_barrier+0x6
> xen_intr_barrier(8) at xen_intr_barrier+0x1f
> xnf_stop(ffff800000193000) at xnf_stop+0x4c
> xnf_ioctl(ffff800000193120,8020690c,ffff80000063b000) at xnf_ioctl+0xd2
> in_ifinit(ffff800000193120,ffff80000063b000,ffff8000225ce400,1) at in_ifinit+0xf3
> in_ioctl_change_ifaddr(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl_change_ifaddr+0x376
> in_ioctl(8040691a,ffff8000225ce3f0,ffff800000193120,1) at in_ioctl+0x103
> ifioctl(fffffd81028cb648,8040691a,ffff8000225ce3f0,ffff8000225d0870) at ifioctl+0x98e
> sys_ioctl(ffff8000225d0870,ffff8000225ce500,ffff8000225ce560) at sys_ioctl+0x2cb
> syscall(ffff8000225ce5d0) at syscall+0x315
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ffffe8c50, count: -11
>
> In the definition of xen_intr_barrier() in dev/pv/xen.c, we find:
>
>   /*
>    * XXX This will need to be revised once intr_barrier starts
>    * using its argument.
>    */
>   intr_barrier(NULL);
>
> intr_barrier(9) started using its argument as of this commit:
>
>   revision 1.53
>   date: 2020/06/16 23:35:10;  author: dlg;  state: Exp;  lines: +4 -3;
>   commitid: tVYPReymzTMuPlpA;
>   make intr_barrier run sched_barrier on the cpu the interrupt pinned to.
>
>   intr_barrier passed NULL to sched_barrier before this, which ends
>   up being the primary cpu. that's been mostly right until this point,
>   but is set to change.
>
>
> >How-To-Repeat:
> Boot a snapshot from 6/17 or later on a Xen domU with an xnf network interface.
> >Fix:
> Patching xen_intr_barrier() to do what intr_barrier(NULL) used to do
> eliminates the panic:

thanks, committed