lock order "&sched_lock"(sched_lock)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

lock order "&sched_lock"(sched_lock)

Klemens Nanni-2
The amd64 snapshot

        OpenBSD 6.4-current (GENERIC) #492: Fri Dec 14 08:09:00 MST 2018

with a self-built, stock GENERIC.MP with debug options running inside
vmm(4) on my X230 shows a lock order reversal immediately after booting
(see dmesg at the end).

Steps to reproduce:

1. Apply the attached diff on a clean sys/
2. build the kernel
3. install a snapshot

        $ vmctl create vm/test.qcow2 -s 10G
        $ vmctl start pf -c -t rd -L -i 1 -d vm/test.qcow2
        [bsd.rd, all defaults]

4. boot new VM with debug kernel

        $ vmctl start pf -c -t rd -d vm/test.qcow2 -b bsd.debug

Sometimes it happens before the login prompt appears, sometimes I need
to login and issue some command - `pfctl -sr' often works.

I can only ever trigger this once right upon boot, not later again
during runtime.

I don't have time to look into this right now (chasing pf bugs already).

GENERIC.MP diff for debug options and dmesg of the VM including the lock
order reversal below.

Index: arch/amd64/conf/GENERIC.MP
===================================================================
RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC.MP,v
retrieving revision 1.14
diff -u -p -r1.14 GENERIC.MP
--- arch/amd64/conf/GENERIC.MP 13 Jul 2018 05:25:24 -0000 1.14
+++ arch/amd64/conf/GENERIC.MP 19 Dec 2018 00:49:12 -0000
@@ -3,7 +3,9 @@
 include "arch/amd64/conf/GENERIC"
 
 option MULTIPROCESSOR
-#option MP_LOCKDEBUG
-#option WITNESS
+option MP_LOCKDEBUG
+option WITNESS
+option PFDEBUG
+option DEBUG
 
 cpu* at mainbus?


OpenBSD 6.4-current (GENERIC.MP) #10: Wed Dec 19 21:37:52 CET 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 117440512 (112MB)
avail mem = 96862208 (92MB)
mtimeout delayed -1
path0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi at bios0 not configured
cpu0 timeout delayed -2
at mainbus0: (uniprocessor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2595.52 MHz, 06-3a-09
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, pkg 0 (apicid 0x0, max_apicid 0x0, max_coreid 0x1, smt_bits 0x20, smt_mask 0x0, core_bits 0x0, core_mask 0x0, pkg_bits 0x20, pkg_mask 0xffffffff)
pvbus0 at mainbus0: OpenBSD
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus1 at vioblk0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 10240MB, 512 bytes/sector, 20971520 sectors
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio2
virtio2: irq 6
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16450, no fifo
com0: console
lpt_isa_probe: mask ff data 55 failed
Timecounters tick every 10.000 msec
timeout delayed -2147333648
timeout delayed -2147393647
timeout delayed -2147393647
timeout delayed -2147393647
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
dev=0x402 chrdev=0xd02 rawdev=0xd02
root on sd0a (95dcfec90818276d.a) swap on sd0b dump on sd0b
dev=0x400 chrdev=0xd00 rawdev=0xd02
init: copying out path `/sbin/init' 11
timeout delayed -1
lock order reversal:
 1st 0xffffffff81cb7a68 /usr/src/sys/kern/kern_timeout.c:92 (/usr/src/sys/kern/kern_timeout.c:92) @ /usr/src/sys/kern/kern_timeout.c:414
 2nd 0xffffffff81dea490 &sched_lock (&sched_lock) @ /usr/src/sys/kern/kern_synch.c:444
lock order "&sched_lock"(sched_lock) -> "/usr/src/sys/kern/kern_timeout.c:92"(mutex) first seen at:
#0  witness_checkorder+0x4c0
#1  _mtx_enter+0x32
#2  timeout_add+0x49
#3  tsleep+0xb1
#4  setroot+0xf4
#5  diskconf+0x193
#6  main+0x5f6
lock order "/usr/src/sys/kern/kern_timeout.c:92"(mutex) -> "&sched_lock"(sched_lock) first seen at:
#0  witness_checkorder+0x4c0
#1  ___mp_lock+0x72
#2  wakeup_n+0x39
#3  knote+0xe2
#4  selwakeup+0x2b
#5  logwakeup+0x28
#6  printf+0xb0
#7  softclock+0xcf
#8  softintr_dispatch+0x118
#9  Xsoftclock+0x1f
#10 uvm_anfree+0x54
#11 amap_wipeout+0xe5
#12 uvm_unmap_detach+0x90
#13 uvm_map_teardown+0x1d2
#14 uvmspace_free+0x4f
#15 uvm_exit+0x22
#16 reaper+0x156
timeout delayed -2
timeout delayed -1
timeout delayed -1
timeout delayed -1
timeout delayed -1
timeout delayed -1
timeout delayed -1
timeout delayed -1
timeout delayed -2
timeout delayed -1
timeout delayed -2
timeout delayed -1
timeout delayed -1
timeout delayed -2

Reply | Threaded
Open this post in threaded view
|

Re: lock order "&sched_lock"(sched_lock)

Mike Larkin
On Wed, Dec 19, 2018 at 10:28:44PM +0100, Klemens Nanni wrote:

> The amd64 snapshot
>
> OpenBSD 6.4-current (GENERIC) #492: Fri Dec 14 08:09:00 MST 2018
>
> with a self-built, stock GENERIC.MP with debug options running inside
> vmm(4) on my X230 shows a lock order reversal immediately after booting
> (see dmesg at the end).
>
> Steps to reproduce:
>
> 1. Apply the attached diff on a clean sys/
> 2. build the kernel
> 3. install a snapshot
>
> $ vmctl create vm/test.qcow2 -s 10G
> $ vmctl start pf -c -t rd -L -i 1 -d vm/test.qcow2
> [bsd.rd, all defaults]
>
> 4. boot new VM with debug kernel
>
> $ vmctl start pf -c -t rd -d vm/test.qcow2 -b bsd.debug
>
> Sometimes it happens before the login prompt appears, sometimes I need
> to login and issue some command - `pfctl -sr' often works.
>
> I can only ever trigger this once right upon boot, not later again
> during runtime.
>
> I don't have time to look into this right now (chasing pf bugs already).

Nor does anyone else, considering this was reported at least half a dozen
times. A quick perusal of the list would have saved you a lot of time.

The 8254 legacy clock isn't MP safe, and there is no desire to make it so.

-ml

>
> GENERIC.MP diff for debug options and dmesg of the VM including the lock
> order reversal below.
>
> Index: arch/amd64/conf/GENERIC.MP
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC.MP,v
> retrieving revision 1.14
> diff -u -p -r1.14 GENERIC.MP
> --- arch/amd64/conf/GENERIC.MP 13 Jul 2018 05:25:24 -0000 1.14
> +++ arch/amd64/conf/GENERIC.MP 19 Dec 2018 00:49:12 -0000
> @@ -3,7 +3,9 @@
>  include "arch/amd64/conf/GENERIC"
>  
>  option MULTIPROCESSOR
> -#option MP_LOCKDEBUG
> -#option WITNESS
> +option MP_LOCKDEBUG
> +option WITNESS
> +option PFDEBUG
> +option DEBUG
>  
>  cpu* at mainbus?
>
>
> OpenBSD 6.4-current (GENERIC.MP) #10: Wed Dec 19 21:37:52 CET 2018
>     [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 117440512 (112MB)
> avail mem = 96862208 (92MB)
> mtimeout delayed -1
> path0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0
> acpi at bios0 not configured
> cpu0 timeout delayed -2
> at mainbus0: (uniprocessor)
> cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2595.52 MHz, 06-3a-09
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, pkg 0 (apicid 0x0, max_apicid 0x0, max_coreid 0x1, smt_bits 0x20, smt_mask 0x0, core_bits 0x0, core_mask 0x0, pkg_bits 0x20, pkg_mask 0xffffffff)
> pvbus0 at mainbus0: OpenBSD
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> viornd0 at virtio0
> virtio0: irq 3
> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio1
> scsibus1 at vioblk0: 2 targets
> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
> sd0: 10240MB, 512 bytes/sector, 20971520 sectors
> virtio1: irq 5
> virtio2 at pci0 dev 3 function 0 "OpenBSD VMM Control" rev 0x00
> vmmci0 at virtio2
> virtio2: irq 6
> isa0 at mainbus0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16450, no fifo
> com0: console
> lpt_isa_probe: mask ff data 55 failed
> Timecounters tick every 10.000 msec
> timeout delayed -2147333648
> timeout delayed -2147393647
> timeout delayed -2147393647
> timeout delayed -2147393647
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> dev=0x402 chrdev=0xd02 rawdev=0xd02
> root on sd0a (95dcfec90818276d.a) swap on sd0b dump on sd0b
> dev=0x400 chrdev=0xd00 rawdev=0xd02
> init: copying out path `/sbin/init' 11
> timeout delayed -1
> lock order reversal:
>  1st 0xffffffff81cb7a68 /usr/src/sys/kern/kern_timeout.c:92 (/usr/src/sys/kern/kern_timeout.c:92) @ /usr/src/sys/kern/kern_timeout.c:414
>  2nd 0xffffffff81dea490 &sched_lock (&sched_lock) @ /usr/src/sys/kern/kern_synch.c:444
> lock order "&sched_lock"(sched_lock) -> "/usr/src/sys/kern/kern_timeout.c:92"(mutex) first seen at:
> #0  witness_checkorder+0x4c0
> #1  _mtx_enter+0x32
> #2  timeout_add+0x49
> #3  tsleep+0xb1
> #4  setroot+0xf4
> #5  diskconf+0x193
> #6  main+0x5f6
> lock order "/usr/src/sys/kern/kern_timeout.c:92"(mutex) -> "&sched_lock"(sched_lock) first seen at:
> #0  witness_checkorder+0x4c0
> #1  ___mp_lock+0x72
> #2  wakeup_n+0x39
> #3  knote+0xe2
> #4  selwakeup+0x2b
> #5  logwakeup+0x28
> #6  printf+0xb0
> #7  softclock+0xcf
> #8  softintr_dispatch+0x118
> #9  Xsoftclock+0x1f
> #10 uvm_anfree+0x54
> #11 amap_wipeout+0xe5
> #12 uvm_unmap_detach+0x90
> #13 uvm_map_teardown+0x1d2
> #14 uvmspace_free+0x4f
> #15 uvm_exit+0x22
> #16 reaper+0x156
> timeout delayed -2
> timeout delayed -1
> timeout delayed -1
> timeout delayed -1
> timeout delayed -1
> timeout delayed -1
> timeout delayed -1
> timeout delayed -1
> timeout delayed -2
> timeout delayed -1
> timeout delayed -2
> timeout delayed -1
> timeout delayed -1
> timeout delayed -2
>