panic: Stopped at kqueue_scan

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

panic: Stopped at kqueue_scan

Olivier ANTOINE-2
>Synopsis:    panic: Stopped at kqueue_scan
>Category:    kernel i386
>Environment:
    System      : OpenBSD 6.5
    Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
19:50:46 MDT 2019
             [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP

    Architecture: OpenBSD.i386
    Machine     : i386
>Description:
Hi, since my last update I have regular panic crashes. 4 in two days.
At least 3 of them, with certainty, occurred while I was accessing the
Internet via my smartphone connected to my OpenBSD WiFi access point
through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
This setup works for years.

The machine then displays something like:
uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
ddb{1}>

The USB keyboard is then not functional to provide more information.

I had not noticed any malfunction on the previous kernel I was using:
OpenBSD 6.5 (GENERIC.MP) #1353: Thu Apr 11 08:28:28:28 MDT 2019

The problem I am experiencing is very similar to the one Aaron Bieber
encountered in this message:
https://marc.info/?l=openbsd-bugs&m=149935139022501
The resolution mentions problems with NET_LOCK...

2 commit since 11/04 mention NET_LOCK (if it helps):
https://marc.info/?l=openbsd-cvs&m=155538747124306
https://marc.info/?l=openbsd-cvs&m=155522503210681


Cheers,

dmesg:
OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21 19:50:46 MDT 2019
    [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
real mem  = 1064452096 (1015MB)
avail mem = 1029312512 (981MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 01/04/10, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev.
2.5 @ 0xf06b0 (32 entries)
bios0: vendor American Megatrends Inc. version "1302" date 01/04/2010
bios0: ASUSTeK Computer INC. B202
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
acpi0: wakeup devices P0P2(S4) P0P1(S4) MC97(S4) P0P4(S4) P0P5(S4)
P0P7(S4) P0P8(S4) P0P9(S4) USB0(S4) USB1(S4) EUSB(S4) P0P6(S4)
HDAC(S4) USB2(S4) USB3(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU N270 @ 1.60GHz ("GenuineIntel" 686-class)
1.65 GHz, 06-1c-02
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,NXE,LAHF,PERF,SENSOR,MELTDOWN
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 134MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.0.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Atom(TM) CPU N270 @ 1.60GHz ("GenuineIntel" 686-class)
1.61 GHz, 06-1c-02
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,NXE,LAHF,PERF,SENSOR,MELTDOWN
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins, remapped
acpimcfg0 at acpi0
acpimcfg0: addr 0xe0000000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 3 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P5)
acpiprt4 at acpi0: bus 1 (P0P6)
acpicpu0 at acpi0: C1(@1 halt!), PSS
acpicpu1 at acpi0: C1(@1 halt!), PSS
"PNP0A08" at acpi0 not configured
acpicmos0 at acpi0
aibs0 at acpi0
aibs0: TSIF not found
aibs0: FSIF not found
aibs0: VSIF not found
aibs0: no sensors found
acpibtn0 at acpi0: PWRB
acpivideo0 at acpi0: VGA_
bios0: ROM list: 0xc0000/0xec00!
cpu0: Enhanced SpeedStep 1609 MHz: speeds: 1600, 1333, 1067, 800 MHz
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82945GME Host" rev 0x03
inteldrm0 at pci0 dev 2 function 0 "Intel 82945GME Video" rev 0x03
drm0 at inteldrm0
intagp0 at inteldrm0
agp0 at intagp0: aperture at 0xd0000000, size 0x10000000
inteldrm0: apic 2 int 16
"Intel 82945GM Video" rev 0x03 at pci0 dev 2 function 1 not configured
azalia0 at pci0 dev 27 function 0 "Intel 82801GB HD Audio" rev 0x02: msi
azalia0: codecs: Realtek ALC888
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 82801GB PCIE" rev 0x02: apic 2 int 16
pci1 at ppb0 bus 3
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x02: RTL8168C/8111C
(0x3c00), msi, address 00:22:15:d5:1d:4f
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 2
ppb1 at pci0 dev 28 function 1 "Intel 82801GB PCIE" rev 0x02: apic 2 int 17
pci2 at ppb1 bus 2
ral0 at pci2 dev 0 function 0 "Ralink RT2790" rev 0x00: apic 2 int 17,
address 00:22:43:24:92:32
ral0: MAC/BBP RT2872 (rev 0x0200), RF RT2720 (MIMO 1T2R)
ppb2 at pci0 dev 28 function 2 "Intel 82801GB PCIE" rev 0x02: apic 2 int 18
pci3 at ppb2 bus 1
uhci0 at pci0 dev 29 function 0 "Intel 82801GB USB" rev 0x02: apic 2 int 23
uhci1 at pci0 dev 29 function 1 "Intel 82801GB USB" rev 0x02: apic 2 int 19
uhci2 at pci0 dev 29 function 2 "Intel 82801GB USB" rev 0x02: apic 2 int 21
uhci3 at pci0 dev 29 function 3 "Intel 82801GB USB" rev 0x02: apic 2 int 22
ehci0 at pci0 dev 29 function 7 "Intel 82801GB USB" rev 0x02: apic 2 int 23
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev
2.00/1.00 addr 1
ppb3 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0xe2
pci4 at ppb3 bus 4
ichpcib0 at pci0 dev 31 function 0 "Intel 82801GBM LPC" rev 0x02: PM disabled
pciide0 at pci0 dev 31 function 2 "Intel 82801GBM SATA" rev 0x02: DMA,
channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <HITACHI HTS543232A7A384>
wd0: 16-sector PIO, LBA48, 305245MB, 625142448 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
ichiic0 at pci0 dev 31 function 3 "Intel 82801GB SMBus" rev 0x02: apic 2 int 19
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 512MB DDR2 SDRAM non-parity PC2-5300CL5 SO-DIMM
spdmem1 at iic0 addr 0x51: 512MB DDR2 SDRAM non-parity PC2-5300CL5 SO-DIMM
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
usb4 at uhci3: USB revision 1.0
uhub4 at usb4 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627DHG rev 0x23
lm1 at wbsio0 port 0x290/8: W83627DHG
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
umass0 at uhub0 port 6 configuration 1 interface 0 "Generic
USB2.0-CRW" rev 2.00/58.87 addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <Generic-, Multi-Card, 1.00> SCSI0
0/direct removable serial.0bda0158114173400000
uhidev0 at uhub2 port 1 configuration 1 interface 0 "vendor 0x0e6a HID
Keyboard Device" rev 1.10/0.07 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
uhidev1 at uhub2 port 1 configuration 1 interface 1 "vendor 0x0e6a HID
Keyboard Device" rev 1.10/0.07 addr 2
uhidev1: iclass 3/0, 2 report ids
uhid0 at uhidev1 reportid 1: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 2: input=1, output=0, feature=0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
softraid0: sd1 was not shutdown properly
sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
sd1: 305242MB, 512 bytes/sector, 625136753 sectors
root on sd1a (f49cd53d5488216d.a) swap on sd1b dump on sd1b
WARNING: / was not properly unmounted
inteldrm0: 1024x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wskbd1: connecting to wsdisplay0
wsdisplay0: screen 1-5 added (std, vt100 emulation)

usbdevs:
Controller /dev/usb0:
addr 01: 8086:0000 Intel, EHCI root hub
     high speed, self powered, config 1, rev 1.00
     driver: uhub0
addr 02: 0bda:0158 Generic, USB2.0-CRW
     high speed, power 500 mA, config 1, rev 58.87, iSerial 20071114173400000
     driver: umass0
addr 03: 0bc2:61b6 Seagate, M3
     high speed, power 100 mA, config 1, rev 0.00, iSerial NM12KPRV
     driver: umass1
Controller /dev/usb1:
addr 01: 8086:0000 Intel, UHCI root hub
     full speed, self powered, config 1, rev 1.00
     driver: uhub1
Controller /dev/usb2:
addr 01: 8086:0000 Intel, UHCI root hub
     full speed, self powered, config 1, rev 1.00
     driver: uhub2
addr 02: 0e6a:6001 vendor 0x0e6a, HID Keyboard Device
     low speed, power 100 mA, config 1, rev 0.07
     driver: uhidev0
     driver: uhidev1
Controller /dev/usb3:
addr 01: 8086:0000 Intel, UHCI root hub
     full speed, self powered, config 1, rev 1.00
     driver: uhub3
Controller /dev/usb4:
addr 01: 8086:0000 Intel, UHCI root hub
     full speed, self powered, config 1, rev 1.00
     driver: uhub4

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Martin Pieuchot
On 23/04/19(Tue) 12:16, Olivier Antoine wrote:

> >Synopsis:    panic: Stopped at kqueue_scan
> >Category:    kernel i386
> >Environment:
>     System      : OpenBSD 6.5
>     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> 19:50:46 MDT 2019
>              [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
>
>     Architecture: OpenBSD.i386
>     Machine     : i386
> >Description:
> Hi, since my last update I have regular panic crashes. 4 in two days.
> At least 3 of them, with certainty, occurred while I was accessing the
> Internet via my smartphone connected to my OpenBSD WiFi access point
> through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> This setup works for years.
>
> The machine then displays something like:
> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> ddb{1}>
>
> The USB keyboard is then not functional to provide more information.

Could you set the following in /etc/sysctl.conf:

machdep.forceukbd=1

That should hopefully give you a functional keyboard in DDB.  With that
it would be nice to see ps, trace..

>
> I had not noticed any malfunction on the previous kernel I was using:
> OpenBSD 6.5 (GENERIC.MP) #1353: Thu Apr 11 08:28:28:28 MDT 2019
>
> The problem I am experiencing is very similar to the one Aaron Bieber
> encountered in this message:
> https://marc.info/?l=openbsd-bugs&m=149935139022501
> The resolution mentions problems with NET_LOCK...
>
> 2 commit since 11/04 mention NET_LOCK (if it helps):
> https://marc.info/?l=openbsd-cvs&m=155538747124306
> https://marc.info/?l=openbsd-cvs&m=155522503210681

This code has been since backed out.  However the problem it was
exposing has never been found.

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Olivier ANTOINE-2
On Tue, Apr 23, 2019 at 4:01 PM Martin Pieuchot <[hidden email]> wrote:
> > The USB keyboard is then not functional to provide more information.
>
> Could you set the following in /etc/sysctl.conf:
>
> machdep.forceukbd=1
>
> That should hopefully give you a functional keyboard in DDB.  With that
> it would be nice to see ps, trace..

USB keyboard works now! Thanks. And dmesg kept history on warm reboot!
Great! But without ddb input.
I've typed 'ps', 'trace', 'mach ddbcpu 0', 'trace' and edit trace below.

uvm_fault(0xd34e4e04, 0x0, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at      kqueue_scan+0x246:      movl    %eax,0(%ecx)

ddb>ps
PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 89513  437697  16659   1000  3    0x100083  ttyin         ksh
 16659  408882  51585   1000  3        0x90  select        sshd
 51585  131494  11094      0  3        0x92  poll          sshd
 83441   88878  54510   1000  3    0x100083  ttyin         ksh
 35397  179689  93194    509  3        0x82  poll          bitlbee
 18370   37576  93194  32767  3    0x100082  poll          nc
 54326  495139  54453   1000  3        0x83  poll          irssi
 54326  282579  54453   1000  3   0x4000083  poll          irssi
 54453  417822  54510   1000  3    0x10008b  pause         ksh
 54510  355760      1   1000  3    0x100080  kqread        tmux
 42801  333326      1      0  3    0x100083  ttyin         getty
 32464  248590      1      0  3    0x100083  ttyin         getty
 30691  252201      1      0  3    0x100083  ttyin         getty
 68172  171527      1      0  3    0x100083  ttyin         getty
  9923    2396      1      0  3    0x100083  ttyin         getty
 72757  418845      1      0  3    0x100098  poll          cron
 94232  270608      1    566  3        0x90  kqread        tor
 61066  380412      1     99  3    0x100090  poll          sndiod
 48037  106224      1    110  3    0x100090  poll          sndiod
 93194  311760      1      0  3    0x100090  kqread        inetd
 40141  303643  37303     95  3    0x100092  kqread        smtpd
 49177  273097  37303    103  3    0x100092  kqread        smtpd
 14136  436066  37303     95  3    0x100092  kqread        smtpd
 43555  114382  37303     95  3    0x100092  kqread        smtpd
 24076  293064  37303     95  3    0x100092  kqread        smtpd
 39821  382366  37303     95  3    0x100092  kqread        smtpd
 37303  419658      1      0  3    0x100080  kqread        smtpd
 34615   65679  66951     94  3    0x100092  kqread        rad
 27672   58256  66951     94  3    0x100092  kqread        rad
 66951  350036      1      0  3    0x100080  kqread        rad
 48906  239038      1     77  3    0x100090  poll          dhcpd
 11094  239419      1      0  3        0x80  select        sshd
 32082  523496  25690      0  3        0x80  netio         npppd
*25690  329908      1     82  7        0x10                npppd
 53170  152811  39119     68  3        0x90  select        isakmpd
 39119  178351      1      0  3        0x80  netio         isakmpd
 33761   61404      1      0  3    0x100080  poll          ntpd
 77281  283842  53057     83  3    0x100092  poll          ntpd
 53057  157753      1     83  3    0x100092  poll          ntpd
 29310  478555  82716     74  3    0x100092  bpf           pflogd
 82716  393202      1      0  3        0x80  netio         pflogd
 98294  140613  22386     73  2    0x100090                syslogd
 22386  352951      1      0  3    0x100082  netio         syslogd
 42363  172181      1     77  3    0x100090  poll          dhclient
 16574  286835      1      0  3        0x80  poll          dhclient
 20288  509789  38319    115  3    0x100092  kqread        slaacd
 14626  434941  38319    115  3    0x100092  kqread        slaacd
 38319  486906      1      0  3    0x100080  kqread        slaacd
 99619   63098      0      0  3     0x14280  schto         i915/signal:0
 61419  255834      0      0  3     0x14200  bored         i915-userptr-acq
 52758  356166      0      0  3     0x14200  bored         i915_modeset
 63752  394886      0      0  3     0x14200  bored         i915-dp
 28426    1278      0      0  3     0x14200  bored         i915
 40838  107636      0      0  3     0x14200  pgzero        zerothread
 85345  182218      0      0  3     0x14200  aiodoned      aiodoned
 79526  249789      0      0  3     0x14200  syncer        update
77682  102635      0      0  3     0x14200  cleaner       cleaner
  1527  169933      0      0  3     0x14200  reaper        reaper
 27493   42773      0      0  3     0x14200  pgdaemon      pagedaemon
 12828  137806      0      0  3     0x14200  bored         srdis
 99385  151993      0      0  3     0x14200  bored         crynlk
 75911  237067      0      0  3     0x14200  bored         crypto
 37078   83962      0      0  3     0x14200  usbtsk        usbtask
 97911   75679      0      0  3     0x14200  usbatsk       usbatsk
 63855  236832      0      0  3     0x14200  bored         drmtskl
 30522  145007      0      0  3     0x14200  bored         drmlwq
 83412   71690      0      0  3     0x14200  bored         drmubwq
 41239  484259      0      0  3     0x14200  bored         drmwq
 97281  500218      0      0  3     0x14200  bored         sensors
 31343  314304      0      0  3  0x40014200  acpi0         acpi0
 75414  439743      0      0  3  0x40014200                idle1
 33809  303054      0      0  7     0x14200                softnet
  3069  109594      0      0  3     0x14200  bored         systqmp
 62451  272774      0      0  3     0x14200  bored         systq
 34597  129188      0      0  3  0x40014200  bored         softclock
 88092  315407      0      0  3  0x40014200                idle0
 29326  170399      0      0  3     0x14200  bored         smr
 61127  520207      0      0  3     0x14200  kmalloc       kmthread
     1  516534      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{1}> trace
kqueue_scan(d3453dd4,40,60aa5800,f5cbbab0,d3248a24,f5cbbaac) at
kqueue_scan+0x246
sys_kevent(d3248a24,f5cbbbd0,f5cbbbc8) at sys_kevent+0x24f
syscall(f5cbbc10) at syscall+0x25e
Xsyscall_untramp(2b,200246,cf7f288c,33,0) at Xsyscall_untramp+0xa9
end of kernel
ddb{1}> mach ddbcpu 0
ddb{0}> trace
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp(50,200297,d365af40,0,f534e9f4) at Xipi_untramp+0xc2
___mp_lock(d0df41a0) at ___mp_lock+0x52
intr_handler(f534ea04,d365af40) at intr_handler+0x1d
Xintr_ioapic3_untramp(50,200297,0,1,f534ea74) at Xintr_ioapic3_untramp+0xdf
_kernel_lock(d0bbfa80,52) at _kernel_lock+0x8e
softintr_dispatch(0) at softintr_dispatch+0x18
Xsoftclock(8,200297,d32abad0,d32aba84,f534eae0) at Xsoftclock+0x12
_kernel_lock(d0b71e8b,19a) at _kernel_lock+0x82
sowakeup(d32aba84,d32abad0) at sowakeup+0x88
sorwakeup(d32aba84) at sorwakeup+0x53
tcp_input(f534ec5c,f534ec58,6,2) at tcp_input+0x33a5
ip_deliver(f534ec5c,f534ec58,6,2) at ip_deliver+0x224
ip_local(f534ec5c,f534ec58,d02af0da,0) at ip_local+0x142
ipintr() at ipintr+0x58
if_netisr(0) at if_netisr+0x65
taskq_thread(d3501040) at


Hope this help!

Cheers,

--
Olivier

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Olivier ANTOINE-2
Hi, could it be that the problem is related to pipex?
My npppd.conf rely on the default pipex settings (on)
But net.pipex.enable is 0 since I have no explicit settings in my
/etc/sysctl.conf
Under these conditions, I have no difficulty to crash the machine.
With net.pipex.enable=1 I have not yet noticed any problem.

Cheers,

On Tue, Apr 23, 2019 at 9:03 PM Olivier Antoine
<[hidden email]> wrote:

>
> On Tue, Apr 23, 2019 at 4:01 PM Martin Pieuchot <[hidden email]> wrote:
> > > The USB keyboard is then not functional to provide more information.
> >
> > Could you set the following in /etc/sysctl.conf:
> >
> > machdep.forceukbd=1
> >
> > That should hopefully give you a functional keyboard in DDB.  With that
> > it would be nice to see ps, trace..
>
> USB keyboard works now! Thanks. And dmesg kept history on warm reboot!
> Great! But without ddb input.
> I've typed 'ps', 'trace', 'mach ddbcpu 0', 'trace' and edit trace below.
>
> uvm_fault(0xd34e4e04, 0x0, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at      kqueue_scan+0x246:      movl    %eax,0(%ecx)
>
> ddb>ps
> PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  89513  437697  16659   1000  3    0x100083  ttyin         ksh
>  16659  408882  51585   1000  3        0x90  select        sshd
>  51585  131494  11094      0  3        0x92  poll          sshd
>  83441   88878  54510   1000  3    0x100083  ttyin         ksh
>  35397  179689  93194    509  3        0x82  poll          bitlbee
>  18370   37576  93194  32767  3    0x100082  poll          nc
>  54326  495139  54453   1000  3        0x83  poll          irssi
>  54326  282579  54453   1000  3   0x4000083  poll          irssi
>  54453  417822  54510   1000  3    0x10008b  pause         ksh
>  54510  355760      1   1000  3    0x100080  kqread        tmux
>  42801  333326      1      0  3    0x100083  ttyin         getty
>  32464  248590      1      0  3    0x100083  ttyin         getty
>  30691  252201      1      0  3    0x100083  ttyin         getty
>  68172  171527      1      0  3    0x100083  ttyin         getty
>   9923    2396      1      0  3    0x100083  ttyin         getty
>  72757  418845      1      0  3    0x100098  poll          cron
>  94232  270608      1    566  3        0x90  kqread        tor
>  61066  380412      1     99  3    0x100090  poll          sndiod
>  48037  106224      1    110  3    0x100090  poll          sndiod
>  93194  311760      1      0  3    0x100090  kqread        inetd
>  40141  303643  37303     95  3    0x100092  kqread        smtpd
>  49177  273097  37303    103  3    0x100092  kqread        smtpd
>  14136  436066  37303     95  3    0x100092  kqread        smtpd
>  43555  114382  37303     95  3    0x100092  kqread        smtpd
>  24076  293064  37303     95  3    0x100092  kqread        smtpd
>  39821  382366  37303     95  3    0x100092  kqread        smtpd
>  37303  419658      1      0  3    0x100080  kqread        smtpd
>  34615   65679  66951     94  3    0x100092  kqread        rad
>  27672   58256  66951     94  3    0x100092  kqread        rad
>  66951  350036      1      0  3    0x100080  kqread        rad
>  48906  239038      1     77  3    0x100090  poll          dhcpd
>  11094  239419      1      0  3        0x80  select        sshd
>  32082  523496  25690      0  3        0x80  netio         npppd
> *25690  329908      1     82  7        0x10                npppd
>  53170  152811  39119     68  3        0x90  select        isakmpd
>  39119  178351      1      0  3        0x80  netio         isakmpd
>  33761   61404      1      0  3    0x100080  poll          ntpd
>  77281  283842  53057     83  3    0x100092  poll          ntpd
>  53057  157753      1     83  3    0x100092  poll          ntpd
>  29310  478555  82716     74  3    0x100092  bpf           pflogd
>  82716  393202      1      0  3        0x80  netio         pflogd
>  98294  140613  22386     73  2    0x100090                syslogd
>  22386  352951      1      0  3    0x100082  netio         syslogd
>  42363  172181      1     77  3    0x100090  poll          dhclient
>  16574  286835      1      0  3        0x80  poll          dhclient
>  20288  509789  38319    115  3    0x100092  kqread        slaacd
>  14626  434941  38319    115  3    0x100092  kqread        slaacd
>  38319  486906      1      0  3    0x100080  kqread        slaacd
>  99619   63098      0      0  3     0x14280  schto         i915/signal:0
>  61419  255834      0      0  3     0x14200  bored         i915-userptr-acq
>  52758  356166      0      0  3     0x14200  bored         i915_modeset
>  63752  394886      0      0  3     0x14200  bored         i915-dp
>  28426    1278      0      0  3     0x14200  bored         i915
>  40838  107636      0      0  3     0x14200  pgzero        zerothread
>  85345  182218      0      0  3     0x14200  aiodoned      aiodoned
>  79526  249789      0      0  3     0x14200  syncer        update
> 77682  102635      0      0  3     0x14200  cleaner       cleaner
>   1527  169933      0      0  3     0x14200  reaper        reaper
>  27493   42773      0      0  3     0x14200  pgdaemon      pagedaemon
>  12828  137806      0      0  3     0x14200  bored         srdis
>  99385  151993      0      0  3     0x14200  bored         crynlk
>  75911  237067      0      0  3     0x14200  bored         crypto
>  37078   83962      0      0  3     0x14200  usbtsk        usbtask
>  97911   75679      0      0  3     0x14200  usbatsk       usbatsk
>  63855  236832      0      0  3     0x14200  bored         drmtskl
>  30522  145007      0      0  3     0x14200  bored         drmlwq
>  83412   71690      0      0  3     0x14200  bored         drmubwq
>  41239  484259      0      0  3     0x14200  bored         drmwq
>  97281  500218      0      0  3     0x14200  bored         sensors
>  31343  314304      0      0  3  0x40014200  acpi0         acpi0
>  75414  439743      0      0  3  0x40014200                idle1
>  33809  303054      0      0  7     0x14200                softnet
>   3069  109594      0      0  3     0x14200  bored         systqmp
>  62451  272774      0      0  3     0x14200  bored         systq
>  34597  129188      0      0  3  0x40014200  bored         softclock
>  88092  315407      0      0  3  0x40014200                idle0
>  29326  170399      0      0  3     0x14200  bored         smr
>  61127  520207      0      0  3     0x14200  kmalloc       kmthread
>      1  516534      0      0  3        0x82  wait          init
>      0       0     -1      0  3     0x10200  scheduler     swapper
> ddb{1}> trace
> kqueue_scan(d3453dd4,40,60aa5800,f5cbbab0,d3248a24,f5cbbaac) at
> kqueue_scan+0x246
> sys_kevent(d3248a24,f5cbbbd0,f5cbbbc8) at sys_kevent+0x24f
> syscall(f5cbbc10) at syscall+0x25e
> Xsyscall_untramp(2b,200246,cf7f288c,33,0) at Xsyscall_untramp+0xa9
> end of kernel
> ddb{1}> mach ddbcpu 0
> ddb{0}> trace
> db_enter() at db_enter+0x4
> i386_ipi_handler() at i386_ipi_handler+0x3f
> Xipi_untramp(50,200297,d365af40,0,f534e9f4) at Xipi_untramp+0xc2
> ___mp_lock(d0df41a0) at ___mp_lock+0x52
> intr_handler(f534ea04,d365af40) at intr_handler+0x1d
> Xintr_ioapic3_untramp(50,200297,0,1,f534ea74) at Xintr_ioapic3_untramp+0xdf
> _kernel_lock(d0bbfa80,52) at _kernel_lock+0x8e
> softintr_dispatch(0) at softintr_dispatch+0x18
> Xsoftclock(8,200297,d32abad0,d32aba84,f534eae0) at Xsoftclock+0x12
> _kernel_lock(d0b71e8b,19a) at _kernel_lock+0x82
> sowakeup(d32aba84,d32abad0) at sowakeup+0x88
> sorwakeup(d32aba84) at sorwakeup+0x53
> tcp_input(f534ec5c,f534ec58,6,2) at tcp_input+0x33a5
> ip_deliver(f534ec5c,f534ec58,6,2) at ip_deliver+0x224
> ip_local(f534ec5c,f534ec58,d02af0da,0) at ip_local+0x142
> ipintr() at ipintr+0x58
> if_netisr(0) at if_netisr+0x65
> taskq_thread(d3501040) at
>
>
> Hope this help!
>
> Cheers,
>
> --
> Olivier

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Martin Pieuchot
On 26/04/19(Fri) 11:23, Olivier Antoine wrote:
> Hi, could it be that the problem is related to pipex?
> My npppd.conf rely on the default pipex settings (on)
> But net.pipex.enable is 0 since I have no explicit settings in my
> /etc/sysctl.conf
> Under these conditions, I have no difficulty to crash the machine.
> With net.pipex.enable=1 I have not yet noticed any problem.

This might help trigger the bug.  Do you have a receipt to reproduce it
at 100%?  Does it only happens with a MP kernel?  Or can you reproduce
it in vmm(4) for example?

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Olivier ANTOINE-2
With bsd.mp, net.pipex.enable=0 I just browse the web with my
LineageOS 7.1.2 throught L2TP/IPSec PSK VPN.
It doesn't seems to bug with bsd.sp

My npppd.conf :
# $OpenBSD: npppd.conf,v 1.2 2014/03/22 04:32:39 yasuoka Exp $
# sample npppd configuration file.  see npppd.conf(5)

authentication LOCAL type local {
        users-file "/etc/npppd/npppd-users"
}

tunnel L2TP protocol l2tp {
        listen on 0.0.0.0
#       pipex yes
        lcp-keepalive yes
        lcp-keepalive-interval 60
}

ipcp IPCP {
        pool-address 10.0.0.2-10.0.0.14
        dns-servers 8.8.8.8
}

# use tun(4) interface.  multiple ppp sessions concentrate one interface.
interface tun0  address 10.0.0.1 ipcp IPCP
bind tunnel from L2TP authenticated by LOCAL to tun0


On Sun, Apr 28, 2019 at 8:02 PM Martin Pieuchot <[hidden email]> wrote:

>
> On 26/04/19(Fri) 11:23, Olivier Antoine wrote:
> > Hi, could it be that the problem is related to pipex?
> > My npppd.conf rely on the default pipex settings (on)
> > But net.pipex.enable is 0 since I have no explicit settings in my
> > /etc/sysctl.conf
> > Under these conditions, I have no difficulty to crash the machine.
> > With net.pipex.enable=1 I have not yet noticed any problem.
>
> This might help trigger the bug.  Do you have a receipt to reproduce it
> at 100%?  Does it only happens with a MP kernel?  Or can you reproduce
> it in vmm(4) for example?

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Martin Pieuchot
In reply to this post by Olivier ANTOINE-2
On 23/04/19(Tue) 12:16, Olivier Antoine wrote:

> >Synopsis:    panic: Stopped at kqueue_scan
> >Category:    kernel i386
> >Environment:
>     System      : OpenBSD 6.5
>     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> 19:50:46 MDT 2019
>              [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
>
>     Architecture: OpenBSD.i386
>     Machine     : i386
> >Description:
> Hi, since my last update I have regular panic crashes. 4 in two days.
> At least 3 of them, with certainty, occurred while I was accessing the
> Internet via my smartphone connected to my OpenBSD WiFi access point
> through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> This setup works for years.
>
> The machine then displays something like:
> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> ddb{1}>

So this indicates that the `kqueue' is empty.  It should not happen
because the caller, in your case npppd, always places a marker in the
list.

Since the caller is not threaded and the syscall is executed with the
KERNEL_LOCK() held, we can supposed that another part of the kernel is
removing the marker.  That would imply that the other part isn't running
with the KERNEL_LOCK() and requires a MP kernel.

Could you try *very hard* to reproduce the problem with a kernel built
with the diff below?  Hopefully you'll make it crash and we'll find the
bug.  Otherwise we'll look for another possible cause of the marker
removal.

Index: kern/kern_event.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.101
diff -u -p -r1.101 kern_event.c
--- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
+++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
@@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
  struct kqueue *kq = kn->kn_kq;
  struct klist *list;
 
+ KERNEL_ASSERT_LOCKED();
+
  if (kn->kn_fop->f_isfd)
  list = &kq->kq_knlist[kn->kn_id];
  else

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

David Gwynne-5
On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:

> On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > >Synopsis:    panic: Stopped at kqueue_scan
> > >Category:    kernel i386
> > >Environment:
> >     System      : OpenBSD 6.5
> >     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > 19:50:46 MDT 2019
> >              [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> >
> >     Architecture: OpenBSD.i386
> >     Machine     : i386
> > >Description:
> > Hi, since my last update I have regular panic crashes. 4 in two days.
> > At least 3 of them, with certainty, occurred while I was accessing the
> > Internet via my smartphone connected to my OpenBSD WiFi access point
> > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > This setup works for years.
> >
> > The machine then displays something like:
> > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > ddb{1}>
>
> So this indicates that the `kqueue' is empty.  It should not happen
> because the caller, in your case npppd, always places a marker in the
> list.
>
> Since the caller is not threaded and the syscall is executed with the
> KERNEL_LOCK() held, we can supposed that another part of the kernel is
> removing the marker.  That would imply that the other part isn't running
> with the KERNEL_LOCK() and requires a MP kernel.
>
> Could you try *very hard* to reproduce the problem with a kernel built
> with the diff below?  Hopefully you'll make it crash and we'll find the
> bug.  Otherwise we'll look for another possible cause of the marker
> removal.
>
> Index: kern/kern_event.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
> +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
>   struct kqueue *kq = kn->kn_kq;
>   struct klist *list;
>  
> + KERNEL_ASSERT_LOCKED();
> +
>   if (kn->kn_fop->f_isfd)
>   list = &kq->kq_knlist[kn->kn_id];
>   else
>

i had a similar diff in my tree, and with some clues from this thread
and the same panic from jmc@, points out that tun(4) calls tun_wakeup
without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
the kq code messing up the kn_head list. fun fun.

below is an extremely clever diff to tun to avoid doing the wakeup
without the kernel lock held. i say extremely clever because tun(4) is
not marked as MPSAFE, which means the if_start handler gets called with
KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
tun_wakeup with that implicit KERNEL_LOCK hold.

i don't know why this hasn't blown up before. the network stack
hasn't been run with the KERNEL_LOCK for ages now.

tun_output with a custom if_enqueue handler would be a lot smarter,
but that is a more invasive diff.

ok?

Index: net/if_tun.c
===================================================================
RCS file: /cvs/src/sys/net/if_tun.c,v
retrieving revision 1.184
diff -u -p -r1.184 if_tun.c
--- net/if_tun.c 3 Feb 2019 23:04:49 -0000 1.184
+++ net/if_tun.c 29 Apr 2019 07:14:46 -0000
@@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
  return (error);
  }
 
- tun_wakeup(tp);
  return (0);
 }
 
Index: kern/kern_event.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.101
diff -u -p -r1.101 kern_event.c
--- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
+++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000
@@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
  struct kqueue *kq = kn->kn_kq;
  int s = splhigh();
 
+ KERNEL_ASSERT_LOCKED();
  KASSERT((kn->kn_status & KN_QUEUED) == 0);
 
  TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
@@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
 
  KASSERT(kn->kn_status & KN_QUEUED);
 
+ KERNEL_ASSERT_LOCKED();
  TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
  kn->kn_status &= ~KN_QUEUED;
  kq->kq_count--;
tun_wakeup

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Martin Pieuchot
On 29/04/19(Mon) 17:24, David Gwynne wrote:

> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> > On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > > >Synopsis:    panic: Stopped at kqueue_scan
> > > >Category:    kernel i386
> > > >Environment:
> > >     System      : OpenBSD 6.5
> > >     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > > 19:50:46 MDT 2019
> > >              [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > >
> > >     Architecture: OpenBSD.i386
> > >     Machine     : i386
> > > >Description:
> > > Hi, since my last update I have regular panic crashes. 4 in two days.
> > > At least 3 of them, with certainty, occurred while I was accessing the
> > > Internet via my smartphone connected to my OpenBSD WiFi access point
> > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > > This setup works for years.
> > >
> > > The machine then displays something like:
> > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > > ddb{1}>
> >
> > So this indicates that the `kqueue' is empty.  It should not happen
> > because the caller, in your case npppd, always places a marker in the
> > list.
> >
> > Since the caller is not threaded and the syscall is executed with the
> > KERNEL_LOCK() held, we can supposed that another part of the kernel is
> > removing the marker.  That would imply that the other part isn't running
> > with the KERNEL_LOCK() and requires a MP kernel.
> >
> > Could you try *very hard* to reproduce the problem with a kernel built
> > with the diff below?  Hopefully you'll make it crash and we'll find the
> > bug.  Otherwise we'll look for another possible cause of the marker
> > removal.
> >
> > Index: kern/kern_event.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_event.c,v
> > retrieving revision 1.101
> > diff -u -p -r1.101 kern_event.c
> > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
> > +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
> > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
> >   struct kqueue *kq = kn->kn_kq;
> >   struct klist *list;
> >  
> > + KERNEL_ASSERT_LOCKED();
> > +
> >   if (kn->kn_fop->f_isfd)
> >   list = &kq->kq_knlist[kn->kn_id];
> >   else
> >
>
> i had a similar diff in my tree, and with some clues from this thread
> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
> the kq code messing up the kn_head list. fun fun.
>
> below is an extremely clever diff to tun to avoid doing the wakeup
> without the kernel lock held. i say extremely clever because tun(4) is
> not marked as MPSAFE, which means the if_start handler gets called with
> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
> tun_wakeup with that implicit KERNEL_LOCK hold.
>
> i don't know why this hasn't blown up before. the network stack
> hasn't been run with the KERNEL_LOCK for ages now.
>
> tun_output with a custom if_enqueue handler would be a lot smarter,
> but that is a more invasive diff.

I'd prefer if we could grab the KERNEK_LOCK() around csignal() and
selwakeup() like it is done in the socket code.  I believe that we
need to show where the lock is taken in order to help people realize
where it needs to be pushed down.

Can we solve both issues with a similar fix?  How hard can it be to get
selwakeup() out of the KERNEL_LOCK()?

Should we add a KASSERT in csignal() too?

> Index: net/if_tun.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.184
> diff -u -p -r1.184 if_tun.c
> --- net/if_tun.c 3 Feb 2019 23:04:49 -0000 1.184
> +++ net/if_tun.c 29 Apr 2019 07:14:46 -0000
> @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
>   return (error);
>   }
>  
> - tun_wakeup(tp);
>   return (0);
>  }
>  
> Index: kern/kern_event.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
> +++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000
> @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
>   struct kqueue *kq = kn->kn_kq;
>   int s = splhigh();
>  
> + KERNEL_ASSERT_LOCKED();
>   KASSERT((kn->kn_status & KN_QUEUED) == 0);
>  
>   TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
> @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
>  
>   KASSERT(kn->kn_status & KN_QUEUED);
>  
> + KERNEL_ASSERT_LOCKED();
>   TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>   kn->kn_status &= ~KN_QUEUED;
>   kq->kq_count--;
> tun_wakeup

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Olivier ANTOINE-2
In reply to this post by Martin Pieuchot
With this diff, either it crash the same way as before either it
freeze completely, no output, no keyboard.
I tried a dozen times without success.

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Olivier ANTOINE-2
In reply to this post by David Gwynne-5
With this diff, I haven't had a crash yet.
Usually I plant in less than 2 minutes, with this patch, after 30
intensive minutes it still holds.

On Mon, Apr 29, 2019 at 9:24 AM David Gwynne <[hidden email]> wrote:

>
> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> > On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > > >Synopsis:    panic: Stopped at kqueue_scan
> > > >Category:    kernel i386
> > > >Environment:
> > >     System      : OpenBSD 6.5
> > >     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > > 19:50:46 MDT 2019
> > >              [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > >
> > >     Architecture: OpenBSD.i386
> > >     Machine     : i386
> > > >Description:
> > > Hi, since my last update I have regular panic crashes. 4 in two days.
> > > At least 3 of them, with certainty, occurred while I was accessing the
> > > Internet via my smartphone connected to my OpenBSD WiFi access point
> > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > > This setup works for years.
> > >
> > > The machine then displays something like:
> > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > > ddb{1}>
> >
> > So this indicates that the `kqueue' is empty.  It should not happen
> > because the caller, in your case npppd, always places a marker in the
> > list.
> >
> > Since the caller is not threaded and the syscall is executed with the
> > KERNEL_LOCK() held, we can supposed that another part of the kernel is
> > removing the marker.  That would imply that the other part isn't running
> > with the KERNEL_LOCK() and requires a MP kernel.
> >
> > Could you try *very hard* to reproduce the problem with a kernel built
> > with the diff below?  Hopefully you'll make it crash and we'll find the
> > bug.  Otherwise we'll look for another possible cause of the marker
> > removal.
> >
> > Index: kern/kern_event.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_event.c,v
> > retrieving revision 1.101
> > diff -u -p -r1.101 kern_event.c
> > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000      1.101
> > +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
> > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
> >       struct kqueue *kq = kn->kn_kq;
> >       struct klist *list;
> >
> > +     KERNEL_ASSERT_LOCKED();
> > +
> >       if (kn->kn_fop->f_isfd)
> >               list = &kq->kq_knlist[kn->kn_id];
> >       else
> >
>
> i had a similar diff in my tree, and with some clues from this thread
> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
> the kq code messing up the kn_head list. fun fun.
>
> below is an extremely clever diff to tun to avoid doing the wakeup
> without the kernel lock held. i say extremely clever because tun(4) is
> not marked as MPSAFE, which means the if_start handler gets called with
> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
> tun_wakeup with that implicit KERNEL_LOCK hold.
>
> i don't know why this hasn't blown up before. the network stack
> hasn't been run with the KERNEL_LOCK for ages now.
>
> tun_output with a custom if_enqueue handler would be a lot smarter,
> but that is a more invasive diff.
>
> ok?
>
> Index: net/if_tun.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.184
> diff -u -p -r1.184 if_tun.c
> --- net/if_tun.c        3 Feb 2019 23:04:49 -0000       1.184
> +++ net/if_tun.c        29 Apr 2019 07:14:46 -0000
> @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
>                 return (error);
>         }
>
> -       tun_wakeup(tp);
>         return (0);
>  }
>
> Index: kern/kern_event.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c   27 Nov 2018 15:52:50 -0000      1.101
> +++ kern/kern_event.c   29 Apr 2019 07:14:46 -0000
> @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
>         struct kqueue *kq = kn->kn_kq;
>         int s = splhigh();
>
> +       KERNEL_ASSERT_LOCKED();
>         KASSERT((kn->kn_status & KN_QUEUED) == 0);
>
>         TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
> @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
>
>         KASSERT(kn->kn_status & KN_QUEUED);
>
> +       KERNEL_ASSERT_LOCKED();
>         TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>         kn->kn_status &= ~KN_QUEUED;
>         kq->kq_count--;
> tun_wakeup

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

David Gwynne-5
In reply to this post by Martin Pieuchot


> On 30 Apr 2019, at 03:24, Martin Pieuchot <[hidden email]> wrote:
>
> On 29/04/19(Mon) 17:24, David Gwynne wrote:
>> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
>>> On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
>>>>> Synopsis:    panic: Stopped at kqueue_scan
>>>>> Category:    kernel i386
>>>>> Environment:
>>>>    System      : OpenBSD 6.5
>>>>    Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
>>>> 19:50:46 MDT 2019
>>>>             [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
>>>>
>>>>    Architecture: OpenBSD.i386
>>>>    Machine     : i386
>>>>> Description:
>>>> Hi, since my last update I have regular panic crashes. 4 in two days.
>>>> At least 3 of them, with certainty, occurred while I was accessing the
>>>> Internet via my smartphone connected to my OpenBSD WiFi access point
>>>> through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
>>>> This setup works for years.
>>>>
>>>> The machine then displays something like:
>>>> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
>>>> kernel: page fault trap, code=0
>>>> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
>>>> ddb{1}>
>>>
>>> So this indicates that the `kqueue' is empty.  It should not happen
>>> because the caller, in your case npppd, always places a marker in the
>>> list.
>>>
>>> Since the caller is not threaded and the syscall is executed with the
>>> KERNEL_LOCK() held, we can supposed that another part of the kernel is
>>> removing the marker.  That would imply that the other part isn't running
>>> with the KERNEL_LOCK() and requires a MP kernel.
>>>
>>> Could you try *very hard* to reproduce the problem with a kernel built
>>> with the diff below?  Hopefully you'll make it crash and we'll find the
>>> bug.  Otherwise we'll look for another possible cause of the marker
>>> removal.
>>>
>>> Index: kern/kern_event.c
>>> ===================================================================
>>> RCS file: /cvs/src/sys/kern/kern_event.c,v
>>> retrieving revision 1.101
>>> diff -u -p -r1.101 kern_event.c
>>> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
>>> +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
>>> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
>>> struct kqueue *kq = kn->kn_kq;
>>> struct klist *list;
>>>
>>> + KERNEL_ASSERT_LOCKED();
>>> +
>>> if (kn->kn_fop->f_isfd)
>>> list = &kq->kq_knlist[kn->kn_id];
>>> else
>>>
>>
>> i had a similar diff in my tree, and with some clues from this thread
>> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
>> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
>> the kq code messing up the kn_head list. fun fun.
>>
>> below is an extremely clever diff to tun to avoid doing the wakeup
>> without the kernel lock held. i say extremely clever because tun(4) is
>> not marked as MPSAFE, which means the if_start handler gets called with
>> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
>> tun_wakeup with that implicit KERNEL_LOCK hold.
>>
>> i don't know why this hasn't blown up before. the network stack
>> hasn't been run with the KERNEL_LOCK for ages now.
>>
>> tun_output with a custom if_enqueue handler would be a lot smarter,
>> but that is a more invasive diff.
>
> I'd prefer if we could grab the KERNEK_LOCK() around csignal() and
> selwakeup() like it is done in the socket code.  I believe that we
> need to show where the lock is taken in order to help people realize
> where it needs to be pushed down.

I should have been more clear, but this diff is to fix the tree, it's by no means a final fix. tun(4) itself could do with some further changes, including the ones you describe above.

If you prefer I could just add KERNEL_LOCK to tun_wakeup?

> Can we solve both issues with a similar fix?  How hard can it be to get
> selwakeup() out of the KERNEL_LOCK()?

Figuring that out was going to go onto my todo.

> Should we add a KASSERT in csignal() too?

If it currently needs the kernel lock then it should have the assert for it. Otherwise we have issues like these waiting days or weeks for fixes.

dlg

>
>> Index: net/if_tun.c
>> ===================================================================
>> RCS file: /cvs/src/sys/net/if_tun.c,v
>> retrieving revision 1.184
>> diff -u -p -r1.184 if_tun.c
>> --- net/if_tun.c 3 Feb 2019 23:04:49 -0000 1.184
>> +++ net/if_tun.c 29 Apr 2019 07:14:46 -0000
>> @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
>> return (error);
>> }
>>
>> - tun_wakeup(tp);
>> return (0);
>> }
>>
>> Index: kern/kern_event.c
>> ===================================================================
>> RCS file: /cvs/src/sys/kern/kern_event.c,v
>> retrieving revision 1.101
>> diff -u -p -r1.101 kern_event.c
>> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
>> +++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000
>> @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
>> struct kqueue *kq = kn->kn_kq;
>> int s = splhigh();
>>
>> + KERNEL_ASSERT_LOCKED();
>> KASSERT((kn->kn_status & KN_QUEUED) == 0);
>>
>> TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
>> @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
>>
>> KASSERT(kn->kn_status & KN_QUEUED);
>>
>> + KERNEL_ASSERT_LOCKED();
>> TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>> kn->kn_status &= ~KN_QUEUED;
>> kq->kq_count--;
>> tun_wakeup

Reply | Threaded
Open this post in threaded view
|

Re: panic: Stopped at kqueue_scan

Martin Pieuchot
On 30/04/19(Tue) 08:06, David Gwynne wrote:

>
>
> > On 30 Apr 2019, at 03:24, Martin Pieuchot <[hidden email]> wrote:
> >
> > On 29/04/19(Mon) 17:24, David Gwynne wrote:
> >> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> >>> On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> >>>>> Synopsis:    panic: Stopped at kqueue_scan
> >>>>> Category:    kernel i386
> >>>>> Environment:
> >>>>    System      : OpenBSD 6.5
> >>>>    Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> >>>> 19:50:46 MDT 2019
> >>>>             [hidden email]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> >>>>
> >>>>    Architecture: OpenBSD.i386
> >>>>    Machine     : i386
> >>>>> Description:
> >>>> Hi, since my last update I have regular panic crashes. 4 in two days.
> >>>> At least 3 of them, with certainty, occurred while I was accessing the
> >>>> Internet via my smartphone connected to my OpenBSD WiFi access point
> >>>> through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> >>>> This setup works for years.
> >>>>
> >>>> The machine then displays something like:
> >>>> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> >>>> kernel: page fault trap, code=0
> >>>> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> >>>> ddb{1}>
> >>>
> >>> So this indicates that the `kqueue' is empty.  It should not happen
> >>> because the caller, in your case npppd, always places a marker in the
> >>> list.
> >>>
> >>> Since the caller is not threaded and the syscall is executed with the
> >>> KERNEL_LOCK() held, we can supposed that another part of the kernel is
> >>> removing the marker.  That would imply that the other part isn't running
> >>> with the KERNEL_LOCK() and requires a MP kernel.
> >>>
> >>> Could you try *very hard* to reproduce the problem with a kernel built
> >>> with the diff below?  Hopefully you'll make it crash and we'll find the
> >>> bug.  Otherwise we'll look for another possible cause of the marker
> >>> removal.
> >>>
> >>> Index: kern/kern_event.c
> >>> ===================================================================
> >>> RCS file: /cvs/src/sys/kern/kern_event.c,v
> >>> retrieving revision 1.101
> >>> diff -u -p -r1.101 kern_event.c
> >>> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101
> >>> +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
> >>> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
> >>> struct kqueue *kq = kn->kn_kq;
> >>> struct klist *list;
> >>>
> >>> + KERNEL_ASSERT_LOCKED();
> >>> +
> >>> if (kn->kn_fop->f_isfd)
> >>> list = &kq->kq_knlist[kn->kn_id];
> >>> else
> >>>
> >>
> >> i had a similar diff in my tree, and with some clues from this thread
> >> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
> >> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
> >> the kq code messing up the kn_head list. fun fun.
> >>
> >> below is an extremely clever diff to tun to avoid doing the wakeup
> >> without the kernel lock held. i say extremely clever because tun(4) is
> >> not marked as MPSAFE, which means the if_start handler gets called with
> >> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
> >> tun_wakeup with that implicit KERNEL_LOCK hold.
> >>
> >> i don't know why this hasn't blown up before. the network stack
> >> hasn't been run with the KERNEL_LOCK for ages now.
> >>
> >> tun_output with a custom if_enqueue handler would be a lot smarter,
> >> but that is a more invasive diff.
> >
> > I'd prefer if we could grab the KERNEK_LOCK() around csignal() and
> > selwakeup() like it is done in the socket code.  I believe that we
> > need to show where the lock is taken in order to help people realize
> > where it needs to be pushed down.
>
> I should have been more clear, but this diff is to fix the tree, it's by no means a final fix. tun(4) itself could do with some further changes, including the ones you describe above.
>
> If you prefer I could just add KERNEL_LOCK to tun_wakeup?

I'd prefer :)

> > Can we solve both issues with a similar fix?  How hard can it be to get
> > selwakeup() out of the KERNEL_LOCK()?
>
> Figuring that out was going to go onto my todo.

\o/
 
> > Should we add a KASSERT in csignal() too?
>
> If it currently needs the kernel lock then it should have the assert for it. Otherwise we have issues like these waiting days or weeks for fixes.

Go for it.