octeon: (reproducible) panic: assertwaitok: non-zero mutex count: 1; on 6.5-beta; erlite

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

octeon: (reproducible) panic: assertwaitok: non-zero mutex count: 1; on 6.5-beta; erlite

Abel Abraham Camarillo Ojeda-2
Hi to all,

I experienced this panic from time to time in 6.4 but was'n able to generate
a way to reproduce it on demand, sometimes my erlite took days to panic.

Right now, I noticed on 6.5-beta that sysmerge triggered on first run the
panic,
but after that I takes several runs to trigger it. Maybe it's not related?

Also,

ddb> boot reboot

most of the time doesn't do anything and I require physical access to the
erlite and
unplug / replug to mains. Is this normal? (also present in 6.4, 6.3)

Thanks to all



Looking for valid bootloader image....
Jumping to start of image at address 0xbfc80000


U-Boot 1.1.1 (UBNT Build ID: 4670715-gbd7e2d7) (Build time: May 27 2014 -
11:16:22)

BIST check passed.
UBNT_E100 r1:2, r2:18, f:4/71, serial #: 802AA88D35EF
MPR 13-00318-18
Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz data rate)
DRAM:  512 MB
Clearing DRAM....... done
Flash:  4 MB
Net:   octeth0, octeth1, octeth2

USB:   (port 0) scanning bus for devices...
      USB device not responding, giving up (status=0)
3 USB Devices found
       scanning bus for storage devices...
  Device 0: Vendor: Generic  Prod.: STORAGE DEVICE   Rev: 0272
            Type: Removable Hard Disk
            Capacity: 30159.0 MB = 29.4 GB (61765632 x 512)
 0
(Re)start USB...
USB:   (port 0) scanning bus for devices...
      USB device not responding, giving up (status=0)
3 USB Devices found
       scanning bus for storage devices...
  Device 0: Vendor: Generic  Prod.: STORAGE DEVICE   Rev: 0272
            Type: Removable Hard Disk
            Capacity: 30159.0 MB = 29.4 GB (61765632 x 512)
reading bsd
.............
.................

6002770 bytes read
argv[2]: numcores=2
ELF file is 64 bit
Allocating memory for ELF segment: addr: 0xffffffff81000000 (adjusted to:
0x1000000), size 0x5aace0
Allocated memory for ELF segment: addr: 0xffffffff81000000, size 0x5aace0
Processing PHDR 0
  Loading 51aa98 bytes at ffffffff81000000
  Clearing 90248 bytes at ffffffff8151aa98
## Loading Linux kernel with entry point: 0xffffffff81000000 ...
Bootloader: Done loading app on coremask: 0x3
bootmem desc 0x24108 version 3.0
avail phys mem 0x00000000001004d0 - 0x0000000000fffcf0
avail phys mem 0x00000000015aace0 - 0x0000000008100000
avail phys mem 0x0000000008100010 - 0x000000000fffdc00
avail phys mem 0x0000000410000000 - 0x000000041ff00000
Total DRAM Size 0x0000000020000000
mem_layout[0] page 0x0000000000000041 -> 0x00000000000003FF
mem_layout[1] page 0x000000000000056B -> 0x0000000000002040
mem_layout[2] page 0x0000000000002041 -> 0x0000000000003FFFInitial setup
done, switching console.
boot_desc->desc_ver:7
boot_desc->desc_size:400
boot_desc->stack_top:0
boot_desc->heap_start:0
boot_desc->heap_end:0
boot_desc->argc:3
boot_desc->flags:0x5
boot_desc->core_mask:0x3
boot_desc->dram_size:512
boot_desc->phy_mem_desc_addr:0
boot_desc->debugger_flag_addr:0xa44
boot_desc->eclock:500000000
boot_desc->boot_info_addr:0x100200
boot_info->ver_major:1
boot_info->ver_minor:2
boot_info->stack_top:0
boot_info->heap_start:0
boot_info->heap_end:0
boot_info->boot_desc_addr:0
boot_info->exception_base_addr:0x1000
boot_info->stack_size:0
boot_info->flags:0x5
boot_info->core_mask:0x3
boot_info->dram_size:512
boot_info->phys_mem_desc_addr:0x24108
boot_info->debugger_flags_addr:0
boot_info->eclock:500000000
boot_info->dclock:266000000
boot_info->board_type:20002
boot_info->board_rev_major:2
boot_info->board_rev_minor:18
boot_info->mac_addr_count:3
boot_info->cf_common_addr:0
boot_info->cf_attr_addr:0
boot_info->led_display_addr:0
boot_info->dfaclock:0
boot_info->config_flags:0x8
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.
https://www.OpenBSD.org

OpenBSD 6.5-beta (GENERIC.MP) #0: Thu Feb 28 12:52:54 UTC 2019
    visa@octeon:/usr/src/sys/arch/octeon/compile/GENERIC.MP
real mem = 536870912 (512MB)
avail mem = 523747328 (499MB)
mainbus0 at root: board 20002 rev 2.18
cpu0 at mainbus0: CN50xx CPU rev 0.1 500 MHz, Software FP emulation
cpu0: cache L1-I 32KB 4 way D 16KB 64 way, L2 128KB 8 way
cpu1 at mainbus0: CN50xx CPU rev 0.1 500 MHz, Software FP emulation
cpu1: cache L1-I 32KB 4 way D 16KB 64 way, L2 128KB 8 way
clock0 at mainbus0: int 5
octcrypto0 at mainbus0
iobus0 at mainbus0
simplebus0 at iobus0: "soc"
octciu0 at simplebus0
cn30xxsmi0 at simplebus0
com0 at simplebus0: ns16550a, 64 byte fifo
com0: console
dwctwo0 at iobus0 base 0x1180068000000 irq 56
usb0 at dwctwo0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Octeon DWC2 root hub" rev
2.00/1.00 addr 1
octrng0 at iobus0 base 0x1400000000000 irq 0
cn30xxgmx0 at iobus0 base 0x1180008000000
cnmac0 at cn30xxgmx0: RGMII, address 80:2a:a8:8d:35:ef
atphy0 at cnmac0 phy 7: AR8035 10/100/1000 PHY, rev. 2
cnmac1 at cn30xxgmx0: RGMII, address 80:2a:a8:8d:35:f0
atphy1 at cnmac1 phy 6: AR8035 10/100/1000 PHY, rev. 2
cnmac2 at cn30xxgmx0: RGMII, address 80:2a:a8:8d:35:f1
atphy2 at cnmac2 phy 5: AR8035 10/100/1000 PHY, rev. 2
/dev/ksyms: Symbol table not valid.
uhub1 at uhub0 port 1 configuration 1 interface 0 "GenesysLogic USB2.0 Hub"
rev 2.00/92.24 addr 2
umass0 at uhub1 port 1 configuration 1 interface 0 "Generic USB Storage"
rev 2.00/2.72 addr 3
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <Generic, STORAGE DEVICE, 0272> SCSI0
0/direct removable serial.05e30736000000000272
sd0: 30159MB, 512 bytes/sector, 61765632 sectors
uhidev0 at uhub1 port 4 configuration 1 interface 0 "APC Back-UPS ES 550G
FW:843.K4 .D USB FW:K4" rev 1.10/1.06 addr 4
uhidev0: iclass 3/0, 123 report ids
upd0 at uhidev0
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
boot device: sd0
root on sd0a (7737cd38aada9108.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: No TOD clock, believing file system.
WARNING: CHECK AND RESET THE DATE!
Automatic boot in progress: starting file system checks.
/dev/sd0a (7737cd38aada9108.a): INCORRECT BLOCK COUNT I=1169356 (4 should
be 0) (CORRECTED)
/dev/sd0a (7737cd38aada9108.a): INCORRECT BLOCK COUNT I=1169521 (8 should
be 0) (CORRECTED)
/dev/sd0a (7737cd38aada9108.a): INCORRECT BLOCK COUNT I=2442716 (8 should
be 0) (CORRECTED)
/dev/sd0a (7737cd38aada9108.a): INCORRECT BLOCK COUNT I=2443327 (16 should
be 0) (CORRECTED)
/dev/sd0a (7737cd38aada9108.a): UNREF FILE I=1169356  OWNER=root MODE=100644
/dev/sd0a: SIZE=0 MTIME=Mar  1 05:02 2019  (CLEARED)
/dev/sd0a (7737cd38aada9108.a): UNREF FILE I=1169521  OWNER=root MODE=100600
/dev/sd0a: SIZE=0 MTIME=Mar  1 05:02 2019  (CLEARED)
/dev/sd0a (7737cd38aada9108.a): LINK COUNT FILE I=1169579  OWNER=root
MODE=100644
/dev/sd0a: SIZE=3765 MTIME=Feb 28 10:05 2019  COUNT 2 SHOULD BE 1 (ADJUSTED)
/dev/sd0a (7737cd38aada9108.a): UNREF FILE I=2442716  OWNER=root MODE=100644
/dev/sd0a: SIZE=0 MTIME=Mar  1 05:02 2019  (CLEARED)
/dev/sd0a (7737cd38aada9108.a): UNREF FILE I=2443327  OWNER=root MODE=100644
/dev/sd0a: SIZE=0 MTIME=Mar  1 05:02 2019  (CLEARED)
/dev/sd0a (7737cd38aada9108.a): FREE BLK COUNT(S) WRONG IN SUPERBLK
(SALVAGED)
/dev/sd0a (7737cd38aada9108.a): SUMMARY INFORMATION BAD (SALVAGED)
/dev/sd0a (7737cd38aada9108.a): BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/sd0a (7737cd38aada9108.a): 52926 files, 3306370 used, 10625884 free
(13388 frags, 1326562 blocks, 0.1% fragmentation)
/dev/sd0a (7737cd38aada9108.a): MARKING FILE SYSTEM CLEAN
pf enabled
starting network
cnmac2: no link....
cnmac2: bound to 100.100.0.41 from 100.100.0.1 (5a:02:71:92:cd:88)
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: /bsd: kvm_read: version misread
checking quotas: done.
kvm_mkdb: can't open /dev/ksyms
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd smtpd sndiod.
starting local daemons: cron.
Fri Mar  1 05:04:02 CST 2019

OpenBSD/octeon (fox.00z.us) (console)

login: root
Password:
Last login: Fri Mar  1 04:58:49 on console
OpenBSD 6.5-beta (GENERIC.MP) #0: Thu Feb 28 12:52:54 UTC 2019

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code.  With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.


fox#
fox#
fox# while true; do nohup sysmerge </dev/null; done
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out

[... lots of runs, sometimes like 10, sometimes like 100-200]
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
sending output to nohup.out
panic: assertwaitok: non-zero mutex count: 1
Stopped at      0xffffffff813d9b8c:     jr      ra
0xffffffff813d9b90:      nop
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*328985  76286      0         0x3          0    0K mtree
0xffffffff813d9b88 (f040d88df76b5718,9001070000000208,208,0)  ra
0xffffffff8130
cc38 sp 0x980000000e08f2f0, sz 0
0xffffffff8130c9f8 (f040d88df76b5718,9001070000000208,208,0)  ra 0x0 sp
0x98000
0000e08f2f0, sz 0
User-level: pid 76286
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> trace
0xffffffff813d9b88 (f040d88df76b5718,9001070000000208,208,0)  ra
0xffffffff8130
cc38 sp 0x980000000e08f2f0, sz 0
0xffffffff8130c9f8 (f040d88df76b5718,9001070000000208,208,0)  ra 0x0 sp
0x98000
0000e08f2f0, sz 0
User-level: pid 76286
ddb{0}> mach ddbcpu 1
Stopped at      0xffffffff813d9b8c:     jr      ra
0xffffffff813d9b90:      nop
0xffffffff813d9b88 (4,1,688,1)  ra 0xffffffff812b85d0 sp
0x980000000ffa7b48, sz
 0
0xffffffff812b85c8 (4,1,688,1)  ra 0x0 sp 0x980000000ffa7b48, sz 0
User-level: pid 36059
ddb{1}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 76286  328985  36686      0  7         0x3                mtree
 36686   37627  41488      0  3    0x10008b  pause         ksh
 41488  409393      1      0  3    0x10008b  pause         ksh
 11141  244902      1      0  3    0x100098  poll          cron
 54247  456241      1     99  3    0x100090  poll          sndiod
 62407  205834      1    110  3    0x100090  poll          sndiod
 52322   87926  34285     95  3    0x100092  kqread        smtpd
 11675  401244  34285    103  3    0x100092  kqread        smtpd
  9637  453038  34285     95  3    0x100092  kqread        smtpd
 80752  427165  34285     95  3    0x100092  kqread        smtpd
 74796  296951  34285     95  3    0x100092  kqread        smtpd
  7814  221942  34285     95  3    0x100092  kqread        smtpd
 34285  146305      1      0  3    0x100080  kqread        smtpd
 66349  254825      1      0  3        0x80  select        sshd
  7699  416645      1      0  3    0x100080  poll          ntpd
  5662  461400  85635     83  3    0x100092  poll          ntpd
 85635  412447      1     83  3    0x100092  poll          ntpd
 84089    5569  76003     74  3    0x100092  bpf           pflogd
 76003  309543      1      0  3        0x80  netio         pflogd
  6753  243115  65785     73  3    0x100090  kqread        syslogd
 65785   69103      1      0  3    0x100082  netio         syslogd
 38721  216563      1     77  3    0x100090  poll          dhclient
 25316   59598      1      0  3        0x80  poll          dhclient
 96685  175532   7840    115  3    0x100092  kqread        slaacd
 95265   42021   7840    115  3    0x100092  kqread        slaacd
  7840   71695      1      0  3    0x100080  kqread        slaacd
*36059  263503      0      0  7  0x40014200                idle1
 99016  153365      0      0  3     0x14200  pgzero        zerothread
 64649   47582      0      0  3     0x14200  aiodoned      aiodoned
 69660  502690      0      0  3     0x14200  syncer        update
 78018  302139      0      0  3     0x14200  cleaner       cleaner
 36281   13798      0      0  3     0x14200  reaper        reaper
 66877  255603      0      0  3     0x14200  pgdaemon      pagedaemon
 46235  162844      0      0  3     0x14200  bored         sensors
 12602   44662      0      0  3     0x14200  bored         crynlk
 87289  424959      0      0  3     0x14200  bored         crypto
  8862  142896      0      0  3     0x14200  usbtsk        usbtask
 12532   21825      0      0  3     0x14200  usbatsk       usbatsk
 74540  257569      0      0  3     0x14200  bored         dwc2
 32064  495552      0      0  3     0x14200  bored         softnet
 18598  116806      0      0  3     0x14200  bored         systqmp
 54587  184237      0      0  3     0x14200  bored         systq
 58911  109549      0      0  3  0x40014200  bored         softclock
 45522  111620      0      0  3  0x40014200                idle0
 97264  489073      0      0  3     0x14200  bored         smr
     1   84670      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{1}>
Reply | Threaded
Open this post in threaded view
|

Re: octeon: (reproducible) panic: assertwaitok: non-zero mutex count: 1; on 6.5-beta; erlite

Visa Hankala-2
On Fri, Mar 01, 2019 at 03:12:41PM -0600, Abel Abraham Camarillo Ojeda wrote:
> panic: assertwaitok: non-zero mutex count: 1
> Stopped at      0xffffffff813d9b8c:     jr      ra
> 0xffffffff813d9b90:      nop
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *328985  76286      0         0x3          0    0K mtree
> 0xffffffff813d9b88 (f040d88df76b5718,9001070000000208,208,0)  ra
> 0xffffffff8130

With help from Abel, I got the following stack trace from one of these
panics. It appears that dwctwo(4) incorrectly uses a blocking memory
allocation. The error gets uncovered by the upd(4) driver.

panic
__assertwaitok
malloc
dwc2_hcd_urb_alloc
dwc2_allocx
usbd_alloc_xfer
uhidev_get_report_async
upd_request_children
upd_update_report_cb
uhidev_get_report_async_cb
usb_transfer_complete
dwc2_softintr
softintr_dispatch
dosoftint
interrupt
k_intr
(interrupt)
pool_do_put
pool_put
namei
dofstatat
itsa
trap
u_general
(end of kernel stack)

The following patch makes the allocation non-blocking, so that
dwc2_allocx() behaves similarly to the other *_allocx() routines.
Abel has confirmed that this fixes the panic.

OK?

Index: dev/usb/dwc2/dwc2.c
===================================================================
RCS file: src/sys/dev/usb/dwc2/dwc2.c,v
retrieving revision 1.47
diff -u -p -r1.47 dwc2.c
--- dev/usb/dwc2/dwc2.c 11 Mar 2019 17:50:09 -0000 1.47
+++ dev/usb/dwc2/dwc2.c 13 Mar 2019 17:20:04 -0000
@@ -302,7 +302,11 @@ dwc2_allocx(struct usbd_bus *bus)
  memset(dxfer, 0, sizeof(*dxfer));
 
  dxfer->urb = dwc2_hcd_urb_alloc(sc->sc_hsotg,
-    DWC2_MAXISOCPACKETS, GFP_KERNEL);
+    DWC2_MAXISOCPACKETS, GFP_ATOMIC);
+ if (dxfer->urb == NULL) {
+ pool_put(&sc->sc_xferpool, dxfer);
+ return NULL;
+ }
 
 #ifdef DWC2_DEBUG
  dxfer->xfer.busy_free = XFER_ONQU;