arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Carlos Cardenas
Howdy.

After an upgrade, my mcbin-ss panic'ed while performing pkg_add -u.

Attached is the dmesg and ddb output(s).

+--+
Carlos

aethulwulf# pkg_add -u                                                  
quirks-3.24 signed on 2018-10-29T21:21:13Z
quirks-3.16->3.24: ok
curl-7.61.1:nghttp2-1.33.0->1.34.0: ok
curl-7.61.1->7.61.1: ok
install-info: warning: no entries found for
`/usr/local/info/autosprintf.info'; nothing deleted
gettext-0.19.8.1p1->0.19.8.1p3: ok
git-2.19.1 (extracting)|***********************                           |
46%panic: mtx 0xffffff8000b6cae8: locking against myself
Stopped at      panic+0x154:        TID    PID    UID     PRFLAGS     PFLAGS
C
PU  COMMAND
 445399  49133      0    0x100003       0x80    1  signify
 327068  13460      0        0x13          0    2K perl
*231470  74839      0     0x14000      0x200    0  zerothread
db_enter() at panic+0x150
panic() at uvm_pmr_getpages+0x19c
uvm_pmr_getpages() at km_alloc+0x2d8
km_alloc() at pool_multi_alloc+0xac
pool_multi_alloc() at m_pool_alloc+0xa4
m_pool_alloc() at pool_p_alloc+0x64
pool_p_alloc() at pool_do_get+0xe0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{0}> show panic
mtx 0xffffff8000b6cae8: locking against myself
ddb{0}> trace
db_enter() at panic+0x150
panic() at uvm_pmr_getpages+0x19c
uvm_pmr_getpages() at km_alloc+0x2d8
km_alloc() at pool_multi_alloc+0xac
pool_multi_alloc() at m_pool_alloc+0xa4
m_pool_alloc() at pool_p_alloc+0x64
pool_p_alloc() at pool_do_get+0xe0
pool_do_get() at pool_get+0x90
pool_get() at m_clget+0x1a4
m_clget() at bnxt_rx_fill_slots+0x70
bnxt_rx_fill_slots() at bnxt_rx_fill+0x58
bnxt_rx_fill() at bnxt_intr+0x32c
bnxt_intr() at ampintc_irq_handler+0x120
ampintc_irq_handler() at handle_el1h_irq+0x6c
handle_el1h_irq() at _rb_nfind+0x48
_rb_nfind() at uvm_pmr_pnaddr+0x48
uvm_pmr_pnaddr() at uvm_pmr_insert_addr+0x6c
uvm_pmr_insert_addr() at uvm_pmr_remove_1strange+0x340
uvm_pmr_remove_1strange() at uvm_pagezero_thread+0x138
uvm_pagezero_thread() at proc_trampoline+0x10
ddb{0}> mach ddbcpu 1
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at mi_switch+0x27c
mi_switch() at sleep_finish+0x80
sleep_finish() at sleep_finish_all+0x24
sleep_finish_all() at tsleep+0xc8
tsleep() at pipe_write+0x2e4
ddb{1}> mach ddbcpu 2
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at pipe_read+0x18c
pipe_read() at dofilereadv+0x118
dofilereadv() at sys_read+0x64
sys_read() at svc_handler+0x264
svc_handler() at do_el0_sync+0x1b0
ddb{2}> mach ddbcpu 3
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at sched_idle+0x264
sched_idle() at proc_trampoline+0x10
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 49133  445399  13460      0  7    0x100083                signify
 73562  244961  13460     57  3    0x100083  pipewr        ftp
 13460  327068  73083      0  7        0x13                perl
 73083  145869      1      0  3    0x10008b  pause         ksh
 14480  429557      1      0  3    0x100098  poll          cron
 94239  269814  43573    720  3        0x90  kqread        lldpd
 43573    9590      1      0  3        0x80  netio         lldpd
 34010  511267      1     99  3    0x100090  poll          sndiod
 56442  292457      1    110  3    0x100090  poll          sndiod
 87524  516137  76220     95  3    0x100092  kqread        smtpd
 67451  517797  76220    103  3    0x100092  kqread        smtpd
 46139   58374  76220     95  3    0x100092  kqread        smtpd
 64211  416407  76220     95  3    0x100092  kqread        smtpd
 59703   56359  76220     95  3    0x100092  kqread        smtpd
 32430  162051  76220     95  3    0x100092  kqread        smtpd
 76220  338058      1      0  3    0x100080  kqread        smtpd
 43032  105890      1      0  3        0x80  select        sshd
 15205   45074  18882     83  3    0x100092  poll          ntpd
 18882  275774   3939     83  3    0x100092  poll          ntpd
  3939  197221      1      0  3    0x100080  poll          ntpd
 66576  427757  60591     74  3    0x100092  bpf           pflogd
 60591    6199      1      0  3        0x80  netio         pflogd
 13115  213653  49662     73  3    0x100090  kqread        syslogd
 49662  255195      1      0  3    0x100082  netio         syslogd
 52280  180117      1     77  3    0x100090  poll          dhclient
 82868  253718      1      0  3        0x80  poll          dhclient
   752  473909  89936    115  3    0x100092  kqread        slaacd
 28475  288910  89936    115  3    0x100092  kqread        slaacd
 89936  422111      1      0  3    0x100080  kqread        slaacd
*74839  231470      0      0  7     0x14200                zerothread
 97587  509880      0      0  3     0x14200  aiodoned      aiodoned
 12018  504243      0      0  3     0x14200  syncer        update
 29346  114311      0      0  3     0x14200  cleaner       cleaner
  3201  164514      0      0  3     0x14200  reaper        reaper
 74165  156963      0      0  3     0x14200  pgdaemon      pagedaemon
 97256  255669      0      0  3     0x14200  bored         crynlk
 25774  193955      0      0  3     0x14200  bored         crypto
 82385  500927      0      0  7  0x40014200                idle3
 51930  221918      0      0  3  0x40014200                idle2
 99556  361074      0      0  3  0x40014200                idle1
 36698   79862      0      0  3     0x14200  usbtsk        usbtask
 87822   63054      0      0  3     0x14200  usbatsk       usbatsk
 59815  433532      0      0  3     0x14200  bored         sensors
 12031  445539      0      0  3     0x14200  bored         softnet
 45842  278094      0      0  3     0x14200  bored         systqmp
 55708  324286      0      0  3     0x14200  bored         systq
 52554  311327      0      0  3  0x40014200  bored         softclock
 62638  159321      0      0  3  0x40014200                idle0
 66053  474648      0      0  3     0x14200  kmalloc       kmthread
     1  203595      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  4036954 VM pages: 15691 active, 31617 inactive, 0 wired, 3898311 free
(487279
 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=134565, free-target=179420, inactive-target=0, wired-max=1345651
  faults=350301, traps=0, intrs=0, ctxswitch=71731 fpuswitch=0
  softint=4204, syscalls=258464, kmapent=14
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    ok relocks(total)=52530(52531), anget(retries)=113181(0), amapcopy=105599
    neighbor anon/obj pg=11213/93307, gets(lock/unlock)=127729/52531
    cases: anon=94876, anoncow=18305, obj=121106, prcopy=6622, przero=109384
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=1050249, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xffffff8000aa9128
ddb{0}> show bcstats
Current Buffer Cache status:
numbufs 17692 busymapped 0, delwri 129
kvaslots 6553 avail kva slots 6553
bufpages 70695, dmapages 70695, dirtypages 516
pendingreads 1, pendingwrites 0
highflips 0, highflops 0, dmaflips 0
ddb{0}> dmesg
OpenBSD 6.4-current (GENERIC.MP) #271: Sat Nov 17 19:35:25 MST 2018
    [hidden email]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 17112715264 (16319MB)
avail mem = 16511594496 (15746MB)
mainbus0 at root: Marvell 8040 MACCHIATOBin
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p1
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 512KB 64b/line 16-way L2 cache
efi0 at mainbus0: UEFI 2.7
efi0: EDK II rev 0x10000
psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
simplebus0 at mainbus0: "ap806"
simplebus1 at simplebus0: "config-space"
ampintc0 at simplebus1 nirq 352, ncpu 4 ipi: 0, 1: "interrupt-controller"
ampintcmsi0 at ampintc0: nspi 32
ampintcmsi1 at ampintc0: nspi 32
ampintcmsi2 at ampintc0: nspi 32
ampintcmsi3 at ampintc0: nspi 32
syscon0 at simplebus1: "system-controller"
mvclock0 at syscon0
mvpinctrl0 at syscon0
mvgpio0 at syscon0
cpu0: 512KB 64b/line 16-way L2 cache
efi0 at mainbus0: UEFI 2.7
efi0: EDK II rev 0x10000
psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
simplebus0 at mainbus0: "ap806"
simplebus1 at simplebus0: "config-space"
ampintc0 at simplebus1 nirq 352, ncpu 4 ipi: 0, 1: "interrupt-controller"
ampintcmsi0 at ampintc0: nspi 32
ampintcmsi1 at ampintc0: nspi 32
ampintcmsi2 at ampintc0: nspi 32
ampintcmsi3 at ampintc0: nspi 32
syscon0 at simplebus1: "system-controller"
mvclock0 at syscon0
mvpinctrl0 at syscon0
mvgpio0 at syscon0
agtimer0 at simplebus1: tick rate 25000 KHz
com0 at simplebus1: ns16550, no working fifo
com0: console
mvtemp0 at simplebus1
simplebus2 at mainbus0: "cp0"
simplebus3 at simplebus2: "config-space"
mvicu0 at simplebus3
syscon1 at simplebus3: "system-controller"
mvclock1 at syscon1
mvgpio1 at syscon1
mvtemp1 at simplebus3
xhci0 at simplebus3, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev
3.00/1.00
 addr 1
xhci1 at simplebus3, xHCI 1.0
usb1 at xhci1: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "Generic xHCI root hub" rev
3.00/1.00
 addr 1
ahci0 at simplebus3: AHCI 1.0
scsibus0 at ahci0: 32 targets
sxitwi0 at simplebus3
iic0 at sxitwi0
sxitwi1 at simplebus3
iic1 at sxitwi1
"nxp,pca9548" at iic1 addr 0x70 not configured
com1 at simplebus3: ns16550, no working fifo
mvrng0 at simplebus3
dwpcie0 at simplebus2
pci0 at dwpcie0
ppb0 at pci0 dev 0 function 0 "Marvell ARMADA 7K/8K Root Complex" rev 0x00
pci1 at ppb0 bus 1
bnxt0 at pci1 dev 0 function 0 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
in
tx, address 00:0a:f7:ae:9c:b6
bnxt1 at pci1 dev 0 function 1 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
in
tx, address 00:0a:f7:ae:9c:b7
simplebus4 at mainbus0: "cp1"
simplebus5 at simplebus4: "config-space"
mvicu1 at simplebus5
syscon2 at simplebus5: "system-controller"
mvclock2 at syscon2
mvgpio2 at syscon2
mvtemp2 at simplebus5
xhci2 at simplebus5, xHCI 1.0
usb2 at xhci2: USB revision 3.0
uhub2 at usb2 configuration 1 interface 0 "Generic xHCI root hub" rev
3.00/1.00
pci1 at ppb0 bus 1
bnxt0 at pci1 dev 0 function 0 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
in
tx, address 00:0a:f7:ae:9c:b6
bnxt1 at pci1 dev 0 function 1 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
in
tx, address 00:0a:f7:ae:9c:b7
simplebus4 at mainbus0: "cp1"
simplebus5 at simplebus4: "config-space"
mvicu1 at simplebus5
syscon2 at simplebus5: "system-controller"
mvclock2 at syscon2
mvgpio2 at syscon2
mvtemp2 at simplebus5
xhci2 at simplebus5, xHCI 1.0
usb2 at xhci2: USB revision 3.0
uhub2 at usb2 configuration 1 interface 0 "Generic xHCI root hub" rev
3.00/1.00
 addr 1    
ahci1 at simplebus5: AHCI 1.0
ahci1: port busy after first PMP probe FIS
ahci1: port busy after first PMP probe FIS
ahci1: port 0: 6.0Gb/s
ahci1: port busy after first PMP probe FIS
ahci1: port busy after first PMP probe FIS
ahci1: port 1: 6.0Gb/s
scsibus1 at ahci1: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 860, RVT0> SCSI3 0/direct
fixed
 naa.5002538e4048c423
sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
sd1 at scsibus1 targ 1 lun 0: <ATA, WDC WD20SPZX-22C, 01.0> SCSI3 0/direct
fixe
d naa.50014ee608945f38
sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors, thin
com2 at simplebus5: ns16550, no working fifo
mvrng1 at simplebus5
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p1
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 512KB 64b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 100: ARM Cortex-A72 r0p1
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 512KB 64b/line 16-way L2 cache
cpu3 at mainbus0 mpidr 101: ARM Cortex-A72 r0p1
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 512KB 64b/line 16-way L2 cache
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootfile: sd0a:/bsd
boot device: sd0
root on sd0a (674d8d0abfc9f314.a) swap on sd0b dump on sd0b

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Martin Pieuchot
On 18/11/18(Sun) 14:29, Carlos Cardenas wrote:
> Howdy.
>
> After an upgrade, my mcbin-ss panic'ed while performing pkg_add -u.
>
> Attached is the dmesg and ddb output(s).

This seems to be a SPL issue.  In the trace below uvm_pmr_remove_1strange()
is called with `fpageq' lock, which is a IPL_VM mutex.  So the bnx(4)
interrupt shouldn't be triggered.

> ddb{0}> show panic
> mtx 0xffffff8000b6cae8: locking against myself
> ddb{0}> trace
> db_enter() at panic+0x150
> panic() at uvm_pmr_getpages+0x19c
> uvm_pmr_getpages() at km_alloc+0x2d8
> km_alloc() at pool_multi_alloc+0xac
> pool_multi_alloc() at m_pool_alloc+0xa4
> m_pool_alloc() at pool_p_alloc+0x64
> pool_p_alloc() at pool_do_get+0xe0
> pool_do_get() at pool_get+0x90
> pool_get() at m_clget+0x1a4
> m_clget() at bnxt_rx_fill_slots+0x70
> bnxt_rx_fill_slots() at bnxt_rx_fill+0x58
> bnxt_rx_fill() at bnxt_intr+0x32c
> bnxt_intr() at ampintc_irq_handler+0x120
> ampintc_irq_handler() at handle_el1h_irq+0x6c
> handle_el1h_irq() at _rb_nfind+0x48
> _rb_nfind() at uvm_pmr_pnaddr+0x48
> uvm_pmr_pnaddr() at uvm_pmr_insert_addr+0x6c
> uvm_pmr_insert_addr() at uvm_pmr_remove_1strange+0x340
> uvm_pmr_remove_1strange() at uvm_pagezero_thread+0x138
> uvm_pagezero_thread() at proc_trampoline+0x10
> ddb{0}> mach ddbcpu 1
> Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
> ampintc_ipi_ddb() at handle_el1h_irq+0x6c
> handle_el1h_irq() at mi_switch+0x27c
> mi_switch() at sleep_finish+0x80
> sleep_finish() at sleep_finish_all+0x24
> sleep_finish_all() at tsleep+0xc8
> tsleep() at pipe_write+0x2e4
> ddb{1}> mach ddbcpu 2
> Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
> ampintc_ipi_ddb() at handle_el1h_irq+0x6c
> handle_el1h_irq() at pipe_read+0x18c
> pipe_read() at dofilereadv+0x118
> dofilereadv() at sys_read+0x64
> sys_read() at svc_handler+0x264
> svc_handler() at do_el0_sync+0x1b0
> ddb{2}> mach ddbcpu 3
> Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
> ampintc_ipi_ddb() at handle_el1h_irq+0x6c
> handle_el1h_irq() at sched_idle+0x264
> sched_idle() at proc_trampoline+0x10
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  49133  445399  13460      0  7    0x100083                signify
>  73562  244961  13460     57  3    0x100083  pipewr        ftp
>  13460  327068  73083      0  7        0x13                perl
>  73083  145869      1      0  3    0x10008b  pause         ksh
>  14480  429557      1      0  3    0x100098  poll          cron
>  94239  269814  43573    720  3        0x90  kqread        lldpd
>  43573    9590      1      0  3        0x80  netio         lldpd
>  34010  511267      1     99  3    0x100090  poll          sndiod
>  56442  292457      1    110  3    0x100090  poll          sndiod
>  87524  516137  76220     95  3    0x100092  kqread        smtpd
>  67451  517797  76220    103  3    0x100092  kqread        smtpd
>  46139   58374  76220     95  3    0x100092  kqread        smtpd
>  64211  416407  76220     95  3    0x100092  kqread        smtpd
>  59703   56359  76220     95  3    0x100092  kqread        smtpd
>  32430  162051  76220     95  3    0x100092  kqread        smtpd
>  76220  338058      1      0  3    0x100080  kqread        smtpd
>  43032  105890      1      0  3        0x80  select        sshd
>  15205   45074  18882     83  3    0x100092  poll          ntpd
>  18882  275774   3939     83  3    0x100092  poll          ntpd
>   3939  197221      1      0  3    0x100080  poll          ntpd
>  66576  427757  60591     74  3    0x100092  bpf           pflogd
>  60591    6199      1      0  3        0x80  netio         pflogd
>  13115  213653  49662     73  3    0x100090  kqread        syslogd
>  49662  255195      1      0  3    0x100082  netio         syslogd
>  52280  180117      1     77  3    0x100090  poll          dhclient
>  82868  253718      1      0  3        0x80  poll          dhclient
>    752  473909  89936    115  3    0x100092  kqread        slaacd
>  28475  288910  89936    115  3    0x100092  kqread        slaacd
>  89936  422111      1      0  3    0x100080  kqread        slaacd
> *74839  231470      0      0  7     0x14200                zerothread
>  97587  509880      0      0  3     0x14200  aiodoned      aiodoned
>  12018  504243      0      0  3     0x14200  syncer        update
>  29346  114311      0      0  3     0x14200  cleaner       cleaner
>   3201  164514      0      0  3     0x14200  reaper        reaper
>  74165  156963      0      0  3     0x14200  pgdaemon      pagedaemon
>  97256  255669      0      0  3     0x14200  bored         crynlk
>  25774  193955      0      0  3     0x14200  bored         crypto
>  82385  500927      0      0  7  0x40014200                idle3
>  51930  221918      0      0  3  0x40014200                idle2
>  99556  361074      0      0  3  0x40014200                idle1
>  36698   79862      0      0  3     0x14200  usbtsk        usbtask
>  87822   63054      0      0  3     0x14200  usbatsk       usbatsk
>  59815  433532      0      0  3     0x14200  bored         sensors
>  12031  445539      0      0  3     0x14200  bored         softnet
>  45842  278094      0      0  3     0x14200  bored         systqmp
>  55708  324286      0      0  3     0x14200  bored         systq
>  52554  311327      0      0  3  0x40014200  bored         softclock
>  62638  159321      0      0  3  0x40014200                idle0
>  66053  474648      0      0  3     0x14200  kmalloc       kmthread
>      1  203595      0      0  3        0x82  wait          init
>      0       0     -1      0  3     0x10200  scheduler     swapper
> ddb{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   4036954 VM pages: 15691 active, 31617 inactive, 0 wired, 3898311 free
> (487279
>  zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=134565, free-target=179420, inactive-target=0, wired-max=1345651
>   faults=350301, traps=0, intrs=0, ctxswitch=71731 fpuswitch=0
>   softint=4204, syscalls=258464, kmapent=14
>   fault counts:
>     noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
>     ok relocks(total)=52530(52531), anget(retries)=113181(0), amapcopy=105599
>     neighbor anon/obj pg=11213/93307, gets(lock/unlock)=127729/52531
>     cases: anon=94876, anoncow=18305, obj=121106, prcopy=6622, przero=109384
>   daemon and swap counts:
>     woke=0, revs=0, scans=0, obscans=0, anscans=0
>     busy=0, freed=0, reactivate=0, deactivate=0
>     pageouts=0, pending=0, nswget=0
>     nswapdev=1
>     swpages=1050249, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
>     objs(kern)=0xffffff8000aa9128
> ddb{0}> show bcstats
> Current Buffer Cache status:
> numbufs 17692 busymapped 0, delwri 129
> kvaslots 6553 avail kva slots 6553
> bufpages 70695, dmapages 70695, dirtypages 516
> pendingreads 1, pendingwrites 0
> highflips 0, highflops 0, dmaflips 0
> ddb{0}> dmesg
> OpenBSD 6.4-current (GENERIC.MP) #271: Sat Nov 17 19:35:25 MST 2018
>     [hidden email]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 17112715264 (16319MB)
> avail mem = 16511594496 (15746MB)
> mainbus0 at root: Marvell 8040 MACCHIATOBin
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p1
> cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> efi0 at mainbus0: UEFI 2.7
> efi0: EDK II rev 0x10000
> psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
> simplebus0 at mainbus0: "ap806"
> simplebus1 at simplebus0: "config-space"
> ampintc0 at simplebus1 nirq 352, ncpu 4 ipi: 0, 1: "interrupt-controller"
> ampintcmsi0 at ampintc0: nspi 32
> ampintcmsi1 at ampintc0: nspi 32
> ampintcmsi2 at ampintc0: nspi 32
> ampintcmsi3 at ampintc0: nspi 32
> syscon0 at simplebus1: "system-controller"
> mvclock0 at syscon0
> mvpinctrl0 at syscon0
> mvgpio0 at syscon0
> cpu0: 512KB 64b/line 16-way L2 cache
> efi0 at mainbus0: UEFI 2.7
> efi0: EDK II rev 0x10000
> psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
> simplebus0 at mainbus0: "ap806"
> simplebus1 at simplebus0: "config-space"
> ampintc0 at simplebus1 nirq 352, ncpu 4 ipi: 0, 1: "interrupt-controller"
> ampintcmsi0 at ampintc0: nspi 32
> ampintcmsi1 at ampintc0: nspi 32
> ampintcmsi2 at ampintc0: nspi 32
> ampintcmsi3 at ampintc0: nspi 32
> syscon0 at simplebus1: "system-controller"
> mvclock0 at syscon0
> mvpinctrl0 at syscon0
> mvgpio0 at syscon0
> agtimer0 at simplebus1: tick rate 25000 KHz
> com0 at simplebus1: ns16550, no working fifo
> com0: console
> mvtemp0 at simplebus1
> simplebus2 at mainbus0: "cp0"
> simplebus3 at simplebus2: "config-space"
> mvicu0 at simplebus3
> syscon1 at simplebus3: "system-controller"
> mvclock1 at syscon1
> mvgpio1 at syscon1
> mvtemp1 at simplebus3
> xhci0 at simplebus3, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev
> 3.00/1.00
>  addr 1
> xhci1 at simplebus3, xHCI 1.0
> usb1 at xhci1: USB revision 3.0
> uhub1 at usb1 configuration 1 interface 0 "Generic xHCI root hub" rev
> 3.00/1.00
>  addr 1
> ahci0 at simplebus3: AHCI 1.0
> scsibus0 at ahci0: 32 targets
> sxitwi0 at simplebus3
> iic0 at sxitwi0
> sxitwi1 at simplebus3
> iic1 at sxitwi1
> "nxp,pca9548" at iic1 addr 0x70 not configured
> com1 at simplebus3: ns16550, no working fifo
> mvrng0 at simplebus3
> dwpcie0 at simplebus2
> pci0 at dwpcie0
> ppb0 at pci0 dev 0 function 0 "Marvell ARMADA 7K/8K Root Complex" rev 0x00
> pci1 at ppb0 bus 1
> bnxt0 at pci1 dev 0 function 0 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
> in
> tx, address 00:0a:f7:ae:9c:b6
> bnxt1 at pci1 dev 0 function 1 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
> in
> tx, address 00:0a:f7:ae:9c:b7
> simplebus4 at mainbus0: "cp1"
> simplebus5 at simplebus4: "config-space"
> mvicu1 at simplebus5
> syscon2 at simplebus5: "system-controller"
> mvclock2 at syscon2
> mvgpio2 at syscon2
> mvtemp2 at simplebus5
> xhci2 at simplebus5, xHCI 1.0
> usb2 at xhci2: USB revision 3.0
> uhub2 at usb2 configuration 1 interface 0 "Generic xHCI root hub" rev
> 3.00/1.00
> pci1 at ppb0 bus 1
> bnxt0 at pci1 dev 0 function 0 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
> in
> tx, address 00:0a:f7:ae:9c:b6
> bnxt1 at pci1 dev 0 function 1 "Broadcom BCM57404" rev 0x01: fw ver 20.2.26,
> in
> tx, address 00:0a:f7:ae:9c:b7
> simplebus4 at mainbus0: "cp1"
> simplebus5 at simplebus4: "config-space"
> mvicu1 at simplebus5
> syscon2 at simplebus5: "system-controller"
> mvclock2 at syscon2
> mvgpio2 at syscon2
> mvtemp2 at simplebus5
> xhci2 at simplebus5, xHCI 1.0
> usb2 at xhci2: USB revision 3.0
> uhub2 at usb2 configuration 1 interface 0 "Generic xHCI root hub" rev
> 3.00/1.00
>  addr 1    
> ahci1 at simplebus5: AHCI 1.0
> ahci1: port busy after first PMP probe FIS
> ahci1: port busy after first PMP probe FIS
> ahci1: port 0: 6.0Gb/s
> ahci1: port busy after first PMP probe FIS
> ahci1: port busy after first PMP probe FIS
> ahci1: port 1: 6.0Gb/s
> scsibus1 at ahci1: 32 targets
> sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 860, RVT0> SCSI3 0/direct
> fixed
>  naa.5002538e4048c423
> sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
> sd1 at scsibus1 targ 1 lun 0: <ATA, WDC WD20SPZX-22C, 01.0> SCSI3 0/direct
> fixe
> d naa.50014ee608945f38
> sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors, thin
> com2 at simplebus5: ns16550, no working fifo
> mvrng1 at simplebus5
> cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p1
> cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu1: 512KB 64b/line 16-way L2 cache
> cpu2 at mainbus0 mpidr 100: ARM Cortex-A72 r0p1
> cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu2: 512KB 64b/line 16-way L2 cache
> cpu3 at mainbus0 mpidr 101: ARM Cortex-A72 r0p1
> cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu3: 512KB 64b/line 16-way L2 cache
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootfile: sd0a:/bsd
> boot device: sd0
> root on sd0a (674d8d0abfc9f314.a) swap on sd0b dump on sd0b
>

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Patrick Wildt-3
On Mon, Nov 19, 2018 at 02:13:52PM -0200, Martin Pieuchot wrote:

> On 18/11/18(Sun) 14:29, Carlos Cardenas wrote:
> > Howdy.
> >
> > After an upgrade, my mcbin-ss panic'ed while performing pkg_add -u.
> >
> > Attached is the dmesg and ddb output(s).
>
> This seems to be a SPL issue.  In the trace below uvm_pmr_remove_1strange()
> is called with `fpageq' lock, which is a IPL_VM mutex.  So the bnx(4)
> interrupt shouldn't be triggered.

Yeah, I think I can tell you why.  First of all, bnxt(4) does not use
MSI, even though that ARM64 platform supports it, because it's disabled
in bnxt(4).  Thus we have to use legacy interrupts.

The device tree for the PCIe nodes specify an interrupt map:


        pcie@f2600000 {
            [...]
            interrupt-map-mask = <0x00000000 0x00000000 0x00000000 0x00000000>;
            interrupt-map = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000008 0x00000000 0x00000016 0x00000004>;
            interrupts = <0x00000000 0x00000016 0x00000004>;
            [...]
        };

The "interrupts" property is used to establish an interrupt handler on
the controller itself.  In this case apparently we have a handler that
ACKs INTx interrupts and returns, because someone else will take care of
the actual interrupt handling (i.e. PCIe device driver).

        sc->sc_ih = fdt_intr_establish(sc->sc_node, IPL_AUDIO | IPL_MPSAFE,
            dwpcie_armada8k_intr, sc, sc->sc_dev.dv_xname);

The PCIe device driver like bnxt(4) maps the interrupt through the
interrupt-map which apparently has the same interrupt ids.

                cookie = fdt_intr_establish_imap(sc->sc_node, reg,
                    sizeof(reg), level, func, arg, name);

That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
interrupt is now allowed to interrupt IPL_VM.

I don't think there's a regression, I think it behaves as implemented.
The question that remains is: How should it behave if not like that?

Patrick

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Theo de Raadt-2
> That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> interrupt is now allowed to interrupt IPL_VM.
>
> I don't think there's a regression, I think it behaves as implemented.
> The question that remains is: How should it behave if not like that?

When shared, it should operate at the lowest level not the highest.  That
may cause other difficulties.

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Patrick Wildt-3
On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:

> > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > interrupt is now allowed to interrupt IPL_VM.
> >
> > I don't think there's a regression, I think it behaves as implemented.
> > The question that remains is: How should it behave if not like that?
>
> When shared, it should operate at the lowest level not the highest.  That
> may cause other difficulties.

Actually it seems like we do set the lowest level, not the highest,
sorry.  But I think there might be a bug.  I will have a look.

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Patrick Wildt-3
On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:

> On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > interrupt is now allowed to interrupt IPL_VM.
> > >
> > > I don't think there's a regression, I think it behaves as implemented.
> > > The question that remains is: How should it behave if not like that?
> >
> > When shared, it should operate at the lowest level not the highest.  That
> > may cause other difficulties.
>
> Actually it seems like we do set the lowest level, not the highest,
> sorry.  But I think there might be a bug.  I will have a look.
>

I hope I'm not wrong, but I think there's a bug in the calculation
code.  The ampinctc_calc_mask() function sets the priority of an IRQ
every time a handler is established or disestablished on the given IRQ.
In this case, the IPL_AUDIO is probably established before the IPL_NET,
thus when the IPL_NET establish happens, iq_irq is already set to "max"
as in IPL_AUDIO.  That means the code that sets the priority to "min" is
skipped, as the continue statement will take effect.

I guess each intrq object should have the lowest and highest IPL, and
the loop should only continue of both are the same.  The lowest IPL is
then set for the IPL priority, and the highest IPL is used to splraise()
once the IRQ actually hits and the handler is suppoed to run.

Untested, I hope that someone has a look as well and to spot if I have
a mistake in my understanding.

We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.

Patrick

diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
index 2c4ce63a150..a1b28a20ea7 100644
--- a/sys/arch/arm64/dev/ampintc.c
+++ b/sys/arch/arm64/dev/ampintc.c
@@ -160,8 +160,8 @@ struct intrhand {
 
 struct intrq {
  TAILQ_HEAD(, intrhand) iq_list; /* handler list */
- int iq_irq; /* IRQ to mask while handling */
- int iq_levels; /* IPL_*'s this IRQ has */
+ int iq_irq_max; /* IRQ to mask while handling */
+ int iq_irq_min; /* lowest IRQ when shared */
  int iq_ist; /* share type */
 };
 
@@ -469,14 +469,16 @@ ampintc_calc_mask(void)
  min = ih->ih_ipl;
  }
 
- if (sc->sc_handler[irq].iq_irq == max) {
- continue;
- }
- sc->sc_handler[irq].iq_irq = max;
-
  if (max == IPL_NONE)
  min = IPL_NONE;
 
+ if (sc->sc_handler[irq].iq_irq_max == max ||
+    sc->sc_handler[irq].iq_irq_min == min)
+ continue;
+
+ sc->sc_handler[irq].iq_irq_max = max;
+ sc->sc_handler[irq].iq_irq_min = min;
+
  /* Enable interrupts at lower levels, clear -> enable */
  /* Set interrupt priority/enable */
  if (min != IPL_NONE) {
@@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info *ci)
  bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
  if (enable) {
  ampintc_set_priority(ih->ih_irq,
-    sc->sc_handler[ih->ih_irq].iq_irq);
+    sc->sc_handler[ih->ih_irq].iq_irq_min);
  ampintc_intr_enable(ih->ih_irq);
  }
 
@@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
  if (irq >= sc->sc_nintr)
  return;
 
- pri = sc->sc_handler[irq].iq_irq;
+ pri = sc->sc_handler[irq].iq_irq_max;
  s = ampintc_splraise(pri);
  TAILQ_FOREACH(ih, &sc->sc_handler[irq].iq_list, ih_list) {
 #ifdef MULTIPROCESSOR

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Carlos Cardenas
On Mon, Nov 19, 2018 at 08:35:44PM +0100, Patrick Wildt wrote:

> On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> > On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > > interrupt is now allowed to interrupt IPL_VM.
> > > >
> > > > I don't think there's a regression, I think it behaves as implemented.
> > > > The question that remains is: How should it behave if not like that?
> > >
> > > When shared, it should operate at the lowest level not the highest.  That
> > > may cause other difficulties.
> >
> > Actually it seems like we do set the lowest level, not the highest,
> > sorry.  But I think there might be a bug.  I will have a look.
> >
>
> I hope I'm not wrong, but I think there's a bug in the calculation
> code.  The ampinctc_calc_mask() function sets the priority of an IRQ
> every time a handler is established or disestablished on the given IRQ.
> In this case, the IPL_AUDIO is probably established before the IPL_NET,
> thus when the IPL_NET establish happens, iq_irq is already set to "max"
> as in IPL_AUDIO.  That means the code that sets the priority to "min" is
> skipped, as the continue statement will take effect.
>
> I guess each intrq object should have the lowest and highest IPL, and
> the loop should only continue of both are the same.  The lowest IPL is
> then set for the IPL priority, and the highest IPL is used to splraise()
> once the IRQ actually hits and the handler is suppoed to run.
>
> Untested, I hope that someone has a look as well and to spot if I have
> a mistake in my understanding.
>
> We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.
>
> Patrick
Well...with the patch applied, we panic in another location (attached).

+--+
Carlos

Stopped at      panic+0x154:        TID    PID    UID     PRFLAGS     PFLAGS
C
PU  COMMAND
 370699  33809      0        0x13          0    3  perl
*102914  60685      0     0x14000      0x200    0  softnet
db_enter() at panic+0x150
panic() at pool_get+0x5c
pool_get() at m_gethdr+0x34
m_gethdr() at tcp_output+0x9a0
tcp_output() at tcp_input+0x2e94
tcp_input() at ip_deliver+0x240
ip_deliver() at ipintr+0x60
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{0}> show panic
pool_cache_item_magic_check: mbufpl cpu free list modified: item addr
0xffffff8
024cc2b00+16 0xffffff8024c7b000!=0x7bf2bf8d6b78bbb
ddb{0}> trace
db_enter() at panic+0x150
panic() at pool_get+0x5c
pool_get() at m_gethdr+0x34
m_gethdr() at tcp_output+0x9a0
tcp_output() at tcp_input+0x2e94
tcp_input() at ip_deliver+0x240
ip_deliver() at ipintr+0x60
ipintr() at if_netisr+0x8c
if_netisr() at taskq_thread+0x80
taskq_thread() at proc_trampoline+0x10
ddb{0}> mach ddbcpu 1
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at sched_idle+0x264
sched_idle() at proc_trampoline+0x10
ddb{1}> mach ddbcpu 2
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at sched_idle+0x264
sched_idle() at proc_trampoline+0x10
ddb{2}> mach ddbcpu 3
Stopped at      ampintc_ipi_ddb+0x1c:   db_enter() at ampintc_ipi_ddb+0x18
ampintc_ipi_ddb() at handle_el1h_irq+0x6c
handle_el1h_irq() at tsleep+0x120
tsleep() at bwrite+0x224
bwrite() at ffs1_balloc+0x734
ffs1_balloc() at ffs_write+0x208
ffs_write() at VOP_WRITE+0x5c
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 63229  264459  33809      0  3    0x100083  piperd        signify
 61279  149431  33809     57  3    0x100003  netlock       ftp
 33809  370699  68767      0  7        0x13                perl
 68767  448790      1      0  3    0x10008b  pause         ksh
 60114  382468      1      0  3    0x100098  poll          cron
 20210  409805  48663    720  3        0x90  kqread        lldpd
 48663  355916      1      0  3        0x80  netio         lldpd
 63008  469476      1     99  3    0x100090  poll          sndiod
 53674  479632      1    110  3    0x100090  poll          sndiod
 15732  327475  85598     95  3    0x100092  kqread        smtpd
 94636  299326  85598    103  3    0x100092  kqread        smtpd
 57774   76693  85598     95  3    0x100092  kqread        smtpd
 23788  122165  85598     95  3    0x100092  kqread        smtpd
 65743  344634  85598     95  3    0x100092  kqread        smtpd
 42485  499416  85598     95  3    0x100092  kqread        smtpd
 85598  204375      1      0  3    0x100080  kqread        smtpd
 13294   86371      1      0  3        0x80  select        sshd
 33900  334576  54338     83  3    0x100092  poll          ntpd
 54338  229575  56575     83  3    0x100092  poll          ntpd
 56575  222315      1      0  3    0x100080  poll          ntpd
  6277  355457   2175     74  3    0x100092  bpf           pflogd
  2175  367815      1      0  3        0x80  netio         pflogd
   191  157873  64956     73  3    0x100090  kqread        syslogd
 64956  449336      1      0  3    0x100082  netio         syslogd
 46751  290865      1     77  3    0x100090  poll          dhclient
 13612  149750      1      0  3        0x80  poll          dhclient
  3833  241695  81696    115  3    0x100092  kqread        slaacd
 12716   38042  81696    115  3    0x100092  kqread        slaacd
 81696  292319      1      0  3    0x100080  kqread        slaacd
 28310   13221      0      0  3     0x14200  pgzero        zerothread
 30261  447035      0      0  3     0x14200  aiodoned      aiodoned
 19436  499304      0      0  3     0x14200  syncer        update
 97858  506840      0      0  3     0x14200  cleaner       cleaner
 44656  399351      0      0  3     0x14200  reaper        reaper
 73296  131157      0      0  3     0x14200  pgdaemon      pagedaemon
 26447  256479      0      0  3     0x14200  bored         crynlk
 19005  454771      0      0  3     0x14200  bored         crypto
 78823   76292      0      0  3  0x40014200                idle3
 75791  314598      0      0  7  0x40014200                idle2
 98837  125412      0      0  7  0x40014200                idle1
 81508   15782      0      0  3     0x14200  usbtsk        usbtask
 88377   24693      0      0  3     0x14200  usbatsk       usbatsk
 66739  316258      0      0  3     0x14200  bored         sensors
*60685  102914      0      0  7     0x14200                softnet
 27305  149149      0      0  3     0x14200  bored         systqmp
 69126  162699      0      0  2     0x14200                systq
 65063  245748      0      0  3  0x40014200  bored         softclock
  4477  427622      0      0  3  0x40014200                idle0
  6704  239660      0      0  3     0x14200  kmalloc       kmthread
     1  118616      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  4036956 VM pages: 15367 active, 34591 inactive, 0 wired, 3875370 free
(484444
 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=134565, free-target=179420, inactive-target=0, wired-max=1345652
  faults=545808, traps=0, intrs=0, ctxswitch=210848 fpuswitch=0
  softint=314540, syscalls=448239, kmapent=13
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    ok relocks(total)=55651(55652), anget(retries)=206803(0), amapcopy=180703
    neighbor anon/obj pg=23503/146861, gets(lock/unlock)=172655/55652
    cases: anon=170001, anoncow=36802, obj=162961, prcopy=9693, przero=166343
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=1050249, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xffffff8000aa7460
ddb{0}> show bcstats
Current Buffer Cache status:
numbufs 22627 busymapped 13, delwri 127
kvaslots 6553 avail kva slots 6540
bufpages 90435, dmapages 90435, dirtypages 508
pendingreads 41, pendingwrites 13
highflips 0, highflops 0, dmaflips 0
Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Mark Kettenis
In reply to this post by Patrick Wildt-3
> Date: Mon, 19 Nov 2018 20:35:44 +0100
> From: Patrick Wildt <[hidden email]>
>
> On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> > On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > > interrupt is now allowed to interrupt IPL_VM.
> > > >
> > > > I don't think there's a regression, I think it behaves as implemented.
> > > > The question that remains is: How should it behave if not like that?

Admittedly the current INTx implementation in dwpcie(4) is a bit of a
hack.  The hardware has a register that signals which of the four
legacy PCI interrupts pin was triggered so the driver really should
expose itself as a secondary interrupt controller.  But proper support
for secondary interrupt controllers requires a bit of thought.  For
one thing we'd need a way to unblock interrupts on decondary interrupt
controllers in response to an splx(9) call.

> > > When shared, it should operate at the lowest level not the highest.  That
> > > may cause other difficulties.
> >
> > Actually it seems like we do set the lowest level, not the highest,
> > sorry.  But I think there might be a bug.  I will have a look.
> >
>
> I hope I'm not wrong, but I think there's a bug in the calculation
> code.  The ampinctc_calc_mask() function sets the priority of an IRQ
> every time a handler is established or disestablished on the given IRQ.
> In this case, the IPL_AUDIO is probably established before the IPL_NET,
> thus when the IPL_NET establish happens, iq_irq is already set to "max"
> as in IPL_AUDIO.  That means the code that sets the priority to "min" is
> skipped, as the continue statement will take effect.
>
> I guess each intrq object should have the lowest and highest IPL, and
> the loop should only continue of both are the same.  The lowest IPL is
> then set for the IPL priority, and the highest IPL is used to splraise()
> once the IRQ actually hits and the handler is suppoed to run.
>
> Untested, I hope that someone has a look as well and to spot if I have
> a mistake in my understanding.
>
> We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.
>
> Patrick
>
> diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
> index 2c4ce63a150..a1b28a20ea7 100644
> --- a/sys/arch/arm64/dev/ampintc.c
> +++ b/sys/arch/arm64/dev/ampintc.c
> @@ -160,8 +160,8 @@ struct intrhand {
>  
>  struct intrq {
>   TAILQ_HEAD(, intrhand) iq_list; /* handler list */
> - int iq_irq; /* IRQ to mask while handling */
> - int iq_levels; /* IPL_*'s this IRQ has */
> + int iq_irq_max; /* IRQ to mask while handling */
> + int iq_irq_min; /* lowest IRQ when shared */
>   int iq_ist; /* share type */
>  };
>  
> @@ -469,14 +469,16 @@ ampintc_calc_mask(void)
>   min = ih->ih_ipl;
>   }
>  
> - if (sc->sc_handler[irq].iq_irq == max) {
> - continue;
> - }
> - sc->sc_handler[irq].iq_irq = max;
> -
>   if (max == IPL_NONE)
>   min = IPL_NONE;
>  
> + if (sc->sc_handler[irq].iq_irq_max == max ||
> +    sc->sc_handler[irq].iq_irq_min == min)
> + continue;

Shouldn't that be && instead of ||?

> +
> + sc->sc_handler[irq].iq_irq_max = max;
> + sc->sc_handler[irq].iq_irq_min = min;
> +
>   /* Enable interrupts at lower levels, clear -> enable */
>   /* Set interrupt priority/enable */
>   if (min != IPL_NONE) {
> @@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info *ci)
>   bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
>   if (enable) {
>   ampintc_set_priority(ih->ih_irq,
> -    sc->sc_handler[ih->ih_irq].iq_irq);
> +    sc->sc_handler[ih->ih_irq].iq_irq_min);
>   ampintc_intr_enable(ih->ih_irq);
>   }
>  
> @@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
>   if (irq >= sc->sc_nintr)
>   return;
>  
> - pri = sc->sc_handler[irq].iq_irq;
> + pri = sc->sc_handler[irq].iq_irq_max;
>   s = ampintc_splraise(pri);
>   TAILQ_FOREACH(ih, &sc->sc_handler[irq].iq_list, ih_list) {
>  #ifdef MULTIPROCESSOR
>
>

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Patrick Wildt-3
On Tue, Nov 20, 2018 at 04:44:56PM +0100, Mark Kettenis wrote:

> > Date: Mon, 19 Nov 2018 20:35:44 +0100
> > From: Patrick Wildt <[hidden email]>
> >
> > On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> > > On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > > > interrupt is now allowed to interrupt IPL_VM.
> > > > >
> > > > > I don't think there's a regression, I think it behaves as implemented.
> > > > > The question that remains is: How should it behave if not like that?
>
> Admittedly the current INTx implementation in dwpcie(4) is a bit of a
> hack.  The hardware has a register that signals which of the four
> legacy PCI interrupts pin was triggered so the driver really should
> expose itself as a secondary interrupt controller.  But proper support
> for secondary interrupt controllers requires a bit of thought.  For
> one thing we'd need a way to unblock interrupts on decondary interrupt
> controllers in response to an splx(9) call.
>
> > > > When shared, it should operate at the lowest level not the highest.  That
> > > > may cause other difficulties.
> > >
> > > Actually it seems like we do set the lowest level, not the highest,
> > > sorry.  But I think there might be a bug.  I will have a look.
> > >
> >
> > I hope I'm not wrong, but I think there's a bug in the calculation
> > code.  The ampinctc_calc_mask() function sets the priority of an IRQ
> > every time a handler is established or disestablished on the given IRQ.
> > In this case, the IPL_AUDIO is probably established before the IPL_NET,
> > thus when the IPL_NET establish happens, iq_irq is already set to "max"
> > as in IPL_AUDIO.  That means the code that sets the priority to "min" is
> > skipped, as the continue statement will take effect.
> >
> > I guess each intrq object should have the lowest and highest IPL, and
> > the loop should only continue of both are the same.  The lowest IPL is
> > then set for the IPL priority, and the highest IPL is used to splraise()
> > once the IRQ actually hits and the handler is suppoed to run.
> >
> > Untested, I hope that someone has a look as well and to spot if I have
> > a mistake in my understanding.
> >
> > We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.
> >
> > Patrick
> >
> > diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
> > index 2c4ce63a150..a1b28a20ea7 100644
> > --- a/sys/arch/arm64/dev/ampintc.c
> > +++ b/sys/arch/arm64/dev/ampintc.c
> > @@ -160,8 +160,8 @@ struct intrhand {
> >  
> >  struct intrq {
> >   TAILQ_HEAD(, intrhand) iq_list; /* handler list */
> > - int iq_irq; /* IRQ to mask while handling */
> > - int iq_levels; /* IPL_*'s this IRQ has */
> > + int iq_irq_max; /* IRQ to mask while handling */
> > + int iq_irq_min; /* lowest IRQ when shared */
> >   int iq_ist; /* share type */
> >  };
> >  
> > @@ -469,14 +469,16 @@ ampintc_calc_mask(void)
> >   min = ih->ih_ipl;
> >   }
> >  
> > - if (sc->sc_handler[irq].iq_irq == max) {
> > - continue;
> > - }
> > - sc->sc_handler[irq].iq_irq = max;
> > -
> >   if (max == IPL_NONE)
> >   min = IPL_NONE;
> >  
> > + if (sc->sc_handler[irq].iq_irq_max == max ||
> > +    sc->sc_handler[irq].iq_irq_min == min)
> > + continue;
>
> Shouldn't that be && instead of ||?

Yes, you are right, it should be &&.

> > +
> > + sc->sc_handler[irq].iq_irq_max = max;
> > + sc->sc_handler[irq].iq_irq_min = min;
> > +
> >   /* Enable interrupts at lower levels, clear -> enable */
> >   /* Set interrupt priority/enable */
> >   if (min != IPL_NONE) {
> > @@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info *ci)
> >   bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
> >   if (enable) {
> >   ampintc_set_priority(ih->ih_irq,
> > -    sc->sc_handler[ih->ih_irq].iq_irq);
> > +    sc->sc_handler[ih->ih_irq].iq_irq_min);
> >   ampintc_intr_enable(ih->ih_irq);
> >   }
> >  
> > @@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
> >   if (irq >= sc->sc_nintr)
> >   return;
> >  
> > - pri = sc->sc_handler[irq].iq_irq;
> > + pri = sc->sc_handler[irq].iq_irq_max;
> >   s = ampintc_splraise(pri);
> >   TAILQ_FOREACH(ih, &sc->sc_handler[irq].iq_list, ih_list) {
> >  #ifdef MULTIPROCESSOR
> >
> >
>

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Carlos Cardenas
On Tue, Nov 20, 2018 at 07:57:00PM +0100, Patrick Wildt wrote:

> On Tue, Nov 20, 2018 at 04:44:56PM +0100, Mark Kettenis wrote:
> > > Date: Mon, 19 Nov 2018 20:35:44 +0100
> > > From: Patrick Wildt <[hidden email]>
> > >
> > > On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> > > > On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > > > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > > > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > > > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > > > > interrupt is now allowed to interrupt IPL_VM.
> > > > > >
> > > > > > I don't think there's a regression, I think it behaves as implemented.
> > > > > > The question that remains is: How should it behave if not like that?
> >
> > Admittedly the current INTx implementation in dwpcie(4) is a bit of a
> > hack.  The hardware has a register that signals which of the four
> > legacy PCI interrupts pin was triggered so the driver really should
> > expose itself as a secondary interrupt controller.  But proper support
> > for secondary interrupt controllers requires a bit of thought.  For
> > one thing we'd need a way to unblock interrupts on decondary interrupt
> > controllers in response to an splx(9) call.
> >
> > > > > When shared, it should operate at the lowest level not the highest.  That
> > > > > may cause other difficulties.
> > > >
> > > > Actually it seems like we do set the lowest level, not the highest,
> > > > sorry.  But I think there might be a bug.  I will have a look.
> > > >
> > >
> > > I hope I'm not wrong, but I think there's a bug in the calculation
> > > code.  The ampinctc_calc_mask() function sets the priority of an IRQ
> > > every time a handler is established or disestablished on the given IRQ.
> > > In this case, the IPL_AUDIO is probably established before the IPL_NET,
> > > thus when the IPL_NET establish happens, iq_irq is already set to "max"
> > > as in IPL_AUDIO.  That means the code that sets the priority to "min" is
> > > skipped, as the continue statement will take effect.
> > >
> > > I guess each intrq object should have the lowest and highest IPL, and
> > > the loop should only continue of both are the same.  The lowest IPL is
> > > then set for the IPL priority, and the highest IPL is used to splraise()
> > > once the IRQ actually hits and the handler is suppoed to run.
> > >
> > > Untested, I hope that someone has a look as well and to spot if I have
> > > a mistake in my understanding.
> > >
> > > We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.
> > >
> > > Patrick
> > >
[snip]
> Yes, you are right, it should be &&.
[snip]

I've been running the updated patch and so far so good...

If y'all don't mind, I want to keep pounding on it for the next day or
two before giving a thumbs up.

Thanks.

+--+
Carlos

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Patrick Wildt-3
On Tue, Nov 20, 2018 at 07:56:31PM -0800, Carlos Cardenas wrote:

>
> I've been running the updated patch and so far so good...
>
> If y'all don't mind, I want to keep pounding on it for the next day or
> two before giving a thumbs up.
>
> Thanks.
>
> +--+
> Carlos
>

I had a look into the IRQ controller drivers used on arm64 and armv7
and it seems like the following diff would fix the panic and also
modify mvmpic(4) similarly.  I haven't touched omgpio(4) since we
do no use IRQs there.

diff --git a/sys/arch/arm/cortex/ampintc.c b/sys/arch/arm/cortex/ampintc.c
index bdfc14b621d..19d6eebdd2a 100644
--- a/sys/arch/arm/cortex/ampintc.c
+++ b/sys/arch/arm/cortex/ampintc.c
@@ -161,8 +161,8 @@ struct intrhand {
 
 struct intrq {
  TAILQ_HEAD(, intrhand) iq_list; /* handler list */
- int iq_irq; /* IRQ to mask while handling */
- int iq_levels; /* IPL_*'s this IRQ has */
+ int iq_irq_max; /* IRQ to mask while handling */
+ int iq_irq_min; /* lowest IRQ when shared */
  int iq_ist; /* share type */
 };
 
@@ -402,10 +402,12 @@ ampintc_calc_mask(void)
  min = ih->ih_ipl;
  }
 
- if (sc->sc_ampintc_handler[irq].iq_irq == max) {
+ if (sc->sc_ampintc_handler[irq].iq_irq_max == max &&
+    sc->sc_ampintc_handler[irq].iq_irq_min == min)
  continue;
- }
- sc->sc_ampintc_handler[irq].iq_irq = max;
+
+ sc->sc_ampintc_handler[irq].iq_irq_max = max;
+ sc->sc_ampintc_handler[irq].iq_irq_min = min;
 
  if (max == IPL_NONE)
  min = IPL_NONE;
@@ -538,7 +540,7 @@ ampintc_irq_handler(void *frame)
  if (irq >= sc->sc_nintr)
  return;
 
- pri = sc->sc_ampintc_handler[irq].iq_irq;
+ pri = sc->sc_ampintc_handler[irq].iq_irq_max;
  s = ampintc_splraise(pri);
  TAILQ_FOREACH(ih, &sc->sc_ampintc_handler[irq].iq_list, ih_list) {
 #ifdef MULTIPROCESSOR
diff --git a/sys/arch/arm64/dev/agintc.c b/sys/arch/arm64/dev/agintc.c
index 2d9afa9016d..d39530e85a0 100644
--- a/sys/arch/arm64/dev/agintc.c
+++ b/sys/arch/arm64/dev/agintc.c
@@ -171,8 +171,8 @@ struct intrhand {
 
 struct intrq {
  TAILQ_HEAD(, intrhand) iq_list; /* handler list */
- int iq_irq; /* IRQ to mask while handling */
- int iq_levels; /* IPL_*'s this IRQ has */
+ int iq_irq_max; /* IRQ to mask while handling */
+ int iq_irq_min; /* lowest IRQ when shared */
  int iq_ist; /* share type */
  int iq_route;
 };
@@ -746,13 +746,16 @@ agintc_calc_irq(struct agintc_softc *sc, int irq)
  min = ih->ih_ipl;
  }
 
- if (sc->sc_handler[irq].iq_irq == max)
- return;
- sc->sc_handler[irq].iq_irq = max;
-
  if (max == IPL_NONE)
  min = IPL_NONE;
 
+ if (sc->sc_handler[irq].iq_irq_max == max &&
+    sc->sc_handler[irq].iq_irq_min == min)
+ return;
+
+ sc->sc_handler[irq].iq_irq_max = max;
+ sc->sc_handler[irq].iq_irq_min = min;
+
 #ifdef DEBUG_AGINTC
  if (min != IPL_NONE)
  printf("irq %d to block at %d %d \n", irq, max, min );
@@ -828,7 +831,7 @@ agintc_route_irq(void *v, int enable, struct cpu_info *ci)
 
  if (enable) {
  agintc_set_priority(sc, ih->ih_irq,
-    sc->sc_handler[ih->ih_irq].iq_irq);
+    sc->sc_handler[ih->ih_irq].iq_irq_min);
  agintc_route(sc, ih->ih_irq, IRQ_ENABLE, ci);
  agintc_intr_enable(sc, ih->ih_irq);
  }
@@ -937,7 +940,7 @@ agintc_irq_handler(void *frame)
  return;
  }
 
- pri = sc->sc_handler[irq].iq_irq;
+ pri = sc->sc_handler[irq].iq_irq_max;
  s = agintc_splraise(pri);
  TAILQ_FOREACH(ih, &sc->sc_handler[irq].iq_list, ih_list) {
  agintc_run_handler(ih, frame, s);
diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
index 2c4ce63a150..de6ed057f95 100644
--- a/sys/arch/arm64/dev/ampintc.c
+++ b/sys/arch/arm64/dev/ampintc.c
@@ -160,8 +160,8 @@ struct intrhand {
 
 struct intrq {
  TAILQ_HEAD(, intrhand) iq_list; /* handler list */
- int iq_irq; /* IRQ to mask while handling */
- int iq_levels; /* IPL_*'s this IRQ has */
+ int iq_irq_max; /* IRQ to mask while handling */
+ int iq_irq_min; /* lowest IRQ when shared */
  int iq_ist; /* share type */
 };
 
@@ -469,14 +469,16 @@ ampintc_calc_mask(void)
  min = ih->ih_ipl;
  }
 
- if (sc->sc_handler[irq].iq_irq == max) {
- continue;
- }
- sc->sc_handler[irq].iq_irq = max;
-
  if (max == IPL_NONE)
  min = IPL_NONE;
 
+ if (sc->sc_handler[irq].iq_irq_max == max &&
+    sc->sc_handler[irq].iq_irq_min == min)
+ continue;
+
+ sc->sc_handler[irq].iq_irq_max = max;
+ sc->sc_handler[irq].iq_irq_min = min;
+
  /* Enable interrupts at lower levels, clear -> enable */
  /* Set interrupt priority/enable */
  if (min != IPL_NONE) {
@@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info *ci)
  bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
  if (enable) {
  ampintc_set_priority(ih->ih_irq,
-    sc->sc_handler[ih->ih_irq].iq_irq);
+    sc->sc_handler[ih->ih_irq].iq_irq_min);
  ampintc_intr_enable(ih->ih_irq);
  }
 
@@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
  if (irq >= sc->sc_nintr)
  return;
 
- pri = sc->sc_handler[irq].iq_irq;
+ pri = sc->sc_handler[irq].iq_irq_max;
  s = ampintc_splraise(pri);
  TAILQ_FOREACH(ih, &sc->sc_handler[irq].iq_list, ih_list) {
 #ifdef MULTIPROCESSOR
diff --git a/sys/arch/armv7/marvell/mvmpic.c b/sys/arch/armv7/marvell/mvmpic.c
index 5d3b162c6bb..8c7c93d3f1f 100644
--- a/sys/arch/armv7/marvell/mvmpic.c
+++ b/sys/arch/armv7/marvell/mvmpic.c
@@ -195,8 +195,8 @@ mpic_calc_mask(struct mpic_softc *sc)
  if (max == IPL_NONE)
  min = IPL_NONE;
 
- if (sc->sc_ipl != max) {
- sc->sc_ipl = max;
+ if (sc->sc_ipl != min) {
+ sc->sc_ipl = min;
 
  if (sc->sc_ih != NULL)
  arm_intr_disestablish_fdt(sc->sc_ih);

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Carlos Cardenas
On Thu, Nov 22, 2018 at 03:05:59PM +0100, Patrick Wildt wrote:

> On Tue, Nov 20, 2018 at 07:56:31PM -0800, Carlos Cardenas wrote:
> >
> > I've been running the updated patch and so far so good...
> >
> > If y'all don't mind, I want to keep pounding on it for the next day or
> > two before giving a thumbs up.
> >
> > Thanks.
> >
> > +--+
> > Carlos
> >

Good news...the second patch to arm64/dev/ampintc.c worked very well and
handled the load testing (small pkts, jumbo pkts, rsync'ing the tree,
web traffic patterns, and basic iperf stressing).

>
> I had a look into the IRQ controller drivers used on arm64 and armv7
> and it seems like the following diff would fix the panic and also
> modify mvmpic(4) similarly.  I haven't touched omgpio(4) since we
> do no use IRQs there.

Sweet...I'll apply these and report back in a couple days if all goes
well so I can run the same set of tests.

+--+
Carlos

Reply | Threaded
Open this post in threaded view
|

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

Carlos Cardenas
On Thu, Nov 22, 2018 at 10:21:54AM -0800, Carlos Cardenas wrote:

> On Thu, Nov 22, 2018 at 03:05:59PM +0100, Patrick Wildt wrote:
> > On Tue, Nov 20, 2018 at 07:56:31PM -0800, Carlos Cardenas wrote:
> > >
> > > I've been running the updated patch and so far so good...
> > >
> > > If y'all don't mind, I want to keep pounding on it for the next day or
> > > two before giving a thumbs up.
> > >
> > > Thanks.
> > >
> > > +--+
> > > Carlos
> > >
>
> Good news...the second patch to arm64/dev/ampintc.c worked very well and
> handled the load testing (small pkts, jumbo pkts, rsync'ing the tree,
> web traffic patterns, and basic iperf stressing).
>
> >
> > I had a look into the IRQ controller drivers used on arm64 and armv7
> > and it seems like the following diff would fix the panic and also
> > modify mvmpic(4) similarly.  I haven't touched omgpio(4) since we
> > do no use IRQs there.
>
> Sweet...I'll apply these and report back in a couple days if all goes
> well so I can run the same set of tests.
>
> +--+
> Carlos

Alright...everything looks good with the latest patch.

ok ccardenas@

+--+
Carlos