NET_LOCK() pflow panic

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

NET_LOCK() pflow panic

Alexander Bluhm
Hi,

This crash happened during regress/sys/net/pflow on my regression test
machine:

panic: rw_enter: netlock locking against myself
Stopped at      Debugger+0x7:   leave
   TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*291613  71462      0         0x2          0    1  ifconfig
 96946  67690      0     0x14000      0x200    0  zerothread
Debugger(d09facbd,f57a49e8,d09d242c,f57a49e8,0) at Debugger+0x7
panic(d09d242c,d09dc32f,f57a4a2c,d0504491,d0b7b1ec) at panic+0x71
rw_enter(d0b56f38,1,f57a4a7c,d03be305,d0b55fe4) at rw_enter+0x1b4
rw_enter_write(d0b56f38,2,2,10,0) at rw_enter_write+0x3c
socreate(2,f57a4ce8,2,0,f57a4c04) at socreate+0x8f
pflowioctl(d3e18800,802069fd,f57a4e74,0,d3e18800) at pflowioctl+0x5f3
in_ioctl(802069fd,f57a4e74,d3e18800,1,d3e18800) at in_ioctl+0xf9
ifioctl(d8bfce80,802069fd,f57a4e74,d8883b60,d87a9604) at ifioctl+0x168
soo_ioctl(d884dbec,802069fd,f57a4e74,d8883b60,0) at soo_ioctl+0x21c
sys_ioctl(d8883b60,f57a4f5c,f57a4f7c,0,f57a4fa8) at sys_ioctl+0x19f
syscall() at syscall+0x250
--- syscall (number -2110258080) ---
0x6:

NET_LOCK() is taken in soo_ioctl()
        NET_LOCK(s);
        error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL,
            (struct mbuf *)cmd, (struct mbuf *)data, (struct mbuf *)NULL, p));
        NET_UNLOCK(s);
and in socreate().

bluhm

Reply | Threaded
Open this post in threaded view
|

Re: NET_LOCK() pflow panic

Martin Pieuchot
On 20/12/16(Tue) 14:50, Alexander Bluhm wrote:

> Hi,
>
> This crash happened during regress/sys/net/pflow on my regression test
> machine:
>
> panic: rw_enter: netlock locking against myself
> Stopped at      Debugger+0x7:   leave
>    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *291613  71462      0         0x2          0    1  ifconfig
>  96946  67690      0     0x14000      0x200    0  zerothread
> Debugger(d09facbd,f57a49e8,d09d242c,f57a49e8,0) at Debugger+0x7
> panic(d09d242c,d09dc32f,f57a4a2c,d0504491,d0b7b1ec) at panic+0x71
> rw_enter(d0b56f38,1,f57a4a7c,d03be305,d0b55fe4) at rw_enter+0x1b4
> rw_enter_write(d0b56f38,2,2,10,0) at rw_enter_write+0x3c
> socreate(2,f57a4ce8,2,0,f57a4c04) at socreate+0x8f
> pflowioctl(d3e18800,802069fd,f57a4e74,0,d3e18800) at pflowioctl+0x5f3
> in_ioctl(802069fd,f57a4e74,d3e18800,1,d3e18800) at in_ioctl+0xf9
> ifioctl(d8bfce80,802069fd,f57a4e74,d8883b60,d87a9604) at ifioctl+0x168
> soo_ioctl(d884dbec,802069fd,f57a4e74,d8883b60,0) at soo_ioctl+0x21c
> sys_ioctl(d8883b60,f57a4f5c,f57a4f7c,0,f57a4fa8) at sys_ioctl+0x19f
> syscall() at syscall+0x250
> --- syscall (number -2110258080) ---
> 0x6:
>
> NET_LOCK() is taken in soo_ioctl()
>         NET_LOCK(s);
>         error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL,
>             (struct mbuf *)cmd, (struct mbuf *)data, (struct mbuf *)NULL, p));
>         NET_UNLOCK(s);
> and in socreate().

I don't have a solution for the moment and I want to be sure we know all
recursions before trying to write a fix.  So here's a diff that mark the
recursions with a XXXSMP like in the NFS case.

It is safe since we're still running everything under KERNEL_LOCK().

ok?

Index: net/if_pflow.c
===================================================================
RCS file: /cvs/src/sys/net/if_pflow.c,v
retrieving revision 1.62
diff -u -p -r1.62 if_pflow.c
--- net/if_pflow.c 4 Oct 2016 13:54:32 -0000 1.62
+++ net/if_pflow.c 20 Dec 2016 14:23:32 -0000
@@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp)
  pflow_flush(sc);
  m_freem(sc->send_nam);
  if (sc->so != NULL) {
+ /* XXXSMP breaks atomicity */
+ rw_exit_write(&netlock);
  error = soclose(sc->so);
+ rw_enter_write(&netlock);
  sc->so = NULL;
  }
  if (sc->sc_flowdst != NULL)
@@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd
  }
  }
 
+ /* XXXSMP breaks atomicity */
+ rw_exit_write(&netlock);
  s = splnet();
  pflow_flush(sc);
 
@@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd
  } else
  ifp->if_flags &= ~IFF_RUNNING;
 
+ rw_enter_write(&netlock);
  break;
 
  default:

Reply | Threaded
Open this post in threaded view
|

Re: NET_LOCK() pflow panic

Alexander Bluhm
On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote:
> I don't have a solution for the moment and I want to be sure we know all

Me neither.  That's why I did not send a diff, but only the stack trace.

> recursions before trying to write a fix.  So here's a diff that mark the
> recursions with a XXXSMP like in the NFS case.

I think this approach is fine.

> ok?

You missed the returns in pflowioctl() between rw_exit_write() and
rw_enter_write().  I would suggest that you follow the splnet() and
put a rw_enter_write() after each splx().

bluhm

>
> Index: net/if_pflow.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_pflow.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 if_pflow.c
> --- net/if_pflow.c 4 Oct 2016 13:54:32 -0000 1.62
> +++ net/if_pflow.c 20 Dec 2016 14:23:32 -0000
> @@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp)
>   pflow_flush(sc);
>   m_freem(sc->send_nam);
>   if (sc->so != NULL) {
> + /* XXXSMP breaks atomicity */
> + rw_exit_write(&netlock);
>   error = soclose(sc->so);
> + rw_enter_write(&netlock);
>   sc->so = NULL;
>   }
>   if (sc->sc_flowdst != NULL)
> @@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd
>   }
>   }
>  
> + /* XXXSMP breaks atomicity */
> + rw_exit_write(&netlock);
>   s = splnet();
>   pflow_flush(sc);
>  
> @@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd
>   } else
>   ifp->if_flags &= ~IFF_RUNNING;
>  
> + rw_enter_write(&netlock);
>   break;
>  
>   default:

Reply | Threaded
Open this post in threaded view
|

Re: NET_LOCK() pflow panic

Darren Tucker
In reply to this post by Martin Pieuchot
Hi.

On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote:
[...]
> I don't have a solution for the moment and I want to be sure we know all
> recursions before trying to write a fix.  So here's a diff that mark the
> recursions with a XXXSMP like in the NFS case.

I think I have another case of this.  (Note that it's a single processor
VM, not SMP).  So far it's reproduceable 2/2 so if you need any
additional info or testing please let me know.

panic: rw_enter: netlock locking against myself
Stopped at      Debugger+0x9:   leave
   TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*389277  87361    500         0x3          0    0  ssh
Debugger() at Debugger+0x9
panic() at panic+0xfe
rw_enter() at rw_enter+0x1c1
sosend() at sosend+0x114

Full dmesg and trace below.

Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.0-current (GENERIC) #61: Tue Dec 20 15:30:01 MST 2016
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 2130698240 (2031MB)
avail mem = 2061586432 (1966MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf64e0 (10 entries)
bios0: vendor Bochs version "Bochs" date 01/01/2011
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP SSDT APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.5+, 2400.57 MHz
cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,VMX,CX16,x2APIC,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpihpet0 at acpi0: 100000000 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
"PNP0303" at acpi0 not configured
"PNP0F13" at acpi0 not configured
"PNP0700" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <QEMU, QEMU DVD-ROM, 2.5+> ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: apic 0 int 11
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address 52:54:00:f6:02:ea
virtio1: msix shared
virtio2 at pci0 dev 5 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 16384MB, 512 bytes/sector, 33554432 sectors
virtio2: msix shared
virtio3 at pci0 dev 6 function 0 "Qumranet Virtio Memory" rev 0x00
viomb0 at virtio3
virtio3: apic 0 int 10
virtio4 at pci0 dev 7 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk1 at virtio4
scsibus3 at vioblk1: 2 targets
sd1 at scsibus3 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd1: 16384MB, 512 bytes/sector, 33554432 sectors
virtio4: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 1: density unknown
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
root on sd0a (1afc9f32ece695a9.a) swap on sd0b dump on sd0b
Automatic boot in progress: starting file system checks.
/dev/rsd0a: file system is clean; not checking
/dev/rsd1a: file system is clean; not checking
setting tty flags
pf enabled
ddb.console: 0 -> 1
kern.splassert: 1 -> 2
kern.bufcachepercent: 20 -> 50
starting network
add net default: gateway 2001:44b8:3110:fb01::254
reordering libraries: done.
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons: portmap mountd(failed) nfsd(failed).
savecore: no core dump (no dumpdev)
checking quotas: done.
clearing /tmp
starting pre-securelevel daemons:.
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd snmpd smtpd.
starting local daemons:.
starting local daemons: cron.
Thu Dec 22 11:20:14 AEDT 2016

OpenBSD/amd64 (quoll.dtucker.net) (tty00)

login: panic: rw_enter: netlock locking against myself
Stopped at      Debugger+0x9:   leave
   TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*389277  87361    500         0x3          0    0  ssh
Debugger() at Debugger+0x9
panic() at panic+0xfe
rw_enter() at rw_enter+0x1c1
sosend() at sosend+0x114
nfs_send() at nfs_send+0x60
nfs_request() at nfs_request+0x408
nfs_lookup() at nfs_lookup+0x28c
VOP_LOOKUP() at VOP_LOOKUP+0x39
vfs_lookup() at vfs_lookup+0x2fe
namei() at namei+0x346
unp_connect() at unp_connect+0xd4
uipc_usrreq() at uipc_usrreq+0x26f
soconnect() at soconnect+0xb1
sys_connect() at sys_connect+0x1c2
end trace frame: 0xffff8000213a4f10, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
rw_enter() at rw_enter+0x1c1
sosend() at sosend+0x114
nfs_send() at nfs_send+0x60
nfs_request() at nfs_request+0x408
nfs_lookup() at nfs_lookup+0x28c
VOP_LOOKUP() at VOP_LOOKUP+0x39
vfs_lookup() at vfs_lookup+0x2fe
namei() at namei+0x346
unp_connect() at unp_connect+0xd4
uipc_usrreq() at uipc_usrreq+0x26f
soconnect() at soconnect+0xb1
sys_connect() at sys_connect+0x1c2
syscall() at syscall+0x197
--- syscall (number 98) ---
end of kernel
end trace frame: 0x829b7dafa00, count: -15
0x82a2c90b18a:
ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
*87361  80697  80697    500  7         0x3                ssh
 80697  55791  80697    500  3    0x100083  piperd        cvs
 55791  52204  55791    500  3        0x83  wait          bash
 52204  89068  89068    500  3        0x90  select        sshd
 89068  27605  89068      0  3        0x92  poll          sshd
 22216      1  22216      0  3    0x100083  ttyin         getty
 16520      1  16520      0  3    0x100083  ttyin         getty
 93531      1  93531      0  3    0x100083  ttyin         getty
 72152      1  72152      0  3    0x100083  ttyin         getty
 27091      1  27091      0  3    0x100083  ttyin         getty
 18433      1  18433      0  3    0x100083  ttyin         getty
 26898      1  26898      0  3    0x100098  poll          cron
 28803  15702  15702     95  3    0x100092  kqread        smtpd
 64391  15702  15702    103  3    0x100092  kqread        smtpd
 80636  15702  15702     95  3    0x100092  kqread        smtpd
 88749  15702  15702     95  3    0x100092  kqread        smtpd
 70061  15702  15702     95  3    0x100092  kqread        smtpd
 61726  15702  15702     95  3    0x100092  kqread        smtpd
 15702      1  15702      0  3    0x100080  kqread        smtpd
 47987      1  47987      0  3        0x80  kqread        snmpd
 74393      1  74393     91  3        0x92  kqread        snmpd
 36435      1  36435     91  3        0x92  kqread        snmpd
 27605      1  27605      0  3        0x80  select        sshd
 52936      0      0      0  3     0x14280  nfsidl        nfsio
 19844      0      0      0  3     0x14280  nfsidl        nfsio
  4638      0      0      0  3     0x14280  nfsidl        nfsio
 90711      0      0      0  3     0x14280  nfsidl        nfsio
  1626      1   1626     28  3    0x100090  poll          portmap
 75964      1  75964      0  3    0x100080  poll          ntpd
 44586  83081  44586     83  3    0x100092  poll          ntpd
 83081      1  83081     83  3    0x100092  poll          ntpd
  1140  77396  77396     74  3    0x100090  bpf           pflogd
 77396      1  77396      0  3        0x80  netio         pflogd
 62826  36676  36676     73  3    0x100090  kqread        syslogd
 36676      1  36676      0  3    0x100082  netio         syslogd
 39385      1  39385      0  3        0x80  mfsidl        mount_mfs
 65342      0      0      0  3     0x14200  pgzero        zerothread
 54659      0      0      0  3     0x14200  aiodoned      aiodoned
 66065      0      0      0  3     0x14200  syncer        update
 34231      0      0      0  3     0x14200  cleaner       cleaner
 12102      0      0      0  3     0x14200  reaper        reaper
 81268      0      0      0  3     0x14200  pgdaemon      pagedaemon
 85494      0      0      0  3     0x14200  bored         crynlk
 71588      0      0      0  3     0x14200  bored         crypto
 83246      0      0      0  3     0x14200  pftm          pfpurge
 60845      0      0      0  3     0x14200  bored         viomb
 96989      0      0      0  3  0x40014200  acpi0         acpi0
 48057      0      0      0  3     0x14200  bored         softnet
 10475      0      0      0  3     0x14200  bored         systqmp
 85279      0      0      0  3     0x14200  bored         systq
 33036      0      0      0  3  0x40014200  bored         softclock
 66216      0      0      0  3  0x40014200                idle0
     1      0      1      0  3        0x82  wait          init
     0     -1      0      0  3     0x10200  scheduler     swapper
ddb>
--
Darren Tucker (dtucker at zip.com.au)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Reply | Threaded
Open this post in threaded view
|

Re: NET_LOCK() pflow panic

Alexander Bluhm
On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote:
> > I don't have a solution for the moment and I want to be sure we know all
> > recursions before trying to write a fix.  So here's a diff that mark the
> > recursions with a XXXSMP like in the NFS case.

Looks like visa@'s crash.  I have already OKed his fix.

bluhm

> login: panic: rw_enter: netlock locking against myself
> Stopped at      Debugger+0x9:   leave
>    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *389277  87361    500         0x3          0    0  ssh
> Debugger() at Debugger+0x9
> panic() at panic+0xfe
> rw_enter() at rw_enter+0x1c1
> sosend() at sosend+0x114
> nfs_send() at nfs_send+0x60
> nfs_request() at nfs_request+0x408
> nfs_lookup() at nfs_lookup+0x28c
> VOP_LOOKUP() at VOP_LOOKUP+0x39
> vfs_lookup() at vfs_lookup+0x2fe
> namei() at namei+0x346
> unp_connect() at unp_connect+0xd4
> uipc_usrreq() at uipc_usrreq+0x26f
> soconnect() at soconnect+0xb1
> sys_connect() at sys_connect+0x1c2

Reply | Threaded
Open this post in threaded view
|

Re: NET_LOCK() pflow panic

Darren Tucker
On Thu, Dec 22, 2016 at 11:41 AM, Alexander Bluhm
<[hidden email]> wrote:
> On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote:
>> > I don't have a solution for the moment and I want to be sure we know all
>> > recursions before trying to write a fix.  So here's a diff that mark the
>> > recursions with a XXXSMP like in the NFS case.
>
> Looks like visa@'s crash.  I have already OKed his fix.

Thanks, I'll give that diff a go.  (The "XXXSMP" comment mislead me
since my problem was observed on a single processor machine.)

--
Darren Tucker (dtucker at zip.com.au)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.