filt_bpfrdetach uvm_fault after vmd vm was shutdown


Stuart Henderson
July 29 amd64 snap. I had just tested something in a vm (not very
common for me) and did "halt -p" in the guest. Immediately afterwards
I hit this:

uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      filt_bpfrdetach+0x43:   movq    0x8(%rax),%rax

From ps /o it seems lldpd was involved - this would be using bpf. lldpd
was running at startup. The bridge was created and vmd was started later after
boot (while lldpd was already running, if that matters).

If a step-by-step reproducer or more details are needed I'll try to figure
something out when I'm on a machine that is less annoying to fsck.

vm.conf like this,

: switch "uplink" {
:         interface bridge0
: }
:
: vm "open" {
:         disable
:         owner sthen
:         memory 1G
:         interface { switch "uplink" }
:         disk "/data/vmm/open.img"
: }

and lldpd just running with defaults (pkg_add lldpd, rcctl enable lldpd)



[...]
Guest EPTP = 0x50c8201e
vmm_alloc_vpid: allocated VPID/ASID 1
vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
vmx_handle_cr: mov to cr0 @ 100149e, data=0x80010031
vmx_handle_wrmsr: wrmsr exit, msr=0x277, discarding data written from guest=0x70106:0x70106
vmm_handle_cpuid: unsupported rax=0x40000100
vmx_handle_wrmsr: wrmsr exit, msr=0x8b, discarding data written from guest=0x0:0x0
vmx_handle_rdmsr: rdmsr exit, msr=0x8b, data returned to guest=0x0:0x0
vmx_handle_rdmsr: rdmsr exit, msr=0x17, data returned to guest=0x0:0x0
vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0xffffffff816d3021 - resetting to 0xd
vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
vmm_free_vpid: freed VPID/ASID 1
uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      filt_bpfrdetach+0x43:   movq    0x8(%rax),%rax
ddb{2}> tr
filt_bpfrdetach(fffffd8053d82028) at filt_bpfrdetach+0x43
knote_fdclose(ffff8000332231c0,a) at knote_fdclose+0x71
fdrelease(ffff8000332231c0,a) at fdrelease+0x88
syscall(ffff800033308d90) at syscall+0x389
Xsyscall(0,6,11651b1f0400,6,11653c5c3140,1164c835f640) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffe8f20, count: -5
ddb{2}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 241085  82884     55         0x3          0    3  c++
*299783  85942    720        0x10          0    2K lldpd
 207781  34043      0     0x14000      0x200    1  softnet
ddb{2}> sh reg
rdi               0xfffffd8053d82028
rsi                              0xa
rbp               0xffff800033308bf0
rbx                              0xa
rdx                       0x5ebff6ec
rcx                              0x2
rax                                0
r8                0xffffffff812e3a50    uvm_map_inentry_pc
r9                              0x16
r10                                0
r11               0x5849881d09945a01
r12               0xfffffd8053d82028
r13               0xfffffd83a0bfee38
r14               0xfffffd8053d82028
r15                                0
rip               0xffffffff818b9d13    filt_bpfrdetach+0x43
cs                               0x8
rflags                       0x10217    __ALIGN_SIZE+0xf217
rsp               0xffff800033308bc0
ss                              0x10
filt_bpfrdetach+0x43:   movq    0x8(%rax),%rax
ddb{2}> sh witness
No such command
ddb{2}> mach ddbcpu 1
Stopped at      x86_ipi_db+0x12:        leave
ddb{1}> tr
x86_ipi_db(ffff800022009ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(9,ffff800022009ff0,ffff800022290278,0,0,ffff800022290350) at Xresume_lapic_ipi+0x23
_kernel_lock() at _kernel_lock+0xa9
timeout_del_barrier(ffff800022290350) at timeout_del_barrier+0xa2
msleep(ffff800000028040,ffff800000028060,20,ffffffff81ad080d,0) at msleep+0xf5
taskq_next_work(ffff800000028040,ffff8000222b6fe0) at taskq_next_work+0x38
taskq_thread(ffff800000028040) at taskq_thread+0x6f
end trace frame: 0x0, count: -8
ddb{1}> sh reg
rdi               0xffff800022009ff0
rsi                                0
rbp               0xffff8000222b6d60
rbx               0xffffffff81d18768    ipifunc+0x38
rdx                                0
rcx                              0x7
rax                       0xffffff7f
r8                                 0
r9                                 0
r10                                0
r11               0x8e93433a56188026
r12                              0x7
r13                                0
r14               0xffff800022009ff0
r15                                0
rip               0xffffffff81669b42    x86_ipi_db+0x12
cs                               0x8
rflags                         0x282
rsp               0xffff8000222b6d50
ss                              0x10
x86_ipi_db+0x12:        leave
ddb{1}> bo r
rebooting...





OpenBSD 6.5-current (GENERIC.MP) #156: Mon Jul 29 12:00:48 MDT 2019
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17067200512 (16276MB)
avail mem = 16539762688 (15773MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec400 (92 entries)
bios0: vendor Dell Inc. version "A12" date 05/11/2017
bios0: Dell Inc. PowerEdge T20
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC FPDT SLIC LPIT SSDT SSDT SSDT HPET SSDT MCFG SSDT ASF! DMAR
acpi0: wakeup devices UAR1(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) PXSX(S4) RP05(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S3) EHC2(S3) XHC_(S4) HDEF(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz, 3392.70 MHz, 06-3c-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz, 3392.17 MHz, 06-3c-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz, 3392.17 MHz, 06-3c-03
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz, 3392.17 MHz, 06-3c-03
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (RP01)
acpiprt2 at acpi0: bus 3 (RP02)
acpiprt3 at acpi0: bus 5 (RP05)
acpiprt4 at acpi0: bus 1 (PEG0)
acpiprt5 at acpi0: bus -1 (PEG1)
acpiprt6 at acpi0: bus -1 (PEG2)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpitz0 at acpi0: critical temperature is 105 degC
acpitz1 at acpi0: critical temperature is 105 degC
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
"PNP0C14" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
cpu0: using VERW MDS workaround (except on vmm entry)
cpu0: Enhanced SpeedStep 3392 MHz: speeds: 3201, 3200, 3000, 2900, 2700, 2500, 2300, 2200, 2000, 1800, 1700, 1500, 1300, 1100, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Xeon E3-1200 v3 Host" rev 0x06
ppb0 at pci0 dev 1 function 0 "Intel Core 4G PCIE" rev 0x06: msi
pci1 at ppb0 bus 1
em0 at pci1 dev 0 function 0 "Intel 82572EI" rev 0x06: apic 8 int 16, address 00:15:17:8e:79:85
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics P4600" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x04: msi, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
puc0 at pci0 dev 22 function 3 "Intel 8 Series KT" rev 0x04: ports: 16 com
com4 at puc0 port 0 apic 8 int 19: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
em1 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi, address f8:b1:56:ac:32:76
ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x04: apic 8 int 16
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x04: msi
azalia0: codecs: Realtek/0x0280
audio0 at azalia0
ppb1 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd4
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd4: msi
pci3 at ppb2 bus 3
ppb3 at pci3 dev 0 function 0 "TI XIO2001 PCIE-PCI" rev 0x00
pci4 at ppb3 bus 4
ppb4 at pci0 dev 28 function 4 "Intel 8 Series PCIE" rev 0xd4: msi
pci5 at ppb4 bus 5
nvme0 at pci5 dev 0 function 0 "Samsung SM961/PM961 NVMe" rev 0x00: msix, NVMe 1.2
nvme0: SAMSUNG MZVLW256HEHP-000L7, firmware 4L7QCXB7, serial S35ENX0J765205
scsibus1 at nvme0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <NVMe, SAMSUNG MZVLW256, 4L7Q> SCSI4 0/direct fixed
sd0: 244198MB, 512 bytes/sector, 500118192 sectors
ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x04: apic 8 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel C226 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
scsibus2 at ahci0: 32 targets
sd1 at scsibus2 targ 0 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct fixed naa.5002538d4086e2f8
sd1: 476940MB, 512 bytes/sector, 976773168 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x04: apic 8 int 18
iic0 at ichiic0
sdtemp0 at iic0 addr 0x18: se97
sdtemp1 at iic0 addr 0x19: mcp98243
sdtemp2 at iic0 addr 0x1a: se97
sdtemp3 at iic0 addr 0x1b: mcp98243
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM ECC PC3-12800 with thermal sensor
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM ECC PC3-12800 with thermal sensor
spdmem2 at iic0 addr 0x52: 4GB DDR3 SDRAM ECC PC3-12800 with thermal sensor
spdmem3 at iic0 addr 0x53: 4GB DDR3 SDRAM ECC PC3-12800 with thermal sensor
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
uhub3 at uhub0 port 3 configuration 1 interface 0 "GenesysLogic USB2.0 Hub" rev 2.00/92.24 addr 2
uhub4 at uhub0 port 4 configuration 1 interface 0 "Texas Instruments product 0x8142" rev 2.10/1.00 addr 3
uhub5 at uhub4 port 4 configuration 1 interface 0 "Texas Instruments product 0x8142" rev 2.10/1.00 addr 4
uaudio0 at uhub0 port 8 configuration 1 interface 1 "EDIROL UA-1EX" rev 1.10/1.00 addr 5
uaudio0: class v1, full-speed, sync, channels: 2 play, 2 rec, 0 ctls
audio1 at uaudio0
uhidev0 at uhub0 port 9 configuration 1 interface 0 "Lite-On Technology Corp. ThinkPad USB Keyboard with TrackPoint" rev 1.10/1.27 addr 6
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
uhidev1 at uhub0 port 9 configuration 1 interface 1 "Lite-On Technology Corp. ThinkPad USB Keyboard with TrackPoint" rev 1.10/1.27 addr 6
uhidev1: iclass 3/0, 4 report ids
ums0 at uhidev1 reportid 1: 3 buttons
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=3, output=1, feature=0
uhid2 at uhidev1 reportid 4: input=0, output=0, feature=4
uhub6 at uhub0 port 10 configuration 1 interface 0 "Genesys Logic USB2.0 Hub" rev 2.00/77.64 addr 7
uhub7 at uhub6 port 4 configuration 1 interface 0 "Genesys Logic USB2.0 Hub" rev 2.00/77.64 addr 8
umodem0 at uhub0 port 12 configuration 1 interface 0 "SoftIron, Inc. OverDrive 1000" rev 2.00/1.02 addr 9
umodem0: data interface 1, has no CM over data, has break
umodem0: status change notification available
ucom0 at umodem0
ugen0 at uhub0 port 12 configuration 1 "SoftIron, Inc. OverDrive 1000" rev 2.00/1.02 addr 9
uhub8 at uhub0 port 21 configuration 1 interface 0 "Texas Instruments product 0x8140" rev 3.00/1.00 addr 10
uhub9 at uhub8 port 4 configuration 1 interface 0 "Texas Instruments product 0x8140" rev 3.00/1.00 addr 11
uhub10 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2
uhub11 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd1a (55d00535300500ff.a) swap on sd1b dump on sd1b
WARNING: / was not properly unmounted
inteldrm0: 1920x1200, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wskbd1: connecting to wsdisplay0
wsdisplay0: screen 1-5 added (std, vt100 emulation)
wskbd1: disconnecting from wsdisplay0
wskbd1 detached
ukbd0 detached
uhidev0 detached
wsmouse0 detached
ums0 detached
uhid0 detached
uhid1 detached
uhid2 detached
uhidev1 detached
arp_rtrequest: bad gateway value: vlan5
uhidev0 at uhub0 port 9 configuration 1 interface 0 "Lite-On Technology Corp. ThinkPad USB Keyboard with TrackPoint" rev 1.10/1.27 addr 6
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub0 port 9 configuration 1 interface 1 "Lite-On Technology Corp. ThinkPad USB Keyboard with TrackPoint" rev 1.10/1.27 addr 6
uhidev1: iclass 3/0, 4 report ids
ums0 at uhidev1 reportid 1: 3 buttons
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=3, output=1, feature=0
uhid2 at uhidev1 reportid 4: input=0, output=0, feature=4


Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Alexandr Nedvedicky
Looks like it can be related to my commit from May:

    https://github.com/openbsd/src/commit/b50d0c1cf040666aed872208cd6f6ba609197b11#diff-7922ad1d2f6422aa72d4bacd1bf41909

I'll try to take a look. The steps to reproduce would be handy.

As first aid, I would try applying a reverse patch of the commit above.
The reverse patch is below.

hope it helps
regards
sashan

----8<-------8<-------8<-------8<-------8<------------8<----
diff --git b/sys/net/bpf.c a/sys/net/bpf.c
index 5f2a20d593e..6d9554ec502 100644
--- b/sys/net/bpf.c
+++ a/sys/net/bpf.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: bpf.c,v 1.175 2019/05/18 12:59:32 sashan Exp $ */
+/* $OpenBSD: bpf.c,v 1.174 2019/04/25 18:24:39 anton Exp $ */
 /* $NetBSD: bpf.c,v 1.33 1997/02/21 23:59:35 thorpej Exp $ */
 
 /*
@@ -126,6 +126,13 @@ void bpf_resetd(struct bpf_d *);
 void bpf_prog_smr(void *);
 void bpf_d_smr(void *);
 
+/*
+ * Reference count access to descriptor buffers
+ */
+void bpf_get(struct bpf_d *);
+void bpf_put(struct bpf_d *);
+
+
 struct rwlock bpf_sysctl_lk = RWLOCK_INITIALIZER("bpfsz");
 
 int
@@ -320,11 +327,13 @@ bpf_detachd(struct bpf_d *d)
 
  d->bd_promisc = 0;
 
+ bpf_get(d);
  mtx_leave(&d->bd_mtx);
  NET_LOCK();
  error = ifpromisc(bp->bif_ifp, 0);
  NET_UNLOCK();
  mtx_enter(&d->bd_mtx);
+ bpf_put(d);
 
  if (error && !(error == EINVAL || error == ENODEV ||
     error == ENXIO))
@@ -373,6 +382,7 @@ bpfopen(dev_t dev, int flag, int mode, struct proc *p)
  if (flag & FNONBLOCK)
  bd->bd_rtout = -1;
 
+ bpf_get(bd);
  LIST_INSERT_HEAD(&bpf_d_list, bd, bd_list);
 
  return (0);
@@ -393,13 +403,7 @@ bpfclose(dev_t dev, int flag, int mode, struct proc *p)
  bpf_wakeup(d);
  LIST_REMOVE(d, bd_list);
  mtx_leave(&d->bd_mtx);
-
- /*
- * Wait for the task to finish here, before proceeding to garbage
- * collection.
- */
- taskq_barrier(systq);
- smr_call(&d->bd_smr, bpf_d_smr, d);
+ bpf_put(d);
 
  return (0);
 }
@@ -433,6 +437,7 @@ bpfread(dev_t dev, struct uio *uio, int ioflag)
  if (d->bd_bif == NULL)
  return (ENXIO);
 
+ bpf_get(d);
  mtx_enter(&d->bd_mtx);
 
  /*
@@ -538,6 +543,7 @@ bpfread(dev_t dev, struct uio *uio, int ioflag)
  d->bd_in_uiomove = 0;
 out:
  mtx_leave(&d->bd_mtx);
+ bpf_put(d);
 
  return (error);
 }
@@ -556,7 +562,9 @@ bpf_wakeup(struct bpf_d *d)
  * by the KERNEL_LOCK() we have to delay the wakeup to
  * another context to keep the hot path KERNEL_LOCK()-free.
  */
- task_add(systq, &d->bd_wake_task);
+ bpf_get(d);
+ if (!task_add(systq, &d->bd_wake_task))
+ bpf_put(d);
 }
 
 void
@@ -571,6 +579,7 @@ bpf_wakeup_cb(void *xd)
  csignal(d->bd_pgid, d->bd_sig, d->bd_siguid, d->bd_sigeuid);
 
  selwakeup(&d->bd_sel);
+ bpf_put(d);
 }
 
 int
@@ -588,6 +597,7 @@ bpfwrite(dev_t dev, struct uio *uio, int ioflag)
  if (d->bd_bif == NULL)
  return (ENXIO);
 
+ bpf_get(d);
  ifp = d->bd_bif->bif_ifp;
 
  if (ifp == NULL || (ifp->if_flags & IFF_UP) == 0) {
@@ -621,6 +631,7 @@ bpfwrite(dev_t dev, struct uio *uio, int ioflag)
  NET_UNLOCK();
 
 out:
+ bpf_put(d);
  return (error);
 }
 
@@ -696,6 +707,8 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t addr, int flag, struct proc *p)
  }
  }
 
+ bpf_get(d);
+
  switch (cmd) {
  default:
  error = EINVAL;
@@ -993,6 +1006,7 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t addr, int flag, struct proc *p)
  break;
  }
 
+ bpf_put(d);
  return (error);
 }
 
@@ -1180,6 +1194,7 @@ bpfkqfilter(dev_t dev, struct knote *kn)
  return (EINVAL);
  }
 
+ bpf_get(d);
  kn->kn_hook = d;
  SLIST_INSERT_HEAD(klist, kn, kn_selnext);
 
@@ -1199,6 +1214,7 @@ filt_bpfrdetach(struct knote *kn)
  KERNEL_ASSERT_LOCKED();
 
  SLIST_REMOVE(&d->bd_sel.si_note, kn, knote, kn_selnext);
+ bpf_put(d);
 }
 
 int
@@ -1591,6 +1607,25 @@ bpf_d_smr(void *smr)
  free(bd, M_DEVBUF, sizeof(*bd));
 }
 
+void
+bpf_get(struct bpf_d *bd)
+{
+ atomic_inc_int(&bd->bd_ref);
+}
+
+/*
+ * Free buffers currently in use by a descriptor
+ * when the reference count drops to zero.
+ */
+void
+bpf_put(struct bpf_d *bd)
+{
+ if (atomic_dec_int_nv(&bd->bd_ref) > 0)
+ return;
+
+ smr_call(&bd->bd_smr, bpf_d_smr, bd);
+}
+
 void *
 bpfsattach(caddr_t *bpfp, const char *name, u_int dlt, u_int hdrlen)
 {
diff --git b/sys/net/bpfdesc.h a/sys/net/bpfdesc.h
index 130b91c1d9f..de8f6f3e440 100644
--- b/sys/net/bpfdesc.h
+++ a/sys/net/bpfdesc.h
@@ -1,4 +1,4 @@
-/* $OpenBSD: bpfdesc.h,v 1.38 2019/05/18 12:59:32 sashan Exp $ */
+/* $OpenBSD: bpfdesc.h,v 1.37 2019/04/15 21:55:08 sashan Exp $ */
 /* $NetBSD: bpfdesc.h,v 1.11 1995/09/27 18:30:42 thorpej Exp $ */
 
 /*
@@ -93,6 +93,7 @@ struct bpf_d {
  pid_t bd_pgid; /* process or group id for signal */
  uid_t bd_siguid; /* uid for process that set pgid */
  uid_t bd_sigeuid; /* euid for process that set pgid */
+ u_int bd_ref; /* reference count */
  struct selinfo bd_sel; /* bsd select info */
  int bd_unit; /* logical unit number */
  LIST_ENTRY(bpf_d) bd_list; /* descriptor list */
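
The get/put pattern that the reverse patch restores can be sketched in user space as follows. This is only an illustration, not the kernel code: it assumes C11 <stdatomic.h> in place of atomic_inc_int()/atomic_dec_int_nv() and a plain callback in place of smr_call(). The names struct desc, desc_init(), desc_get(), desc_put() and count_release() are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct bpf_d; only the counter matters here. */
struct desc {
	atomic_uint ref;                 /* plays the role of bd_ref */
	void (*release)(struct desc *);  /* stands in for the smr_call() step */
};

int released;	/* counts release callbacks, for illustration only */

void
count_release(struct desc *d)
{
	(void)d;
	released++;
}

/* Start with one reference, like the bpf_get() in bpfopen() in the patch. */
void
desc_init(struct desc *d, void (*release)(struct desc *))
{
	atomic_init(&d->ref, 1);
	d->release = release;
}

/* Take one more reference. */
void
desc_get(struct desc *d)
{
	atomic_fetch_add(&d->ref, 1);
}

/* Drop one reference; the last put triggers the release callback. */
void
desc_put(struct desc *d)
{
	/* atomic_fetch_sub() returns the old value; 1 means we reached zero. */
	if (atomic_fetch_sub(&d->ref, 1) > 1)
		return;
	d->release(d);
}
```

In the patch, each path that can outlive bpfclose() takes its own reference: the queued wakeup task, the attached knote, and in-flight read, write and ioctl calls.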


Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Visa Hankala-2
On Wed, Jul 31, 2019 at 08:22:55PM +0200, Alexandr Nedvedicky wrote:
> Looks like it can be related to my commit from May:
>
>     https://github.com/openbsd/src/commit/b50d0c1cf040666aed872208cd6f6ba609197b11#diff-7922ad1d2f6422aa72d4bacd1bf41909
>
> I'll try to take a look. The steps to reproduce would be handy.
>
> as the first aid I would give a try to apply a reverse patch
> to commit above. The reverse patch is below.

Wait, the cause is not understood yet. It looks like the crash happened
because of a NULL pointer dereference in the loop in SLIST_REMOVE():
either the element kn was not on the list, or the memory was corrupted.

The system should detach event filters before the object is destroyed.
That should not need reference counting on the level of the object.
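
The failure mode described above can be illustrated with a small user-space sketch. SLIST_REMOVE() walks the list until it finds the element and does not check for the end of the list, so an element that is not linked makes the walk run off the end and read a link field through a NULL pointer. The register dump (rax = 0, fault address 0x8) is consistent with that. The code below is an assumption-laden stand-in, not kernel code: struct node, struct list and list_remove_checked() are hypothetical names, and the checked variant returns 0 where the macro would fault.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct knote; only the link field matters. */
struct node {
	int id;
	struct node *next;	/* plays the role of kn_selnext */
};

struct list {
	struct node *first;	/* plays the role of the SLIST head */
};

/*
 * Checked removal: returns 1 if elm was unlinked, 0 if it was not on
 * the list. The unchecked macro would dereference NULL in the 0 case.
 */
int
list_remove_checked(struct list *head, struct node *elm)
{
	struct node **pp;

	/* pp always points at the link that leads to the current node. */
	for (pp = &head->first; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == elm) {
			*pp = elm->next;
			elm->next = NULL;
			return 1;
		}
	}
	return 0;
}
```

Removing the same element twice, or removing one that was never inserted, returns 0 here; with the unchecked walk both cases end in a NULL dereference.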


Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Alexandr Nedvedicky
On Thu, Aug 01, 2019 at 04:20:16AM +0000, Visa Hankala wrote:

> On Wed, Jul 31, 2019 at 08:22:55PM +0200, Alexandr Nedvedicky wrote:
> > Looks like it can be related to my commit from May:
> >
> >     https://github.com/openbsd/src/commit/b50d0c1cf040666aed872208cd6f6ba609197b11#diff-7922ad1d2f6422aa72d4bacd1bf41909
> >
> > I'll try to take a look. The steps to reproduce would be handy.
> >
> > as the first aid I would give a try to apply a reverse patch
> > to commit above. The reverse patch is below.
>
> Wait, the cause is not understood yet. It looks that the crash happened
> because of a NULL pointer dereference in the loop in SLIST_REMOVE(),
> the element kn was not on the list, or the memory was corrupted.
>
> The system should detach event filters before the object is destroyed.
> That should not need reference counting on the level of the object.

    lldpd uses privilege separation and is linked with libevent,
    which uses kqueue(2). The crash also does not make much sense
    to me so far, because knote_fdclose() gets called before
    closef(), which in turn calls bpfclose().

    I'd like to better understand what happens on the kernel side when
    a /dev/bpf file descriptor is passed from the privileged process to
    the unprivileged child. That's something I'd like to look at once
    I'm back home tonight.

regards
sashan
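
For reference, descriptor passing between processes is usually done with an SCM_RIGHTS control message over a Unix-domain socket. The thread does not confirm that lldpd does exactly this, so the following is a minimal sketch under that assumption; send_fd() and recv_fd() are hypothetical helper names. The point relevant to the crash is that the receiver gets a new descriptor number referring to the same open file, so closing one descriptor does not by itself close the underlying file.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one descriptor over a connected Unix-domain socket; 0 or -1. */
int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char byte = 0;

	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));
	iov.iov_base = &byte;		/* at least one data byte is sent */
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1. */
int
recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char byte;
	int fd;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A privileged parent would call send_fd() on its end of a socketpair(2) after opening the device, and the unprivileged child would call recv_fd() on the other end.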