amd64: stuck in netlock

Artturi Alm
Hi,

>Synopsis: stuck in netlock
>Category: amd64
>Environment:
        System      : OpenBSD 6.2
        Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
                         [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
        processes getting stuck w/STATE=netlock, kill has no effect.
>How-To-Repeat:
        using the desktop normally, until trying to restart chrome ends
        up failing.
        I've had this happen to me at least twice in the last few weeks.
        The first time, i noticed that trying to launch chrome locked up
        all the other processes in netlock, and "pkill chrome" allowed
        the system to recover. i was unable to figure out what was wrong;
        rebooting made everything work again, while e.g.
        removing ~/.cache & ~/.config did not.

        long before running the "ps cl" below, i had already killed all
        the xterm windows those processes were in. cwm(1) was unable to
        kill some of those, but xkill was able to.

        after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
        $-prompt, and ^T did show xauth stuck in netlock..
        i guess it's obvious where it was heading; so i got pics of
        "# reboot -nq" failing because stuck in the fckng netlock -_-

        i do have ddb.{panic,console,log}=1, but
        "# sysctl ddb.trigger=1" ==
        "sysctl: ddb.trigger: Operation not supported by device"
        ?? so i had no option but "virsh reset <domain>"...

        -Artturi

"$ ps cl | grep netlock":
 1000 11941     1   0  10   0  1440  3976 netlock Dsp   p0    0:00.16 ssh
 1000 58467     1   0  10   0  1452  3992 netlock Dsp   p1    0:00.34 ssh
 1000 23909     1   0  10   0  1436  3960 netlock RE/3  p2-   0:00.02 ssh
 1000 11699     1   0  10   0  1552  4132 netlock Dp    p2-   0:14.07 ssh
 1000 44333     1   0  10   0  1452  4040 netlock Dp    p4-   0:00.30 ssh
 1000 50770     1   0  10   0 12284 23468 netlock D     p5-   4:04.48 weechat
 1000 64651     1   0  10   0  1412  3916 netlock D     p6-   0:00.02 ssh
 1000 71503     1   0  10   0  1432  3944 netlock Ds+p  p7    0:00.20 ssh
 1000 45823     1   0  10   0  1500  4080 netlock Dp    p9-   0:00.21 ssh
 1000 65348     1   0  10   0  1480  4204 netlock Dp    pe-   0:30.40 ssh
 1000 74456     1   1  10   0  4772 10096 netlock Dp    pg-   0:01.90 mutt


"$ ps cl"
  UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT  TT       TIME COMMAND
 1000 11941     1   0  10   0  1440  3976 netlock Dsp   p0    0:00.16 ssh
 1000 75836 28487   0  18   0   748   872 pause   Ssp   p0    0:00.06 ksh
 1000 11216 75836   0  28   0   952  1232 -       R+p/2 p0    0:00.00 ps
 1000 58467     1   0  10   0  1452  3992 netlock Dsp   p1    0:00.34 ssh
 1000  1522 76480   0   3   0   748   908 ttyin   Is+p  p1    0:00.06 ksh
 1000 23909     1   0  10   0  1436  3960 netlock RE/3  p2-   0:00.02 ssh
 1000 11699     1   0  10   0  1552  4132 netlock Dp    p2-   0:14.07 ssh
 1000 64202 90615   0  18   0   752   912 pause   Isp   p2    0:00.12 ksh
 1000  6401 64202   0  10   0  1400  2896 wait    Ip    p2    0:00.02 man
 1000 82273  6401   0   3   0   960  2444 ttyin   I+p   p2    0:00.05 more
 1000 44333     1   0  10   0  1452  4040 netlock Dp    p4-   0:00.30 ssh
 1000 50770     1   0  10   0 12284 23468 netlock D     p5-   4:04.48 weechat
 1000 64651     1   0  10   0  1412  3916 netlock D     p6-   0:00.02 ssh
 1000 71503     1   0  10   0  1432  3944 netlock Ds+p  p7    0:00.20 ssh
 1000 45823     1   0  10   0  1500  4080 netlock Dp    p9-   0:00.21 ssh
 1000 65348     1   0  10   0  1480  4204 netlock Dp    pe-   0:30.40 ssh
 1000 74456     1   1  10   0  4772 10096 netlock Dp    pg-   0:01.90 mutt
 1000 13823     1   0  18   0   684   804 pause   Isp   C0    0:00.16 ksh
 1000 41484 13823   0  18   0   676   792 pause   I+p   C0    0:00.09 sh
 1000 30568 41484   0  10   0   428  1836 wait    I+    C0    0:00.02 xinit
 1000 45439 30568   0  18   0   656   760 pause   Ip    C0    0:00.01 sh
 1000 67103     1   0   2   0   596  1824 select  I     C0    4:03.86 dbus-launch
 1000 77867     1   0   2   0  1536  6824 poll    Ip    C0    0:01.48 xclock

>Fix:
        wish i knew.

dmesg:
OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8572960768 (8175MB)
avail mem = 8306216960 (7921MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf69e0 (11 entries)
bios0: vendor SeaBIOS version "1.10.2-2.fc27" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD FX(tm)-9590 Eight-Core Processor, 4690.64 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD FX(tm)-9590 Eight-Core Processor, 4690.05 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD FX(tm)-9590 Eight-Core Processor, 4690.06 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3 at mainbus0: apid 4 (application processor)
cpu3: AMD FX(tm)-9590 Eight-Core Processor, 4690.03 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4 at mainbus0: apid 5 (application processor)
cpu4: AMD FX(tm)-9590 Eight-Core Processor, 4690.00 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu4: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5 at mainbus0: apid 6 (application processor)
cpu5: AMD FX(tm)-9590 Eight-Core Processor, 4690.04 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,XOP,FMA4,TBM,BMI1
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu5: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpicpu4 at acpi0: C1(@1 halt!)
acpicpu5 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
virtio0 at pci0 dev 2 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio0
scsibus1 at vioblk0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 204800MB, 512 bytes/sector, 419430400 sectors
virtio0: msix shared
virtio1 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address 52:54:00:d8:72:b3
virtio1: msix shared
virtio2 at pci0 dev 5 function 0 "Qumranet Virtio Console" rev 0x00
virtio2: no matching child driver; not configured
em0 at pci0 dev 6 function 0 "Intel 82574L" rev 0x00: apic 0 int 10, address 68:05:ca:23:90:88
xhci0 at pci0 dev 7 function 0 "NEC xHCI" rev 0x03: apic 0 int 11
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "NEC xHCI root hub" rev 3.00/1.00 addr 1
virtio3 at pci0 dev 9 function 0 "Qumranet Virtio SCSI" rev 0x00
vioscsi0 at virtio3: qsize 128
scsibus2 at vioscsi0: 255 targets
virtio3: msix shared
radeondrm0 at pci0 dev 10 function 0 "ATI Radeon HD 5670" rev 0x00
drm0 at radeondrm0
radeondrm0: apic 0 int 10
azalia0 at pci0 dev 10 function 1 "ATI Radeon HD 5600 Audio" rev 0x00: apic 0 int 11
azalia0: no supported codecs
virtio4 at pci0 dev 11 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk1 at virtio4
scsibus3 at vioblk1: 2 targets
sd1 at scsibus3 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd1: 102400MB, 512 bytes/sector, 209715200 sectors
virtio4: msix shared
xhci1 at pci0 dev 13 function 0 "Etron EJ188 xHCI" rev 0x00: apic 0 int 10
usb1 at xhci1: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "Etron xHCI root hub" rev 3.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: SVM/RVI
uhidev0 at uhub0 port 5 configuration 1 interface 0 "Cypress Cypress USB Keyboard / PS2 Mouse" rev 1.10/0.01 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
uhidev1 at uhub0 port 5 configuration 1 interface 1 "Cypress Cypress USB Keyboard / PS2 Mouse" rev 1.10/0.01 addr 2
uhidev1: iclass 3/1, 3 report ids
ums0 at uhidev1 reportid 1: 3 buttons, Z dir
wsmouse1 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=1, output=0, feature=0
uhidev2 at uhub0 port 6 configuration 1 interface 0 "Razer Razer Naga Hex" rev 2.00/2.00 addr 3
uhidev2: iclass 3/1
ums1 at uhidev2: 5 buttons, Z dir
wsmouse2 at ums1 mux 0
uhidev3 at uhub0 port 6 configuration 1 interface 1 "Razer Razer Naga Hex" rev 2.00/2.00 addr 3
uhidev3: iclass 3/1, 3 report ids
ukbd1 at uhidev3 reportid 1: 8 variable keys, 6 key codes
wskbd2 at ukbd1 mux 1
uhid2 at uhidev3 reportid 2: input=3, output=0, feature=0
uhid3 at uhidev3 reportid 3: input=3, output=0, feature=0
uhub2 at uhub1 port 1 configuration 1 interface 0 "ALCOR USB Hub 2.0" rev 2.00/7.02 addr 2
uhub3 at uhub2 port 1 configuration 1 interface 0 "ALCOR USB Hub 2.0" rev 2.00/7.02 addr 3
uftdi0 at uhub3 port 1 configuration 1 interface 0 "FTDI FT232R USB UART" rev 2.00/6.00 addr 4
ucom0 at uftdi0 portno 1
fd0 at fdc0 drive 1: density unknown
uhidev4 at uhub3 port 3 configuration 1 interface 0 "Logitech Logitech USB Keyboard" rev 2.00/60.00 addr 5
uhidev4: iclass 3/1
ukbd2 at uhidev4: 8 variable keys, 6 key codes
wskbd3 at ukbd2 mux 1
uhidev5 at uhub3 port 3 configuration 1 interface 1 "Logitech Logitech USB Keyboard" rev 2.00/60.00 addr 5
uhidev5: iclass 3/0, 4 report ids
uhid4 at uhidev5 reportid 3: input=4, output=0, feature=0
uhid5 at uhidev5 reportid 4: input=1, output=0, feature=0
uftdi1 at uhub3 port 4 configuration 1 interface 0 "FTDI FT232R USB UART" rev 2.00/6.00 addr 6
ucom1 at uftdi1 portno 1
uftdi2 at uhub2 port 2 configuration 1 interface 0 "FTDI FT232R USB UART" rev 2.00/6.00 addr 7
ucom2 at uftdi2 portno 1
uftdi3 at uhub2 port 4 configuration 1 interface 0 "FTDI FT232R USB UART" rev 2.00/6.00 addr 8
ucom3 at uftdi3 portno 1
uhub4 at uhub1 port 2 configuration 1 interface 0 "VIA Labs USB 2.0 HUB
" rev 2.00/85.70 addr 9
uvideo0 at uhub4 port 4 configuration 1 interface 0 "317GAWCM001LON32E0K7 USB Video device" rev 2.00/21.11 addr 10
video0 at uvideo0
uaudio0 at uhub4 port 4 configuration 1 interface 2 "317GAWCM001LON32E0K7 USB Video device" rev 2.00/21.11 addr 10
uaudio0: audio rev 1.00, 3 mixer controls
audio0 at uaudio0
uhub5 at uhub1 port 6 configuration 1 interface 0 "VLI Labs, Inc. USB 3.0 HUB
" rev 3.00/85.74 addr 11
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
root on sd1a (a6b8f833567d5088.a) swap on sd1b dump on sd1b
WARNING: / was not properly unmounted
radeondrm0: 1920x1200, 32bpp
wsdisplay0 at radeondrm0 mux 1: console (std, vt100 emulation), using wskbd0
wskbd1: connecting to wsdisplay0
wskbd2: connecting to wsdisplay0
wskbd3: connecting to wsdisplay0
wsdisplay0: screen 1-5 added (std, vt100 emulation)
arp_rtrequest: bad gateway value: em0

usbdevs:
Controller /dev/usb0:
addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), NEC(0x1033), rev 1.00
 port 1 disabled
 port 2 disabled
 port 3 disabled
 port 4 disabled
 port 5 addr 2: low speed, power 100 mA, config 1, Cypress USB Keyboard / PS2 Mouse(0x0101), Cypress(0x04b4), rev 0.01
 port 6 addr 3: full speed, power 100 mA, config 1, Razer Naga Hex(0x0036), Razer(0x1532), rev 2.00
 port 7 disabled
 port 8 disabled
Controller /dev/usb1:
addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), Etron(0x1b6f), rev 1.00
 port 1 addr 2: full speed, self powered, config 1, USB Hub 2.0(0x0606), ALCOR(0x05e3), rev 7.02
  port 1 addr 3: full speed, self powered, config 1, USB Hub 2.0(0x0606), ALCOR(0x05e3), rev 7.02
   port 1 addr 4: full speed, power 90 mA, config 1, FT232R USB UART(0x6001), FTDI(0x0403), rev 6.00, iSerialNumber A600ae86
   port 2 powered
   port 3 addr 5: low speed, power 98 mA, config 1, Logitech USB Keyboard(0xc31b), Logitech(0x046d), rev 60.00
   port 4 addr 6: full speed, power 90 mA, config 1, FT232R USB UART(0x6001), FTDI(0x0403), rev 6.00, iSerialNumber A40081We
  port 2 addr 7: full speed, power 90 mA, config 1, FT232R USB UART(0x6001), FTDI(0x0403), rev 6.00, iSerialNumber A600afb9
  port 3 powered
  port 4 addr 8: full speed, power 90 mA, config 1, FT232R USB UART(0x6001), FTDI(0x0403), rev 6.00, iSerialNumber A40081WI
 port 2 addr 9: high speed, self powered, config 1, USB 2.0 HUB
(0x2811), VIA Labs(0x2109), rev 85.70
  port 1 powered
  port 2 powered
  port 3 powered
  port 4 addr 10: high speed, power 500 mA, config 1, USB Video device(0x58fe), 317GAWCM001LON32E0K7(0x0bda), rev 21.11, iSerialNumber 200901010001
 port 3 disabled
 port 4 disabled
 port 5 disabled
 port 6 addr 11: super speed, self powered, config 1, USB 3.0 HUB
(0x8110), VLI Labs, Inc.(0x2109), rev 85.74
  port 1 disabled
  port 2 disabled
  port 3 disabled
  port 4 disabled
 port 7 disabled
 port 8 disabled


Re: amd64: stuck in netlock

Martin Pieuchot
Hello Artturi,

On 28/01/18(Sun) 09:08, Artturi Alm wrote:

> >Synopsis: stuck in netlock
> >Category: amd64
> >Environment:
> System      : OpenBSD 6.2
> Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
> processes getting stuck w/STATE=netlock, kill has no effect.
> >How-To-Repeat:
> using the desktop normally, until trying to restart chrome ends
> up failing.

What do you mean with "using the desktop normally"?  Which applications
are you using?  Which browser plugins?  Can you find out the minimum
setup to reproduce this deadlock?

> I've had this happen to me atleast twice in the last few of weeks.

Do you know how to reproduce it easily?

> At first time i noticed how trying to launch chrome did lock up
> all the other processes in netlock, and "pkill chrome" did allow
> the system to recover, i was unable to figure out what was wrong
> and rebooting did make everything work again, while ie.
> removing ~/.cache & ~/.config did not.

So the deadlock is related to your chrome usage?

> long before running the "ps cl" below, i had already killed all
> the xterm-windows those processes were in. cwm(1) was unable to
> kill some of those, but xkill did not.

Well, killing processes waiting for the 'netlock' won't help.  What has to
be found is which process is holding it.  For that we need the full ps
output, including kernel and userland threads.
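
As a rough aid for reading such a full listing, here is a minimal sketch that counts rows per wait channel in ps long-format output. The function name wchan_summary is made up for illustration; it assumes only that the header line contains a WCHAN column, as in the "ps cl" output of the original report, and that no column before WCHAN contains spaces.

```shell
# wchan_summary (hypothetical helper): read ps long-format output on stdin
# and print "<count> <wchan>" per wait channel, most common first.
# The WCHAN column is located from the header line, so extra columns
# (for example a TID column) do not shift the result.
wchan_summary() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == "WCHAN")
                    col = i
            next
        }
        # Count only rows that actually reach the WCHAN column.
        col && NF >= col { count[$col]++ }
        END {
            for (w in count)
                printf "%d %s\n", count[w], w
        }
    ' | sort -k1,1nr -k2,2
}
```

On the affected machine the input could come from something like "ps -axHkl | wchan_summary"; the -H (threads) and -k (kernel threads) flags are my reading of OpenBSD ps(1), so check the manual page on your release. The summary only shows how many threads wait where; it does not identify the holder of the lock.
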
>
> after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> $-prompt, and ^T did show xauth stuck in netlock..
> i guess it's obvious where it was heading; so i got pics of
> "# reboot -nq" failing because stuck in the fckng netlock -_-
>
> i do have ddb.{panic,console,log}=1, but
> "# sysctl ddb.trigger=1" ==
> "sysctl: ddb.trigger: Operation not supported by device"

Not having DDB access will limit the debugging experience.  Are you sure
you tried to enter it on your console?

> ?? so i had no option but "virsh reset <domain>"...

Did you try top(1)?  What were the kernel processes doing?


Re: amd64: stuck in netlock

Artturi Alm
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:

> Hello Artturi,
>
> On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > >Synopsis: stuck in netlock
> > >Category: amd64
> > >Environment:
> > System      : OpenBSD 6.2
> > Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> > [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine     : amd64
> > >Description:
> > processes getting stuck w/STATE=netlock, kill has no effect.
> > >How-To-Repeat:
> > using the desktop normally, until trying to restart chrome ends
> > up failing.
>
> What do you mean with "using the desktop normally"?  Which applications
> are you using?  Which browser plugins?  Can you find out the minimum
> setup to reproduce this deadlock?
>

I had mupdf, gvim, weechat and chromium running, all from packages; not much
else is even installed, and no browser plugins.
If i had only one machine to use, this would be it, so it is kind of hard to
minimize the setup, as i had 24+ hours of use (or at least uptime) before this
got triggered.

> > I've had this happen to me atleast twice in the last few of weeks.
>
> Do you know how to reproduce it easily?
>

No i don't, but i will try to stay alert for this to notice it before i go
killing stuff randomly in despair.

> > At first time i noticed how trying to launch chrome did lock up
> > all the other processes in netlock, and "pkill chrome" did allow
> > the system to recover, i was unable to figure out what was wrong
> > and rebooting did make everything work again, while ie.
> > removing ~/.cache & ~/.config did not.
>
> So the deadlock is related to your chrome usage?
>

Possibly. i have an issue with chrome where it will eventually stop playing
videos.
As an example, let's say i've got 50+ tabs open in a single "main" window,
and then open a second one with "chromium --incognito" and make it play
some playlist from youtube. Once the playback ceases (at the beginning
of a new video), i have to restart all chrome processes to have it continue.

i was guessing it's not related, as i've had these playback issues since
before 6.2, iirc, and even when playback is not working, it does keep
downloading/buffering the video and everything else works w/o issues
in the other chrome window.

> > long before running the "ps cl" below, i had already killed all
> > the xterm-windows those processes were in. cwm(1) was unable to
> > kill some of those, but xkill did not.
>
> Well killing process waiting for the 'netlock' won't help.  What has to
> be find is which process is holding it.  For that we need the full ps
> output, including kernel and userland threads.

Ok, i'll get those if/when i run into this again.

> >
> > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > $-prompt, and ^T did show xauth stuck in netlock..
> > i guess it's obvious where it was heading; so i got pics of
> > "# reboot -nq" failing because stuck in the fckng netlock -_-
> >
> > i do have ddb.{panic,console,log}=1, but
> > "# sysctl ddb.trigger=1" ==
> > "sysctl: ddb.trigger: Operation not supported by device"
>
> Not having DDB access will limit the debugging experience.  Are you sure
> you tried to enter it on your console?
>

Yes, i had already exited X. Or do you mean the above would only work from
what i get into with ctrl+alt+f1, and not e.g. ctrl+alt+f2?
Using the first VT (or whatever those are) was impossible, as
xauth was locked up there, giving me no prompt.

> > ?? so i had no option but "virsh reset <domain>"...
>
> Did you try top(1)?  What were the kernel processes doing?

Yes, but i didn't pay attention to anything but how weechat
went waiting on netlock if i launched chrome.
The first time i ran into it, i think launching chrome froze systat too.


Re: amd64: stuck in netlock

Artturi Alm
In reply to this post by Martin Pieuchot
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:

> Hello Artturi,
>
> On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > >Synopsis: stuck in netlock
> > >Category: amd64
> > >Environment:
> > System      : OpenBSD 6.2
> > Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> > [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine     : amd64
> > >Description:
> > processes getting stuck w/STATE=netlock, kill has no effect.
> > >How-To-Repeat:
> > using the desktop normally, until trying to restart chrome ends
> > up failing.
>
> What do you mean with "using the desktop normally"?  Which applications
> are you using?  Which browser plugins?  Can you find out the minimum
> setup to reproduce this deadlock?
>
> > I've had this happen to me atleast twice in the last few of weeks.
>
> Do you know how to reproduce it easily?
>

this time i had fewer than 10 tabs open, so i guess it can be narrowed
down even further.

> > At first time i noticed how trying to launch chrome did lock up
> > all the other processes in netlock, and "pkill chrome" did allow
> > the system to recover, i was unable to figure out what was wrong
> > and rebooting did make everything work again, while ie.
> > removing ~/.cache & ~/.config did not.
>
> So the deadlock is related to your chrome usage?
>

now it does feel like so. i'll upgrade tonight.

> > long before running the "ps cl" below, i had already killed all
> > the xterm-windows those processes were in. cwm(1) was unable to
> > kill some of those, but xkill did not.
>
> Well killing process waiting for the 'netlock' won't help.  What has to
> be find is which process is holding it.  For that we need the full ps
> output, including kernel and userland threads.
> >
> > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > $-prompt, and ^T did show xauth stuck in netlock..
> > i guess it's obvious where it was heading; so i got pics of
> > "# reboot -nq" failing because stuck in the fckng netlock -_-
> >
> > i do have ddb.{panic,console,log}=1, but
> > "# sysctl ddb.trigger=1" ==
> > "sysctl: ddb.trigger: Operation not supported by device"
>
> Not having DDB access will limit the debugging experience.  Are you sure
> you tried to enter it on your console?
>

so this requires ttyC0, right?
this time it was ifconfig in [netlock] that prevented using ttyC0.
i got there from X by running "virsh shutdown <domain>" from the kvm host;
i guess it emulates what pressing the actual power button would do (acpi?).

> > ?? so i had no option but "virsh reset <domain>"...
>
> Did you try top(1)?  What were the kernel processes doing?

see below, if the output of "top -bCHS -d 1 999" will do.
anything else i could do? anyway, thanks in advance :)
-Artturi

load averages:  0.00,  0.02,  0.06    tfort.my.domain 20:04:13
145 threads: 1 running, 139 idle, 5 on processor  up 1 day, 11:38
CPU0 states:  0.2% user,  0.0% nice,  0.4% system,  0.3% interrupt, 99.2% idle
CPU1 states:  1.1% user,  0.1% nice,  2.3% system,  0.0% interrupt, 96.5% idle
CPU2 states:  1.3% user,  0.1% nice,  2.5% system,  0.0% interrupt, 96.1% idle
CPU3 states:  0.9% user,  0.2% nice,  2.9% system,  0.0% interrupt, 96.0% idle
CPU4 states:  0.3% user,  0.1% nice,  0.8% system,  0.0% interrupt, 98.8% idle
CPU5 states:  0.4% user,  0.1% nice,  1.2% system,  0.0% interrupt, 98.3% idle
Memory: Real: 285M/1053M act/tot Free: 6876M Cache: 521M Swap: 0K/4336M

  PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
14495   155467   2    0   35M   40M sleep/1   poll     39:05  1.61% /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
70058   507112   2    0 9652K   13M sleep/1   select    0:02  0.05% xterm
13394   440936 -22    0    0K   21M idle      -        35.3H  0.00% idle0
 6862   125212 -22    0    0K   21M onproc/5  -        35.2H  0.00% idle5
43153   547872 -22    0    0K   21M onproc/4  -        35.0H  0.00% idle4
  661   212291 -22    0    0K   21M onproc/3  -        34.7H  0.00% idle3
25137   319342 -22    0    0K   21M onproc/1  -        34.4H  0.00% idle1
65690   467656 -22    0    0K   21M idle      -        34.4H  0.00% idle2
 3067   485689  10    0   12M   23M idle      netlock   3:12  0.00% weechat -r /connect freenode
87817   410790  68   20    0K   21M run/2     -         2:29  0.00% zerothread
14495   421539   2    0   35M   40M sleep/4   poll      1:51  0.00% /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
13992   615559  10  -20  888K 2452K idle      netlock   0:47  0.00% ntpd: ntp engine
30357   245010  10    0    0K   21M idle      netlock   0:42  0.00% softclock
61217   230818  10    0    0K   21M idle      netlock   0:30  0.00% softnet
51008   255493  18    0    0K   21M sleep/1   syncer    0:30  0.00% update
70625   286762  10    0    0K   21M sleep/1   bored     0:28  0.00% systq
94504   451160   2    0 4124K   13M idle      select    0:18  0.00% xfe
33315   484574  34    0  141M  102M idle      thrslee   0:17  0.00% chrome:
36172   453673  10    0    0K   21M idle      usbtsk    0:16  0.00% usbtask
39893   337592   4    0  724K  544K sleep/5   bpf       0:16  0.00% pflogd: [running] -s 160 -i pflog0 -f /var/log/pflog
53882   315963   2    0 7680K 8864K idle      select    0:15  0.00% xterm
 5353   101076 -18    0    0K   21M sleep/5   reaper    0:15  0.00% reaper
35731   216145   2    0 2184K 5984K sleep/1   poll      0:14  0.00% /usr/local/libexec/at-spi2-registryd
98256   602209  10    0    0K   21M sleep/3   bored     0:11  0.00% systqmp
17398   194216  10    0 1072K 3780K idle      netlock   0:09  0.00% ntpd: constraint from 2a00:1450:400f:806::2004
35819   545415   2    0  784K 2432K idle      poll      0:06  0.00% /usr/local/bin/dbus-daemon --config-file=/usr/local/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
95229   161133   2    0 7708K 8960K idle      select    0:06  0.00% xterm
 7658   251559   2    0 1488K 6320K sleep/2   poll      0:04  0.00% cwm
 4561   518683   2    0 5964K 8920K idle      select    0:04  0.00% xterm
54407   393915   2    0 3148K 5676K idle      poll      0:04  0.00% /usr/local/libexec/at-spi-bus-launcher
43330   322248  18    0  676K  824K idle      pause     0:03  0.00% -ksh
    0   100000 -18    0    0K   21M sleep/2   schedul   0:03  0.00% swapper
30418   202727   2    0 8532K   12M idle      select    0:01  0.00% xterm
36972   147487  10    0    0K   21M idle      usbatsk   0:01  0.00% usbatsk
25835   316204   2    0 1528K 6832K sleep/5   poll      0:01  0.00% xclock
72329   601323   2    0 5952K   10M idle      select    0:01  0.00% xterm
 7664   528677   2    0  920K 1692K idle      kqread    0:01  0.00% /usr/sbin/syslogd
83389   257121   2    0 7252K   11M sleep/1   select    0:01  0.00% xterm
26949   457506   2    0 2008K 4592K sleep/1   poll      0:01  0.00% top
49903   184590   2    0 1172K 1328K sleep/1   poll      0:01  0.00% /usr/sbin/cron
89149   330400   2    0 5924K 9300K idle      select    0:01  0.00% xterm -e ssh 10.0.1.2 doas tail -f /var/log/daemon
 8397   307475   2    0 1704K 4624K idle      poll      0:00  0.00% xconsole
82280   557981   2    0 5928K 8760K idle      select    0:00  0.00% xterm
83314   573665   2    0 5932K 8368K idle      select    0:00  0.00% xterm
 3183   109413   2    0 5920K 8520K idle      select    0:00  0.00% xterm -e ssh 10.0.1.2 doas tail -f /var/log/messages
 4293   225199   2    0 5924K 8356K idle      select    0:00  0.00% xterm
 4617   408637   2    0 1448K 4088K idle      select    0:00  0.00% ssh 10.0.1.2 doas tail -f /var/log/daemon
21989   479865   2    0 5940K 8376K idle      select    0:00  0.00% xterm -e ssh 10.0.1.2 doas tail -f /var/log/authlog
46277   558542   2    0 7296K   11M idle      select    0:00  0.00% xterm
31523   438959   2    0  604K  528K idle      poll      0:00  0.00% dhclient: vio0 [priv]
87024   617358   2    0 1452K 4084K idle      select    0:00  0.00% ssh 10.0.1.2 doas tail -f /var/log/messages
66268   224449   2    0 1444K 4072K idle      select    0:00  0.00% ssh 10.0.1.2 doas tail -f /var/log/authlog
33315   262946  28    0  141M  102M idle      fsleep    0:00  0.00% chrome:
85141   228595   2  -20  756K 1748K idle      poll      0:00  0.00% /usr/sbin/ntpd
31097   620520   2    0  588K 2384K idle      poll      0:00  0.00% ssh-agent -s
33315   123420  28    0  141M  102M idle      fsleep    0:00  0.00% chrome:
88786   574966   2    0  604K 1904K idle      poll      0:00  0.00% /usr/local/bin/dbus-daemon --syslog --fork --print-pid 5 --print-address 7 --session
54407   257551   2    0 3148K 5676K idle      poll      0:00  0.00% /usr/local/libexec/at-spi-bus-launcher
33315   443734  -6    0  141M  102M idle      viowait   0:00  0.00% chrome:
79398   434603   2    0 1588K 3860K idle      kqread    0:00  0.00% smtpd: queue
77921   214889   3    0  680K  812K idle      ttyin     0:00  0.00% -ksh
74345   112202   3    0  808K  856K idle      ttyin     0:00  0.00% -ksh
33315   313524   2    0  141M  102M idle      poll      0:00  0.00% chrome:
64223   226902  18    0  744K  892K idle      pause     0:00  0.00% -ksh
20626   492796   2    0 1540K 3856K idle      kqread    0:00  0.00% smtpd: control
 3834   486708  18    0  748K  856K sleep/2   pause     0:00  0.00% -ksh
54407   490935   2    0 3148K 5676K idle      poll      0:00  0.00% /usr/local/libexec/at-spi-bus-launcher
87615   611410   2    0 1540K 3992K idle      kqread    0:00  0.00% smtpd: pony express
27809   212832   2    0 2948K 5772K idle      poll      0:00  0.00% /usr/local/libexec/gvfsd
41871   383430   2    0  920K 1348K idle      select    0:00  0.00% /usr/sbin/sshd
33315   585404  28    0  141M  102M idle      fsleep    0:00  0.00% chrome:
14495   122264  31    0   35M   40M idle      fsleep    0:00  0.00% /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
97852   388194   2    0 1388K 3760K idle      kqread    0:00  0.00% smtpd: lookup
59867   310950  10    0 1460K 4152K idle      netlock   0:00  0.00% ssh 10.0.1.4
95935   209526   2    0 1268K 3592K idle      kqread    0:00  0.00% smtpd: scheduler
33315   204638  28    0  141M  102M sleep/2   fsleep    0:00  0.00% chrome:
40985   282468   2    0 1264K 3524K idle      kqread    0:00  0.00% smtpd: klondike
26683   403143   2    0 2224K 1280K idle      netio     0:00  0.00% X: [priv]
16901   218184  18    0  680K  800K idle      pause     0:00  0.00% -ksh
62588   321190  18    0  684K  820K idle      pause     0:00  0.00% -ksh
77631   557114   2    0  220K  336K idle      netcon    0:00  0.00% nfsd: master
33315   116563  28    0  141M  102M sleep/4   fsleep    0:00  0.00% chrome:
12097   547114   2    0  668K 2316K idle      poll      0:00  0.00% ntpd: dns engine
45861   500739   2    0  140K  156K idle      nfsd      0:00  0.00% nfsd: server
63000   119248   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
 1959   474643   3    0  316K 1288K idle      ttyin     0:00  0.00% /usr/libexec/getty std.9600 ttyC3
75796   203066   2    0  672K  752K idle      poll      0:00  0.00% mountd: parent
91228   394672   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
73066   389907   2    0 1540K 2264K idle      kqread    0:00  0.00% /usr/sbin/smtpd
33315   539831  28    0  141M  102M sleep/3   fsleep    0:00  0.00% chrome:
33315   236547  28    0  141M  102M idle      fsleep    0:00  0.00% chrome:
61229   136255   3    0  816K  908K idle      ttyin     0:00  0.00% -ksh
33315   514421  29    0  141M  102M idle      fsleep    0:00  0.00% chrome:
27809   260205   2    0 2948K 5772K idle      poll      0:00  0.00% /usr/local/libexec/gvfsd
 2846   113818   2    0  680K  612K idle      netio     0:00  0.00% pflogd: [priv]
31957   551446   2  -20  436K 1048K idle      poll      0:00  0.00% /usr/bin/sndiod
39187   119771   2    0  736K  684K idle      poll      0:00  0.00% dhclient: vio0
94324   372170  10    0  432K 1828K idle      wait      0:00  0.00% xinit /home/aalm/.xinitrc -- /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
25954   531521   3    0  312K 1276K idle      ttyin     0:00  0.00% /usr/libexec/getty std.9600 ttyC1
77645   555246   3    0  312K 1272K idle      ttyin     0:00  0.00% /usr/libexec/getty std.9600 ttyC2
24372   144495   3    0  680K  840K idle      ttyin     0:00  0.00% -ksh
54451   115117   3    0  312K 1272K idle      ttyin     0:00  0.00% /usr/libexec/getty std.9600 ttyC5
45064   261299  18    0  804K  860K idle      pause     0:00  0.00% -ksh
33315   317401   2    0  141M  102M idle      kqread    0:00  0.00% chrome:
33315   367737  28    0  141M  102M sleep/4   fsleep    0:00  0.00% chrome:
49805   158039   2    0  508K  468K idle      poll      0:00  0.00% mountd: [priv]
27809   450428   2    0 2948K 5772K idle      poll      0:00  0.00% /usr/local/libexec/gvfsd
 6307   102098   3    0  684K  816K idle      ttyin     0:00  0.00% -ksh
33315   595662  30    0  141M  102M idle      fsleep    0:00  0.00% chrome:
    1   398685  10    0  388K  452K idle      wait      0:00  0.00% /sbin/init
54083   465642  18    0  676K  792K idle      pause     0:00  0.00% /bin/sh /usr/X11R6/bin/startx
35731   409155   2    0 2184K 5984K idle      poll      0:00  0.00% /usr/local/libexec/at-spi2-registryd
51400   304022 -18    0    0K   21M idle      pgdaemo   0:00  0.00% pagedaemon
 5798   348610  18    0  664K  768K idle      pause     0:00  0.00% sh /home/aalm/.xinitrc
34823   269871  10    0    0K   21M idle      bored     0:00  0.00% ttm_swap
 4063   460059   3    0  684K  808K idle      ttyin     0:00  0.00% -ksh
60668   341706 -13    0    0K   21M idle      cleaner   0:00  0.00% cleaner
27861   371189 -18    0    0K   21M idle      aiodone   0:00  0.00% aiodoned
24713   509614   2    0  496K 1960K idle      netio     0:00  0.00% syslogd: [priv]
69707   613393   2    0 1036K 1080K idle      netio     0:00  0.00% xconsole
59595   265281   2    0  596K 1844K idle      select    0:00  0.00% dbus-launch --sh-syntax --exit-with-session
53526   603738  10    0  432K 1532K idle      netlock   0:00  0.00% systat
35731   431557   2    0 2184K 5984K idle      poll      0:00  0.00% /usr/local/libexec/at-spi2-registryd
45074   233314   2    0  140K  156K idle      nfsd      0:00  0.00% nfsd: server
81708   418838   2    0  384K 1072K idle      poll      0:00  0.00% /usr/sbin/portmap
58374   516454   2    0  140K  156K idle      nfsd      0:00  0.00% nfsd: server
 9217   478612   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
 7100   505816   2    0  140K  156K idle      nfsd      0:00  0.00% nfsd: server
33315   250331   2    0  141M  102M idle      poll      0:00  0.00% chrome:
75588   539022   2    0  416K  900K idle      poll      0:00  0.00% sndiod: helper
24254   306793   2    0  140K  156K idle      nfsd      0:00  0.00% nfsd: server
20053   327803   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
83191   436025   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
82243   423406   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
54407   602688   2    0 3148K 5676K idle      poll      0:00  0.00% /usr/local/libexec/at-spi-bus-launcher
33315   403281  30    0  141M  102M idle      fsleep    0:00  0.00% chrome:
33315   222186  28    0  141M  102M idle      fsleep    0:00  0.00% chrome:
33315   117266  30    0  141M  102M idle      fsleep    0:00  0.00% chrome:
33315   505611   2    0  141M  102M idle      poll      0:00  0.00% chrome:
33315   439073  30    0  141M  102M idle      fsleep    0:00  0.00% chrome:
70689   286455   2    0  140K  152K idle      nfsd      0:00  0.00% nfsd: server
42722   600020  10    0    0K   21M idle      acpi0     0:00  0.00% acpi0
37946   233490  10    0    0K   21M idle      bored     0:00  0.00% crynlk
26234   539619  10    0    0K   21M idle      bored     0:00  0.00% crypto
93494   221357  28    0  516K 1504K onproc/0  -         0:00  0.00% top -bCHS -d 1 999

Re: amd64: stuck in netlock

Martin Pieuchot
On 29/01/18(Mon) 20:38, Artturi Alm wrote:

> On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > Hello Artturi,
> >
> > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > >Synopsis: stuck in netlock
> > > >Category: amd64
> > > >Environment:
> > > System      : OpenBSD 6.2
> > > Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> > > [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine     : amd64
> > > >Description:
> > > processes getting stuck w/STATE=netlock, kill has no effect.
> > > >How-To-Repeat:
> > > using the desktop normally, until trying to restart chrome ends
> > > up failing.
> >
> > What do you mean with "using the desktop normally"?  Which applications
> > are you using?  Which browser plugins?  Can you find out the minimum
> > setup to reproduce this deadlock?
> >
> > > I've had this happen to me atleast twice in the last few of weeks.
> >
> > Do you know how to reproduce it easily?
> >
>
> this time i had less than 10tabs open, so i guess it can be narrowed
> down even further.
>
> > > At first time i noticed how trying to launch chrome did lock up
> > > all the other processes in netlock, and "pkill chrome" did allow
> > > the system to recover, i was unable to figure out what was wrong
> > > and rebooting did make everything work again, while ie.
> > > removing ~/.cache & ~/.config did not.
> >
> > So the deadlock is related to your chrome usage?
> >
>
> now it does feel like so. i'll upgrade tonight.
>
> > > long before running the "ps cl" below, i had already killed all
> > > the xterm-windows those processes were in. cwm(1) was unable to
> > > kill some of those, but xkill did not.
> >
> > Well killing process waiting for the 'netlock' won't help.  What has to
> > be find is which process is holding it.  For that we need the full ps
> > output, including kernel and userland threads.
> > >
> > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > $-prompt, and ^T did show xauth stuck in netlock..
> > > i guess it's obvious where it was heading; so i got pics of
> > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > >
> > > i do have ddb.{panic,console,log}=1, but
> > > "# sysctl ddb.trigger=1" ==
> > > "sysctl: ddb.trigger: Operation not supported by device"
> >
> > Not having DDB access will limit the debugging experience.  Are you sure
> > you tried to enter it on your console?
> >
>
> so this requires ttyC0, right?
> this time it was ifconfig in [netlock], that prevented using ttyC0.
> i got there from X by running "virsh shutdown <domain" from the kvm host,
> i guess it emulates what pressing actual power button would(acpi?).
>
> > > ?? so i had no option but "virsh reset <domain>"...
> >
> > Did you try top(1)?  What were the kernel processes doing?
>
> see below, if "top -bCHS -d 1 999" should do.
> anything else i could do? anyway, thanks in advance:)

This is where the problem comes from:

> 33315   443734  -6    0  141M  102M idle      viowait   0:00  0.00% chrome:

I don't understand how chrome can end up sleeping in vio_ioctl() and why
it is sleeping forever.  But this thread is holding the NET_LOCK() and
prevents the rest of the kernel from making progress.

Could you try a virtual interface different from vio(4) and see if you
can reproduce the problem?
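
For anyone reading along: the thread above was singled out by reading the WAIT column of the top output, where several processes sit in netlock and one chrome thread sits in viowait. Here is a minimal sketch of that filtering in C, assuming the column layout of the "top -bCHS" output in this thread (WAIT as the 8th whitespace-separated field). wait_channel() and sleeps_on() are hypothetical helper names, not part of any OpenBSD tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the 8th whitespace-separated field of a top(1) line into out.
 * Assumption: the layout is PID TID PRI NICE SIZE RES STATE WAIT ...,
 * as in the output posted in this thread.
 * Returns 1 on success, 0 if the line has fewer than 8 fields.
 */
int
wait_channel(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	int field = 0;

	if (outlen == 0)
		return 0;
	while (*p != '\0') {
		size_t len;

		p += strspn(p, " \t\n");	/* skip blanks between fields */
		if (*p == '\0')
			break;
		len = strcspn(p, " \t\n");	/* length of this field */
		if (++field == 8) {
			if (len >= outlen)
				len = outlen - 1;	/* truncate to fit */
			memcpy(out, p, len);
			out[len] = '\0';
			return 1;
		}
		p += len;
	}
	return 0;
}

/* Return 1 if the thread on this line is sleeping on wchan, else 0. */
int
sleeps_on(const char *line, const char *wchan)
{
	char buf[32];

	return wait_channel(line, buf, sizeof(buf)) &&
	    strcmp(buf, wchan) == 0;
}
```

Running each line of the output through sleeps_on(line, "netlock") would list the waiters, and sleeps_on(line, "viowait") would pick out the chrome thread quoted above.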

Re: amd64: stuck in netlock

Artturi Alm
On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:

> This is where the problems comes from:
>
> > 33315   443734  -6    0  141M  102M idle      viowait   0:00  0.00% chrome:
>
> I don't understand how chrome can end up sleeping in vio_ioctl() and why
> it is sleeping forever.  But this thread is holding the NET_LOCK() and
> prevents the rest of the kernel from making progress.
>
> Could you try a virtual interface different from vio(4) and see if you
> can reproduce the problem?

Will try with 'e1000', but this does seem to me like it might have
something to do with routing too(?), as vio0 is only used for reaching
the host, and there is a separate physical interface (em0) to which
the default route belongs.


Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.0.1.2           UGS       11       65     -     8 em0
224/4              127.0.0.1          URS        0       60 32768     8 lo0
10.0.1/24          10.0.1.1           UCn        3        0     -     4 em0
10.0.1/24          10.0.1.1           US         0        0     -     8 em0
10.0.1.1           68:05:ca:23:90:88  UHLl       0       20     -     1 em0
10.0.1.2           bc:5f:f4:e6:e2:63  UHLch      4       80     -     3 em0
10.0.1.4           c8:3a:35:d8:ec:0b  UHLc       0        5     -     3 em0
10.0.1.10          link#2             UHLch      2       10     -     3 em0
10.0.1.255         10.0.1.1           UHb        0        0     -     1 em0
10.0.10/24         10.0.1.10          UGS        0        0     -     8 em0
10.0.11/24         10.0.11.1          UCn        0        0     -     4 vio0
10.0.11.1          52:54:00:d8:72:b3  UHLl       0        1     -     1 vio0
10.0.11.255        10.0.11.1          UHb        0        0     -     1 vio0
10.0.100/24        10.0.1.10          UGS        0        0     -     8 em0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       2       33 32768     1 lo0

$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
        index 4 priority 0 llprio 3
        groups: lo
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
vio0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 52:54:00:d8:72:b3
        index 1 priority 0 llprio 3
        media: Ethernet autoselect
        status: active
        inet 10.0.11.1 netmask 0xffffff00 broadcast 10.0.11.255
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 68:05:ca:23:90:88
        index 2 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 10.0.1.1 netmask 0xffffff00 broadcast 10.0.1.255
enc0: flags=0<>
        index 3 priority 0 llprio 3
        groups: enc
        status: active
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33136
        index 5 priority 0 llprio 3
        groups: pflog

Re: amd64: stuck in netlock

Martin Pieuchot
On 29/01/18(Mon) 21:25, Artturi Alm wrote:

> Will try with 'e1000', but then this does seem to me like it would have
> something to do with routing too(?), as the vio0 is only for reaching to
> the host.
> and separate physical interface, to which the default route belongs to.

Here's a diff to fix vio(4), could you give it a go?

Index: dev/pv/if_vio.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
retrieving revision 1.4
diff -u -p -r1.4 if_vio.c
--- dev/pv/if_vio.c 10 Aug 2017 18:03:51 -0000 1.4
+++ dev/pv/if_vio.c 23 Feb 2018 09:14:29 -0000
@@ -1276,7 +1276,8 @@ vio_wait_ctrl(struct vio_softc *sc)
 	int r = 0;
 
 	while (sc->sc_ctrl_inuse != FREE) {
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viowait", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viowait", 0);
 		if (r == EINTR)
 			return r;
 	}
@@ -1295,7 +1296,8 @@ vio_wait_ctrl_done(struct vio_softc *sc)
 			r = 1;
 			break;
 		}
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viodone", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viodone", 0);
 		if (r == EINTR)
 			break;
 	}
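
As background on the diff: per tsleep(9), rwsleep() releases the given rwlock while the thread sleeps and takes it again before returning, whereas tsleep() leaves any lock the caller holds held. What follows is a userland analogy only, not kernel code: a pthread mutex stands in for netlock and pthread_cond_wait() stands in for rwsleep(). The names lock, done, completer() and wait_for_completion() are made up for illustration, and the idea that the waking side needs the same lock is an assumption; the thread does not spell that out.

```c
#include <pthread.h>

/*
 * Hypothetical stand-ins: "lock" plays netlock, "done" plays the
 * control state the driver waits on. None of this is kernel API.
 */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int done;

/* Plays the side that completes the request; it needs the lock to do so. */
static void *
completer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	done = 1;
	pthread_cond_signal(&cv);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/*
 * Plays the ioctl path: take the lock, start the request, then sleep
 * until it completes. pthread_cond_wait() drops the mutex while the
 * caller sleeps and retakes it before returning, so completer() can
 * get in. Sleeping with the mutex still held would block completer()
 * forever. Returns 1 on success, 0 if the thread could not be created.
 */
int
wait_for_completion(void)
{
	pthread_t t;
	int ok;

	pthread_mutex_lock(&lock);
	done = 0;
	if (pthread_create(&t, NULL, completer, NULL) != 0) {
		pthread_mutex_unlock(&lock);
		return 0;
	}
	while (!done)
		pthread_cond_wait(&cv, &lock);
	ok = done;
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
	return ok;
}
```

The loop around the wait mirrors the while loops in vio_wait_ctrl() and vio_wait_ctrl_done(): the condition is rechecked after every wakeup, with the lock held again.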