Kernel crash in IWM driver after resume from sleep on -current

Kernel crash in IWM driver after resume from sleep on -current

Xavier Guerin-2
>Synopsis: Kernel crash in IWM driver after resume from sleep
>Category: kernel
>Environment:
        System      : OpenBSD 6.4
        Details     : OpenBSD 6.4-beta (GENERIC.MP) #292: Mon Sep 10 18:26:22 MDT 2018
                         [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
        IWM is connected to an 802.11g network, then the machine is put to sleep.
        After several hours, when the machine resumes, it ends up in DDB after
        maybe 5 minutes, crashing in the IWM driver.
>How-To-Repeat:
        1. Connect the IWM device to a network.
        2. Put the machine to sleep.
        3. Wait for some hours.
        4. Resume.
>Fix:
        Disconnect from WiFi before sleeping.

SENDBUG: Run sendbug as root if this is an ACPI report!
SENDBUG: dmesg and usbdevs are attached.
SENDBUG: Feel free to delete or use the -D flag if they contain sensitive information.

dmesg:
OpenBSD 6.4-beta (GENERIC.MP) #292: Mon Sep 10 18:26:22 MDT 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8469381120 (8077MB)
avail mem = 8203436032 (7823MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xacbfd000 (66 entries)
bios0: vendor LENOVO version "N14ET35W (1.13 )" date 04/07/2016
bios0: LENOVO 20BS003EUS
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM BATB FPDT UEFI BGRT DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.41 MHz, 06-3d-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz, 06-3d-04
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz, 06-3d-04
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz, 06-3d-04
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 40 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 3 (EXP1)
acpiprt3 at acpi0: bus 4 (EXP2)
acpiprt4 at acpi0: bus -1 (EXP3)
acpiprt5 at acpi0: bus -1 (EXP6)
acpicpu0 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1)
acpicpu1 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1)
acpicpu2 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1)
acpicpu3 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1)
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1
acpipwrres1 at acpi0: NVP3, resource for PEG_
acpipwrres2 at acpi0: NVP2, resource for PEG_
acpitz0 at acpi0: critical temperature is 128 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpicmos0 at acpi0
acpibat0 at acpi0: BAT0 model "00HW003" serial   332 type LiP oem "SMP"
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT340F" at acpi0 not configured
acpivideo0 at acpi0: VID_
acpivout at acpivideo0 not configured
acpivideo1 at acpi0: VID_
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 5G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 5500" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 2560x1440, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 5G HD Audio" rev 0x09: msi
xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x03: msi, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 9 Series MEI" rev 0x03 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel I218-LM" rev 0x03: msi, address 54:ee:75:5d:70:e3
azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x03: msi
azalia1: codecs: Realtek ALC292
audio0 at azalia1
ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xe3: msi
pci1 at ppb0 bus 3
ppb1 at pci0 dev 28 function 1 "Intel 9 Series PCIE" rev 0xe3: msi
pci2 at ppb1 bus 4
iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x59, msi
pcib0 at pci0 dev 31 function 0 "Intel 9 Series LPC" rev 0x03
ahci0 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x03: msi, AHCI 1.3
ahci0: port 3: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 3 lun 0: <ATA, TOSHIBA THNSFJ25, JULA> SCSI3 0/direct fixed naa.500080d9103d7f7b
sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x03: apic 2 int 18
iic0 at ichiic0
pchtemp0 at pci0 dev 31 function 6 "Intel 9 Series Thermal" rev 0x03
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.1, 0x1e2b1 0x943300
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
efifb at mainbus0 not configured
uhidev0 at uhub0 port 1 configuration 1 interface 0 "Yubico Yubikey 4 OTP+U2F+CCID" rev 2.00/4.28 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub0 port 1 configuration 1 interface 1 "Yubico Yubikey 4 OTP+U2F+CCID" rev 2.00/4.28 addr 2
uhidev1: iclass 3/0
uhid0 at uhidev1: input=64, output=64, feature=0
ugen0 at uhub0 port 1 configuration 1 "Yubico Yubikey 4 OTP+U2F+CCID" rev 2.00/4.28 addr 2
uhidev2 at uhub0 port 5 configuration 1 interface 0 "ELAN Touchscreen" rev 2.00/0.13 addr 3
uhidev2: iclass 3/0, 68 report ids
ums0 at uhidev2 reportid 1: 1 button, tip
wsmouse2 at ums0 mux 0
uhid1 at uhidev2 reportid 2: input=64, output=0, feature=0
uhid2 at uhidev2 reportid 3: input=0, output=31, feature=0
uhid3 at uhidev2 reportid 4: input=19, output=0, feature=0
uhid4 at uhidev2 reportid 10: input=0, output=0, feature=1
ums1 at uhidev2 reportid 68
ums1: mouse has no X report
ugen1 at uhub0 port 6 "Validity Sensors VFS5011 Fingerprint Reader" rev 1.10/0.78 addr 4
ugen2 at uhub0 port 7 "Intel Bluetooth" rev 2.01/0.01 addr 5
uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 6
video0 at uvideo0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (e738e89d40ccfe0d.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
iwm0: hw rev 0x210, fw ver 16.242414.0, address 5c:e0:c5:50:cb:cf

usbdevs:
Controller /dev/usb0:
addr 01: 8086:0000 Intel, xHCI root hub
         super speed, self powered, config 1, rev 1.00
         driver: uhub0
addr 02: 1050:0407 Yubico, Yubikey 4 OTP+U2F+CCID
         full speed, power 30 mA, config 1, rev 4.28
         driver: uhidev0
         driver: uhidev1
         driver: ugen0
addr 03: 04f3:012d ELAN, Touchscreen
         full speed, self powered, config 1, rev 0.13
         driver: uhidev2
addr 04: 138a:0017 Validity Sensors, VFS5011 Fingerprint Reader
         full speed, power 100 mA, config 1, rev 0.78, iSerialNumber 98ac8b524595
         driver: ugen1
addr 05: 8087:0a2a Intel, Bluetooth
         full speed, self powered, config 1, rev 0.01
         driver: ugen2
addr 06: 04f2:b45d Chicony Electronics Co.,Ltd., Integrated Camera
         high speed, power 500 mA, config 1, rev 0.29, iSerialNumber 0x0001
         driver: uvideo0

Attachment: IMG_0491.jpeg (685K)
Re: Kernel crash in IWM driver after resume from sleep on -current

Stefan Sperling-5
On Thu, Sep 13, 2018 at 08:44:05AM -0400, Xavier Guerin wrote:

> >Synopsis: Kernel crash in IWM driver after resume from sleep
> >Category: kernel
> >Environment:
> System      : OpenBSD 6.4
> Details     : OpenBSD 6.4-beta (GENERIC.MP) #292: Mon Sep 10 18:26:22 MDT 2018
> [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
> IWM is connected to a 802.11G network then the machine is put to sleep.
> After several hours, when the machine resumes it ends up in DDB after maybe
> 5 minutes crashing in the IWM driver.
> >How-To-Repeat:
> 1. Connect the IWM device to a network.
> 2. Put the machine to sleep.
> 3. Wait for some hours.
> 4. Resume.
> >Fix:
> Disconnect from WiFi before sleeping.

If 5 minutes have passed since resume it seems unlikely that a
suspend/resume cycle would have anything to do with this crash.
Note that suspend/resume completely powers down the device and
will cause a new association to be created with an access point.
The post-resume state is as good as new, just like after a reboot.

The DDB trace looks more like the driver was reading or writing
beyond buffers when the firmware received some particular frame.

Can you try this diff? It makes the input length checks in
this driver a bit more precise and careful.

Index: if_iwm.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
retrieving revision 1.231
diff -u -p -r1.231 if_iwm.c
--- if_iwm.c 13 Aug 2018 15:05:31 -0000 1.231
+++ if_iwm.c 13 Sep 2018 16:45:25 -0000
@@ -7111,7 +7111,7 @@ iwm_rx_mpdu(struct iwm_softc *sc, struct
  struct ifnet *ifp = IC2IFP(ic);
  struct iwm_rx_packet *pkt;
  struct iwm_rx_mpdu_res_start *rx_res;
- uint32_t len;
+ uint16_t len;
  uint32_t rx_pkt_status;
  int rxfail;
 
@@ -7124,14 +7124,16 @@ iwm_rx_mpdu(struct iwm_softc *sc, struct
  m_freem(m);
  return;
  }
- if (len > maxlen) {
+ if (len + sizeof(*rx_res) + sizeof(rx_pkt_status) > maxlen ||
+    len > IEEE80211_MAX_LEN) {
  IC2IFP(ic)->if_ierrors++;
  m_freem(m);
  return;
  }
 
- rx_pkt_status = le32toh(*(uint32_t *)(pkt->data +
-    sizeof(*rx_res) + len));
+ memcpy(&rx_pkt_status, pkt->data + sizeof(*rx_res) + len,
+    sizeof(rx_pkt_status));
+ rx_pkt_status = le32toh(rx_pkt_status);
  rxfail = ((rx_pkt_status & IWM_RX_MPDU_RES_STATUS_CRC_OK) == 0 ||
     (rx_pkt_status & IWM_RX_MPDU_RES_STATUS_OVERRUN_OK) == 0);
  if (rxfail) {
@@ -7156,7 +7158,7 @@ iwm_rx_pkt(struct iwm_softc *sc, struct
  struct iwm_rx_packet *pkt;
  uint32_t offset = 0, nmpdu = 0, len;
  struct mbuf *m0;
- const uint32_t minsz = sizeof(uint32_t) + sizeof(struct iwm_cmd_header);
+ const size_t minsz = sizeof(pkt->len_n_flags) + sizeof(pkt->hdr);
  int qid, idx, code;
 
  bus_dmamap_sync(sc->sc_dmat, data->map, 0, IWM_RBUF_SIZE,
@@ -7175,11 +7177,10 @@ iwm_rx_pkt(struct iwm_softc *sc, struct
  break;
  }
 
- len = le32toh(pkt->len_n_flags) & IWM_FH_RSCSR_FRAME_SIZE_MSK;
- if (len < sizeof(struct iwm_cmd_header) ||
-    len > (IWM_RBUF_SIZE - offset))
+ len = iwm_rx_packet_len(pkt);
+ if (len < sizeof(pkt->hdr) ||
+    len > (IWM_RBUF_SIZE - offset - minsz))
  break;
- len += sizeof(uint32_t); /* account for status word */
 
  if (code == IWM_REPLY_RX_MPDU_CMD && ++nmpdu == 1) {
  /* Take mbuf m0 off the RX ring. */
@@ -7207,7 +7208,7 @@ iwm_rx_pkt(struct iwm_softc *sc, struct
  }
  m_adj(m, offset);
 
- iwm_rx_mpdu(sc, m, IWM_RBUF_SIZE - offset);
+ iwm_rx_mpdu(sc, m, IWM_RBUF_SIZE - offset - minsz);
  break;
  }
 
@@ -7452,6 +7453,7 @@ iwm_rx_pkt(struct iwm_softc *sc, struct
  iwm_cmd_done(sc, pkt);
  }
 
+ len += sizeof(pkt->len_n_flags);
  offset += roundup(len, IWM_FH_RSCSR_FRAME_ALIGN);
  }
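
For readers following along, here is a minimal sketch of the kind of length checking this diff is about. It is not the iwm(4) code: the packet layout (a little-endian length word, a payload, then a little-endian status word), the constants, and every function name are hypothetical and chosen only for illustration. It shows two things: comparing a device-supplied length against the space that is actually left in the buffer, and reading the trailing status word byte by byte (a portable stand-in for the memcpy plus le32toh used in the diff) so that an unaligned address is never dereferenced as a uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RBUF_SIZE   4096u  /* hypothetical receive buffer size */
#define FRAME_ALIGN 64u    /* hypothetical packet alignment in the buffer */
#define MAX_PAYLOAD 2500u  /* hypothetical upper bound on a payload */

/* Read a little-endian 32-bit word from a possibly unaligned address. */
static uint32_t
read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit word in little-endian byte order (used by the example). */
static void
write_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Count the well-formed packets in buf. Parsing stops at the first packet
 * whose declared length is zero, too large, or would run past the buffer.
 * The status word of the last accepted packet is stored in *last_status.
 */
static int
count_packets(const uint8_t *buf, size_t bufsize, uint32_t *last_status)
{
	/* Bytes every packet needs besides its payload: length + status. */
	const size_t overhead = 2 * sizeof(uint32_t);
	size_t offset = 0;
	int n = 0;

	assert(buf != NULL && last_status != NULL);

	while (offset + overhead <= bufsize) {
		uint32_t len = read_le32(buf + offset);
		size_t total;

		if (len == 0 || len > MAX_PAYLOAD)
			break;
		/* The loop condition guarantees this subtraction cannot wrap. */
		if (len > bufsize - offset - overhead)
			break;

		/* The status word follows the payload and may be unaligned. */
		*last_status = read_le32(buf + offset + sizeof(uint32_t) + len);
		n++;

		/* Skip the whole packet, rounded up to the alignment. */
		total = overhead + len;
		offset += (total + FRAME_ALIGN - 1) / FRAME_ALIGN * FRAME_ALIGN;
	}
	return n;
}

/* Build a buffer with two valid packets followed by a bogus length. */
static int
example_parse(uint32_t *last_status)
{
	uint8_t buf[RBUF_SIZE] = { 0 };

	write_le32(buf, 10);                 /* packet 1: 10-byte payload */
	write_le32(buf + 4 + 10, 3);         /* packet 1 status */
	write_le32(buf + 64, 100);           /* packet 2 at the next boundary */
	write_le32(buf + 64 + 4 + 100, 1);   /* packet 2 status */
	write_le32(buf + 192, 0xffff);       /* packet 3: length too large */

	return count_packets(buf, sizeof(buf), last_status);
}
```

With this buffer, example_parse() accepts the first two packets and stops at the third, whose declared length exceeds MAX_PAYLOAD. Subtracting the fixed overhead from the remaining space before comparing, instead of adding to the untrusted length, keeps the check free of integer overflow.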
 


Re: Kernel crash in IWM driver after resume from sleep on -current

Xavier Guerin-2
On Thu, 2018-09-13 at 19:52 +0200, Stefan Sperling wrote:

> Can you try this diff? It makes the input length checks in
> this driver a bit more precise and careful.

Sure thing, thanks. I'm building a patched kernel as we speak and will
reboot into it later today. I'll let you know if the problem occurs
again.

Re: Kernel crash in IWM driver after resume from sleep on -current

Xavier Guerin-2
In reply to this post by Stefan Sperling-5
On Thu, 2018-09-13 at 19:52 +0200, Stefan Sperling wrote:

> Can you try this diff? It makes the input length checks in
> this driver a bit more precise and careful.

I've been running the patched kernel since yesterday and the issue has not
shown up yet, even after a whole bunch of sleep/resume cycles of various
durations.

I'll keep you updated again after the weekend.

Re: Kernel crash in IWM driver after resume from sleep on -current

Xavier Guerin-2
> On Sep 14, 2018, at 11:16 AM, Xavier Guerin <[hidden email]> wrote:
> I've been running the patched kernel since yesterday and the issue has not
> shown up yet, even after a whole bunch of sleep/resume cycles of various
> durations.
>
> I'll keep you updated again after the weekend.

Everything good so far. The problem has not re-occurred.
Re: Kernel crash in IWM driver after resume from sleep on -current

Stefan Sperling-5
On Mon, Sep 17, 2018 at 10:23:14AM -0400, "Xavier R. Guérin" wrote:
> > On Sep 14, 2018, at 11:16 AM, Xavier Guerin <[hidden email]> wrote:
> > I've been running the patched kernel since yesterday and the issue has not
> > shown up yet, even after a whole bunch of sleep/resume cycles of various
> > durations.
> >
> > I'll keep you updated again after the weekend.
>
> Everything good so far. The problem has not re-occurred.

Thanks for reporting back. I committed the fix yesterday.

Re: Kernel crash in IWM driver after resume from sleep on -current

Gregor Best-2
In reply to this post by Stefan Sperling-5

Hi,

this patch seems to have caused increased error rates (IERRS in systat)
for my

  iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 8260" rev 0x3a, msi
  iwm0: hw rev 0x200, fw ver 16.242414.0, address e4:a4:71:e1:45:79

with a snapshot from today:

  OpenBSD 6.4-beta (GENERIC.MP) #312: Fri Sep 21 14:12:02 MDT 2018

I've reverted the patch and error rates are down again. Before,
`pkg_add -u` would barely progress, now it's smooth and fast as always.

> [...]
> Can you try this diff? It makes the input length checks in
> this driver a bit more precise and careful.

--
        Gregor

Re: Kernel crash in IWM driver after resume from sleep on -current

Stefan Sperling-5
On Sat, Sep 22, 2018 at 04:25:03PM +0200, Gregor Best wrote:

>
> Hi,
>
> this patch seems to have caused increased error rates (IERRS in systat)
> for my
>
>   iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 8260" rev 0x3a, msi
>   iwm0: hw rev 0x200, fw ver 16.242414.0, address e4:a4:71:e1:45:79
>
> with a snapshot from today:
>
>   OpenBSD 6.4-beta (GENERIC.MP) #312: Fri Sep 21 14:12:02 MDT 2018
>
> I've reverted the patch and error rates are down again. Before,
> `pkg_add -u` would barely progress, now it's smooth and fast as always.

Thanks for letting us know. The patch has already been reverted in -current.