Thinkpad X1 reproducible suspend panic

Edd Barrett
Hi,

Today I spent some time trying to debug the suspend issues on the
Thinkpad X1 5th generation. I started with the intention of tracking
down an old bug [1], but I think I've found a new one.

To trigger the bug, install a recent snapshot. I've managed to get this
panic using two kernels:
---8<---
OpenBSD 6.4-current (GENERIC.MP) #374: Sun Oct 21 00:04:11 MDT 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
--->8---

and:
---8<---
OpenBSD 6.4-current (GENERIC.MP) #436: Sun Nov 11 23:59:55 MST 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
--->8---

Install to a USB thumb drive with a single 'a' label (relevant?):
---8<---
sd1 at scsibus2 targ 1 lun 0: <, USB DISK 3.0, PMAP> SCSI4 0/direct removable serial.13fe55006606C86B7615
sd1: 15120MB, 512 bytes/sector, 30965760 sectors
--->8---

I used a USB drive so that I wouldn't trash my everyday filesystems by
constant dirty shutdowns. However, it seems essential for reproduction.
If I boot a recent snap off my SSD, this bug does not manifest (but
perhaps [1] is related).

If I run `zzz`, the LED indicator blinks, the screen goes blank, and
that's it. When this system suspends properly, the LED fades up and
down -- that does not happen here.

When I spoke to krw@ earlier in the year, he shared a debugging tactic
that involves using the LED status indicator to see where the kernel
hangs. Via this tactic I narrowed the bug to this call in acpi.c:

---8<---
#endif /* HIBERNATE */

        sensor_quiesce();
        if (config_suspend_all(DVACT_QUIESCE)) // <-- THIS LINE HANGS
                goto fail_quiesce;

        vfs_stall(curproc, 1);
#if NSOFTRAID > 0
--->8---
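
For readers unfamiliar with the LED tactic mentioned above, here is a
rough userland sketch of the idea, not the actual kernel change: set the
LED to a known state before each suspend step, and whatever state it is
left in after the hang identifies the step that never returned. Every
name here (led_checkpoint, run_steps, step_ok, step_hang) is
hypothetical, and a hang is simulated by a non-zero return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the laptop's status LED. */
enum led_state { LED_OFF, LED_ON, LED_BLINK };

static enum led_state last_led = LED_OFF;
static int last_checkpoint = -1;

/* Record which step is about to run and give it a visible LED state. */
static void
led_checkpoint(int step)
{
	last_checkpoint = step;
	last_led = (step % 2) ? LED_BLINK : LED_ON;
}

/* A step returns 0 when it completes, non-zero to simulate a hang. */
typedef int (*suspend_step)(void);

static int step_ok(void)   { return 0; }
static int step_hang(void) { return 1; }

/*
 * Run the steps in order. Returns the index of the step that "hung",
 * or -1 if every step completed. In a real kernel nothing would be
 * returned at all; the LED would simply stay in its last state.
 */
static int
run_steps(const suspend_step *steps, size_t nsteps)
{
	assert(steps != NULL);
	for (size_t i = 0; i < nsteps; i++) {
		led_checkpoint((int)i);
		if (steps[i]() != 0)
			return last_checkpoint;
	}
	return -1;
}
```

Calling run_steps() on { step_ok, step_ok, step_hang, step_ok } reports
index 2. In the kernel, the checkpoints are moved closer together on
each attempt until a single call is isolated.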

After this I tried disabling devices via `boot -c` to see if I could
find which one causes the hang. I tried (off the top of my head) nvme,
uhid, iwm, em, uvideo, pms and inteldrm.
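
For anyone repeating this step, a UKC session along the following lines
disables one driver for a single boot. This is a sketch with inteldrm as
the example, not a capture from this machine, and the bracketed line is
a note rather than real output:

---8<---
boot> boot -c
[kernel loads and stops at the User Kernel Config prompt]
UKC> disable inteldrm
UKC> quit
--->8---

Changes made this way apply only to that boot, so each candidate driver
can be tested in isolation.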

Interestingly, when I disable inteldrm, I see a panic on suspend. Sorry,
no serial line, but here are photos of the panic, ddb traces and ps:
http://theunixzoo.co.uk/random/zzz-panic.tar.gz

I think it's likely that this panic has always been happening, but
disabling inteldrm is allowing us to see ddb.

I need to spend some more time on this, but I wanted to share initial
findings. If anyone has any ideas, please shout!

Cheers

[1]: https://marc.info/?l=openbsd-bugs&m=151575724607508&w=2

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: Thinkpad X1 reproducible suspend panic

Ted Unangst
Edd Barrett wrote:
> I used a USB drive so that I wouldn't trash my everyday filesystems by
> constant dirty shutdowns. However, it seems essential for reproduction.
> If I boot a recent snap off my SSD, this bug does not manifest (but

this will never work. USB devices are ejected during suspend. there may be
something else going on, but if it can only be reproduced in a known bad
config, it's not worth pursuing.