Re: ACPI interrupt storm on ThinkPad T480s [was: intermittent sluggish behavior]

Martin Brandenburg
> From mkb Sun Feb 25 14:19:33 2018
> From: [hidden email]
> To: [hidden email]
> Subject: intermittent sluggish behavior; seems to be acpi related
>
> >Synopsis: intermittent sluggish behavior; seems to be acpi related
> >Category: amd64
> >Environment:
> System      : OpenBSD 6.2
> Details     : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
> [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
> This is a Lenovo ThinkPad T480s.
>
> Sometimes, when I boot my system, it runs great.  Other times, it runs
> very sluggishly.  I'd say it's good about half the time.  I've
> narrowed this down to ACPI on the following evidence.
>
> A good boot:
> $ uptime && ps auxwwk | grep acpi0
> 5:53PM  up 18 mins, 1 user, load averages: 0.00, 0.00, 0.00
> root     45527  0.0  0.0     0     0 ??  DK     5:35PM    0:00.22 (acpi0)
>
> A bad boot:
> $ uptime && ps auxwwk | grep acpi0
> 4:45PM  up 18 mins, 1 user, load averages: 1.03, 1.00, 0.75
> root     97711 87.0  0.0     0     0 ??  DK     4:27PM   15:43.95 (acpi0)
>
> The system runs very sluggishly on a bad boot.  Starting an xterm
> should be and is, on a good boot, instant.  On a bad boot, it takes
> 10 seconds or so.  Clearly something is wrong, but I haven't been able
> to pinpoint what exactly is wrong.
>
> Here's the acpidump output:
>
> http://www.martinbrandenburg.com/2018/acpi.tar.gz
>
> In an effort to find the problem, I enabled ACPI_DEBUG.  I couldn't
> make any sense of the output, and I'm afraid too much has scrolled off
> the top, but in case any of it is useful, here it is:
>
> http://www.martinbrandenburg.com/2018/bad.dmesg
> http://www.martinbrandenburg.com/2018/good.dmesg
>
> This seems to be related to another problem.  Sometimes when I boot
> the BIOS outputs "Configuration changed -- restart the system" and
> does so.  I admit to not recording every instance, but it seems that
> when that occurs, the system is fine.  When the system boots without
> restarting, the system is sluggish and I have the problems described.
>
> I've booted a Linux live USB quite a few times, and never had this
> kind of trouble there.  As long as I don't boot OpenBSD, I never see
> the "Configuration changed -- restart the system" message.  But I'd
> much prefer to actually use OpenBSD.
>
> I can supply more information or run tests to gather more data if
> needed.
> >How-To-Repeat:
> Boot OpenBSD on a ThinkPad T480s and possibly other newer ThinkPads
> until the problem occurs.
> >Fix:
> Unknown.
>
>

I have some more information.

I've noticed that the problem always shows up after suspending my
system.

Prior to suspend, vmstat -i shows:

irq144/acpi0                      318        0

After suspend, the system gets sluggish and acpi0's CPU time explodes as
described.  Running vmstat -i periodically over the course of about
10 minutes shows the ACPI interrupt count climbing steadily:

irq144/acpi0                   282494      152
irq144/acpi0                   385191      197
irq144/acpi0                   436550      218
irq144/acpi0                   517509      250
irq144/acpi0                   600715      280
irq144/acpi0                   737721      325

Putting a printf in acpi_gpe revealed that, except for one event at
boot, no GPE events occur until after I suspend the system, at which
point a deluge of _L6F events shows up.

Decompiling the AML revealed that _L6F has something to do with Thunderbolt.

I don't think OpenBSD supports Thunderbolt, and I don't care to use it
anyway.  I went into the BIOS setup to disable it, but instead found an
option "Enable Thunderbolt BIOS Assist Mode", which purports to be
necessary for older versions of Windows and Linux.  I enabled it.

This seems to stop the problem after suspend.

However, I still occasionally see the _L6F events when I first boot,
before attempting to suspend.  The printf output starts before /etc/rc
even starts running.

I have a USB-C to VGA adapter

uhidev2 at uhub1 port 1 configuration 1 interface 1 "Lenovo Lenovo USB-C to VGA Adapter" rev 2.01/0.00 addr 2
uhidev2: iclass 3/0, 237 report ids
uhid0 at uhidev2 reportid 237: input=0, output=0, feature=80
ugen2 at uhub1 port 1 configuration 1 "Lenovo Lenovo USB-C to VGA Adapter" rev 2.01/0.00 addr 2

which also sometimes triggers the storm, whether the BIOS option is on
or off.  However, a similar USB-C to DisplayPort adapter that I have
does not.

I am now running the following patch, which at least makes the machine
usable and lets me see when the first bad interrupt has happened.
Obviously it isn't a real fix.

I'll update with more information if I find it.

Index: acpi.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.340
diff -u -p -r1.340 acpi.c
--- acpi.c 19 Feb 2018 08:59:52 -0000 1.340
+++ acpi.c 5 Mar 2018 03:57:59 -0000
@@ -2179,6 +2179,15 @@ acpi_gpe(struct acpi_softc *sc, int gpe,
 	struct aml_node *node = arg;
 	uint8_t mask, en;
 
+	if (!sc->gpe_table[gpe].edge && gpe == 111) {
+		static unsigned short i;
+		if (i == 0) {
+			i++;
+			printf("acpi_gpe %d %s IGNORING\n", gpe, node->name);
+		}
+	} else {
+		printf("acpi_gpe %d %s\n", gpe, node->name);
+
 	dnprintf(10, "handling GPE %.2x\n", gpe);
 	aml_evalnode(sc, node, 0, NULL, NULL);
 
@@ -2187,6 +2196,7 @@ acpi_gpe(struct acpi_softc *sc, int gpe,
 		acpi_write_pmreg(sc, ACPIREG_GPE_STS, gpe>>3, mask);
 	en = acpi_read_pmreg(sc, ACPIREG_GPE_EN,  gpe>>3);
 	acpi_write_pmreg(sc, ACPIREG_GPE_EN,  gpe>>3, en | mask);
+	}
 	return (0);
 }