AMDGPU

Charlie Burnett
Hi,

I wasn't sure who to report this to, but with Vega 20 hardware under
-current, the amdgpu firmware fails to load. Manually installing the latest
amdgpu firmware from kernel.org seems to fix this.

There is also an issue I have been unable to figure out for a while:
running a CPU-intensive task freezes the entire system. Disabling all power
management options and setting amdgpu_vm_update_mode to 3 makes the freeze
less frequent, and with those modifications, using an HDMI connection
instead of DisplayPort seems to eliminate it. Setting amdgpu_vm_update_mode
to 3 without the other changes causes its own problem: when X is launched,
only a small square of seemingly random pixels is displayed.

With a vanilla kernel, only "Waiting for fences timed out!" appears.
However, turning on amdgpu_debug_vm in amdgpu_drv.c produces many DRM
errors for "gmc_v9_0_process_interrupt", sometimes in the tens of
thousands. Any hang requires a hard reboot. With amdgpu_vm_update_mode set
to 3, the crash occurs differently: whichever windows are using a lot of
GPU/CPU time turn lime green. They are completely functional at first, but
if I keep putting heavy load on both, the screen becomes pixelated on any
changed pixels for those windows. I have a large number of logs for these,
but after a couple of weeks of trying to fix it myself, they didn't offer
much beyond what is stated in this email.

Best regards,
Charlie
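
For readers unfamiliar with the amdgpu_vm_update_mode value mentioned above, the following is a minimal sketch of how it decodes. It assumes the bit flag names and meanings from the Linux amdgpu driver (bit 0 for graphics, bit 1 for compute) carry over unchanged to the OpenBSD port; the helper name describe_vm_update_mode is hypothetical and exists only for illustration.

```python
# Bit flags as named in the Linux amdgpu driver (amdgpu_vm.h).
# Assumption: the OpenBSD port uses the same values.
AMDGPU_VM_USE_CPU_FOR_GFX = 1 << 0
AMDGPU_VM_USE_CPU_FOR_COMPUTE = 1 << 1


def describe_vm_update_mode(mode: int) -> str:
    """Describe which VM page table updates are performed by the CPU."""
    if mode < 0:
        # A negative value leaves the choice to the driver.
        return "driver default"
    users = []
    if mode & AMDGPU_VM_USE_CPU_FOR_GFX:
        users.append("graphics")
    if mode & AMDGPU_VM_USE_CPU_FOR_COMPUTE:
        users.append("compute")
    if not users:
        return "CPU updates never used"
    return "CPU updates for " + " and ".join(users)


print(describe_vm_update_mode(3))
```

Under these assumptions, a value of 3 sets both bits, so page table updates for both graphics and compute work are done by the CPU instead of the GPU's SDMA engine.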

Re: AMDGPU

Jonathan Gray
On Mon, Jun 29, 2020 at 11:13:49PM -0500, Charlie Burnett wrote:
> Hi,
>
> Wasn’t sure who to tell this to, but with Vega 20 hardware under -current,
> there is an issue with the firmware, where it cannot load. Manually
> installing the latest amdgpu firmware from kernel.org fixes this seemingly.

can you show the output when the 20200421 firmware failed to load?
you are referring to the following in linux-firmware 20200619 and later?

commit f73f82cd4b7506a22a9aa1aa19e009fac3092eef
Author: Alex Deucher <[hidden email]>
Date:   Mon Jun 15 17:33:26 2020 -0400

    amdgpu: add vega20 TA firmware from 20.20 release
   
    Based on internal commit:
    c6aa2bdaa30af815fc257f2b0e50f6c66d74045c
   
    Signed-off-by: Alex Deucher <[hidden email]>
    Signed-off-by: Josh Boyer <[hidden email]>

 amdgpu/vega20_ta.bin | Bin 0 -> 54016 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

commit 9ecaba882d78501d2ab2f6bd9407409128b351ed
Author: Alex Deucher <[hidden email]>
Date:   Mon Jun 15 17:30:20 2020 -0400

    amdgpu: update vega20 firmware from 20.20 release
   
    Based on internal commit:
    c6aa2bdaa30af815fc257f2b0e50f6c66d74045c
   
    Signed-off-by: Alex Deucher <[hidden email]>
    Signed-off-by: Josh Boyer <[hidden email]>

 amdgpu/vega20_asd.bin   | Bin 147968 -> 160256 bytes
 amdgpu/vega20_ce.bin    | Bin 9344 -> 9344 bytes
 amdgpu/vega20_me.bin    | Bin 17536 -> 17536 bytes
 amdgpu/vega20_mec.bin   | Bin 268048 -> 268048 bytes
 amdgpu/vega20_mec2.bin  | Bin 268048 -> 268048 bytes
 amdgpu/vega20_pfp.bin   | Bin 21632 -> 21632 bytes
 amdgpu/vega20_sdma.bin  | Bin 17408 -> 17408 bytes
 amdgpu/vega20_sdma1.bin | Bin 17408 -> 17408 bytes
 amdgpu/vega20_smc.bin   | Bin 262912 -> 262912 bytes
 amdgpu/vega20_sos.bin   | Bin 170896 -> 174992 bytes
 10 files changed, 0 insertions(+), 0 deletions(-)
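
The diffstat above can be summarized with a short script. This is a sketch only: the function name size_changes and the regular expression are illustrative and not part of any tool mentioned in the thread. It parses the "Bin old -> new bytes" lines and reports the size delta per file.

```python
import re

# Diffstat lines copied from the two linux-firmware commits quoted above.
DIFFSTAT = """\
 amdgpu/vega20_ta.bin | Bin 0 -> 54016 bytes
 amdgpu/vega20_asd.bin   | Bin 147968 -> 160256 bytes
 amdgpu/vega20_ce.bin    | Bin 9344 -> 9344 bytes
 amdgpu/vega20_me.bin    | Bin 17536 -> 17536 bytes
 amdgpu/vega20_mec.bin   | Bin 268048 -> 268048 bytes
 amdgpu/vega20_mec2.bin  | Bin 268048 -> 268048 bytes
 amdgpu/vega20_pfp.bin   | Bin 21632 -> 21632 bytes
 amdgpu/vega20_sdma.bin  | Bin 17408 -> 17408 bytes
 amdgpu/vega20_sdma1.bin | Bin 17408 -> 17408 bytes
 amdgpu/vega20_smc.bin   | Bin 262912 -> 262912 bytes
 amdgpu/vega20_sos.bin   | Bin 170896 -> 174992 bytes
"""

BIN_LINE = re.compile(r"^\s*(\S+)\s*\|\s*Bin (\d+) -> (\d+) bytes\s*$")


def size_changes(text: str) -> dict:
    """Map each file in a binary diffstat to its size delta in bytes."""
    deltas = {}
    for line in text.splitlines():
        match = BIN_LINE.match(line)
        if match:
            name, old, new = match.group(1), int(match.group(2)), int(match.group(3))
            deltas[name] = new - old
    return deltas


for name, delta in size_changes(DIFFSTAT).items():
    if delta != 0:
        print(f"{name}: {delta:+d} bytes")
```

Only vega20_ta.bin (new), vega20_asd.bin, and vega20_sos.bin change size. The other files are still listed by git as modified, so an unchanged size does not mean unchanged contents.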

> There's also an issue that I've been unable to figure out for a while here
> as well, in that undergoing a CPU intensive task will freeze up the entire
> system. Disabling all power management options and setting the
> amdgpu_vm_update_mode to 3 lessens the occurrence of this, and using an
> HDMI connection instead of a DisplayPort with said modifications seemingly
> eliminates it. Just switching amdgpu_vm_update_mode to 3 without anything
> else leads to issues, in which when launching X in which only a small
> square of seemingly random pixels are displayed. Using a vanilla kernel,
> only "Waiting for fences timed out!" appears. However, turning on
> amdgpu_debug_vm in amdgpu_drv.c will output quite a few DRM errors for
> "gmc_v9_0_process_interrupt", sometimes in the tens of thousands. Any hang
> ups require a hard reboot. With amdgpu_vm_update_mode set to 3, the crash
> occurs differently in that whichever windows are using a bunch of GPU/CPU
> time turn a lime green color. They're completely functional at first,
> however if I keep putting heavy loads on both the screen becomes pixelated
> on any changed pixels for those windows. I have a huge amount of logs for
> these, however from a couple weeks of trying to fix it myself they didn't
> offer much beyond what was stated in this email.

this is similar to what is seen on vega10 and other parts


Re: AMDGPU

Charlie Burnett
For sure, whatever helps!

Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sdma_v4_0: Failed to load firmware "amdgpu/vega20_sdma.bin"
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load sdma firmware!
Jun 27 18:58:21 tabr /bsd: drm:pid0:psp_v11_0_init_microcode *ERROR* psp v11.0: Failed to load firmware "amdgpu/vega20_sos.bin"
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load psp firmware!
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sw_init of IP block <psp> failed -2
Jun 27 18:58:21 tabr /bsd: drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
Jun 27 18:58:21 tabr /bsd: drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init

That output is with the old firmware, and yes, the commits you listed are
the newer firmware I mean. I had to use newer firmware on your newdrm
branch as well. Let me know how I can help! :)
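
A log like the one above can be reduced to the firmware files that failed to load and the error code. The sketch below makes two assumptions: that the "-2" in the sw_init line is a negated errno value, as is conventional in the Linux DRM code, and that errno 2 is ENOENT on the system in question. The helper names missing_firmware and errno_names are hypothetical.

```python
import errno
import re

# Log excerpt from the message above, including the mail client's line wraps.
LOG = '''\
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sdma_v4_0: Failed to load firmware
"amdgpu/vega20_sdma.bin"
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load sdma firmware!
Jun 27 18:58:21 tabr /bsd: drm:pid0:psp_v11_0_init_microcode *ERROR* psp
v11.0: Failed to load firmware "amdgpu/vega20_sos.bin"
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load psp firmware!
Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sw_init of IP block <psp> failed -2
Jun 27 18:58:21 tabr /bsd: drm:pid0:amdgpu_device_init *ERROR*
amdgpu_device_ip_init failed
'''

# \s+ also matches a newline, so wrapped lines are handled.
FIRMWARE_RE = re.compile(r'Failed to load firmware\s+"([^"]+)"')
ERROR_CODE_RE = re.compile(r"failed (-\d+)")


def missing_firmware(log: str) -> list:
    """Return the firmware paths named in 'Failed to load firmware' lines."""
    return FIRMWARE_RE.findall(log)


def errno_names(log: str) -> list:
    """Translate 'failed -N' codes to errno names, assuming negated errno."""
    return [errno.errorcode.get(-int(code), "unknown")
            for code in ERROR_CODE_RE.findall(log)]


print(missing_firmware(LOG))
print(errno_names(LOG))
```

If the assumptions hold, the code maps to ENOENT ("no such file or directory"), which is consistent with the requested firmware files not being found rather than being rejected.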



Re: AMDGPU

Jonathan Gray
On Mon, Jun 29, 2020 at 11:59:28PM -0500, Charlie Burnett wrote:

> For sure, whatever helps!
> Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sdma_v4_0: Failed to load firmware
> "amdgpu/vega20_sdma.bin"
> Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load sdma firmware!
> Jun 27 18:58:21 tabr /bsd: drm:pid0:psp_v11_0_init_microcode *ERROR* psp
> v11.0: Failed to load firmware "amdgpu/vega20_sos.bin"
> Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* Failed to load psp firmware!
> Jun 27 18:58:21 tabr /bsd: [drm] *ERROR* sw_init of IP block <psp> failed -2
> Jun 27 18:58:21 tabr /bsd: drm:pid0:amdgpu_device_init *ERROR*
> amdgpu_device_ip_init failed
> Jun 27 18:58:21 tabr /bsd: drm:pid0:amdgpu_attachhook *ERROR* Fatal error
> during GPU init
> That's with the old firmware, and yeah that's with the newest firmware. I
> had to use newer firmware on your newdrm branch as well. Let me know how I
> can help! :)

Thanks, I've updated the port to 20200619 after checking vega10 and
picasso.  It will later be available via fw_update(1).