Fix rocminfo when run within docker environments. #49

Open
wants to merge 1 commit into
base: master
iotamudelta
Contributor

@iotamudelta iotamudelta commented Nov 8, 2021

Currently, rocminfo fails when executed inside a docker container
because lsmod cannot be run inside docker. This also affects rocprofiler use.

Fix this behavior by querying the initstate of the amdgpu module from
/sys/module/amdgpu instead. If initstate is marked "live", everything is
fine. Otherwise, error out with either "not loaded" (the initstate file does
not exist) or "not live" (the initstate file does not contain the string "live").
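
A minimal sketch of the check described above, for illustration only. It is not the code from this patch: the `ModuleState` enum and the `CheckInitState` function are hypothetical names, and the path is a parameter only so the logic can be exercised against a scratch file.

```cpp
#include <fstream>
#include <string>

// Possible outcomes of the module-state check described above.
enum class ModuleState { Live, NotLoaded, NotLive };

// Reads an initstate file (by default the amdgpu one) and classifies it.
// The function name and enum are illustrative, not taken from rocminfo.
ModuleState CheckInitState(
    const std::string& path = "/sys/module/amdgpu/initstate") {
  std::ifstream file(path);
  if (!file.is_open()) {
    return ModuleState::NotLoaded;  // file missing or unreadable
  }
  std::string line;
  std::getline(file, line);
  // "live" must appear in the first line of the file.
  return line.find("live") != std::string::npos ? ModuleState::Live
                                                : ModuleState::NotLive;
}
```

A caller would map `NotLoaded` and `NotLive` to the two error messages and continue only on `Live`.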

@Maxzor

Maxzor commented Jan 1, 2022

Can someone review this, please? LGTM, and the current check is pretty dumb.
It is cumbersome; I had to do all kinds of wizardry to bypass it.
You could also add checks:

  • that /dev/dri and /dev/kfd are mounted
  • that the current user belongs to the render and video groups, to prevent "false positives",
    meaning rocminfo succeeds while programs higher up in the stack won't work.
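
The two suggested checks could look roughly like the sketch below. It uses only POSIX calls (`stat`, `getgrnam`, `getegid`, `getgroups`). The helper names `PathExists`, `UserInGroup`, and `ContainerLooksUsable` are hypothetical and not part of rocminfo. It also assumes that "mounted" can be approximated by the path existing inside the container.

```cpp
#include <grp.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>
#include <vector>

// True if `path` exists (device node, directory, or anything else).
bool PathExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0;
}

// True if the calling process has `group_name` as its effective or
// supplementary group. Returns false if the group does not exist.
bool UserInGroup(const std::string& group_name) {
  const struct group* gr = getgrnam(group_name.c_str());
  if (gr == nullptr) return false;
  const gid_t target = gr->gr_gid;
  if (getegid() == target) return true;

  const int count = getgroups(0, nullptr);
  if (count <= 0) return false;
  std::vector<gid_t> groups(static_cast<size_t>(count));
  const int filled = getgroups(count, groups.data());
  for (int i = 0; i < filled; ++i) {
    if (groups[i] == target) return true;
  }
  return false;
}

// Hypothetical combined check for the two suggestions above.
bool ContainerLooksUsable() {
  return PathExists("/dev/kfd") && PathExists("/dev/dri") &&
         UserInGroup("render") && UserInGroup("video");
}
```

A group check like this would report false for root, which can usually open the device nodes anyway, so a real implementation might prefer to test access to the nodes directly.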

Contributor

@dayatsin-amd dayatsin-amd left a comment

Looks good to me. I will submit the patch from our internal repo.

@susman

susman commented Oct 30, 2023

Querying /sys/module/amdgpu/initstate isn't a good solution either. It doesn't work on systems where the module is built into the kernel.
One suggestion would be to look for 0x1002 in /sys/class/drm/card*/device/vendor.

4 participants