Multi GPU support for scene classification #1997
Comments
For reference: @oscar-davids shared this comment from the TF docs that indicates virtual device configuration is per process and not per session.
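For concreteness, here is a minimal sketch of what session-level GPU configuration looks like through the TF C API; the ConfigProto bytes and the process-wide caveat are my reading of the linked docs, not code from the node:

```c
/* Minimal sketch (not the node's code): per-session GPU settings go in
 * through a serialized ConfigProto, but TF initializes its GPU devices
 * once per process, so a later session in the same process cannot
 * re-map them - which is why per-session selection is unsafe.
 * The bytes below encode gpu_options { visible_device_list: "0" }. */
#include <tensorflow/c/c_api.h>

static TF_SessionOptions *make_session_options_gpu0(void) {
    /* protobuf wire format: field 6 (gpu_options), length 3;
       then field 5 (visible_device_list), length 1, "0" */
    const unsigned char config[] = {0x32, 0x03, 0x2a, 0x01, 0x30};

    TF_Status *status = TF_NewStatus();
    TF_SessionOptions *opts = TF_NewSessionOptions();
    TF_SetConfig(opts, config, sizeof(config), status);
    /* TF_GetCode(status) should be TF_OK here; the limitation only
       shows up when a later session asks for a different device list. */
    TF_DeleteStatus(status);
    return opts;
}
```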
Ah, I think I misunderstood why the original TF issue around this was closed earlier. IIUC this means we'd need to spawn separate UNIX processes (i.e. orchestrator nodes) to use different GPUs via the virtual device configuration.

An alternative is mentioned in this comment, where the user can specify the GPU used while creating the model: tensorflow/tensorflow#18861 (comment). I believe @oscar-davids did something similar in the original detection PR in our FFmpeg fork, which was subsequently removed in the re-design as I wanted to minimize changes in the TF backend. Maybe we can explore that solution again if spawning different O nodes per GPU is not feasible.
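As a sketch of that alternative, assuming the TF C API that the FFmpeg DNN backend builds on (function and variable names here are illustrative, not from our fork), the graph can be pinned to a GPU at import time:

```c
/* Sketch: pin an imported TF graph to one GPU via the C API.
 * Assumes `graph_def` / `graph_def_len` hold a serialized GraphDef. */
#include <stdio.h>
#include <tensorflow/c/c_api.h>

static TF_Graph *load_graph_on_gpu(const void *graph_def,
                                   size_t graph_def_len, int gpu_id) {
    char device[32];
    snprintf(device, sizeof(device), "/device:GPU:%d", gpu_id);

    TF_Status *status = TF_NewStatus();
    TF_Buffer *def = TF_NewBufferFromString(graph_def, graph_def_len);
    TF_Graph *graph = TF_NewGraph();

    TF_ImportGraphDefOptions *opts = TF_NewImportGraphDefOptions();
    /* Every imported op without an explicit device gets this default,
       so the whole model runs on the chosen GPU. */
    TF_ImportGraphDefOptionsSetDefaultDevice(opts, device);
    TF_GraphImportGraphDef(graph, def, opts, status);

    TF_DeleteImportGraphDefOptions(opts);
    TF_DeleteBuffer(def);

    if (TF_GetCode(status) != TF_OK) {
        fprintf(stderr, "import failed: %s\n", TF_Message(status));
        TF_DeleteGraph(graph);
        graph = NULL;
    }
    TF_DeleteStatus(status);
    return graph;
}
```

Since the device is fixed per graph rather than per session, several graphs in one process can each target a different GPU, which sidesteps the per-process config issue above.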
I think we should first see if there is a solution that allows a single O to utilize multiple GPUs, as running separate Os for each GPU would be a pain from a UX POV.
Given that using different settings in different sessions within the same process is not safe, I used this to set the deviceID and tested it on a VNO machine and a CoreWeave GPU. Here is the linked branch. The test setup is summarized below:
I believe the failure on CoreWeave could be due to either:
I have implemented per-graph GPU selection in the DNN filter using the suggestions above, and it seems to work fine. Do we want to add a separate set of arguments for controlling detection GPU selection to
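For reference, the filter-side glue can stay tiny; this is an illustrative sketch (hypothetical names, not the linked branch):

```c
/* Illustrative glue only - not the linked branch. A per-instance
 * integer option (here `deviceid`) is enough to drive per-graph
 * selection: format the TF device string and hand it to the graph
 * import step, as in the load_graph_on_gpu() sketch above. */
#include <stdio.h>

struct detect_ctx {
    int deviceid;    /* hypothetical filter option, e.g. deviceid=1 */
    char device[32]; /* formatted TF device string */
};

static void detect_pick_device(struct detect_ctx *ctx) {
    snprintf(ctx->device, sizeof(ctx->device),
             "/device:GPU:%d", ctx->deviceid);
}
```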
Hm, having different modes could be interesting. But for now, I suggest implementing the shared mode first as the default, without the ability to switch between modes. Since we're also researching the possibility of using model backends with less VRAM overhead as a part of livepeer/FFmpeg#15, I think we can wait to consider a dedicated mode until after completing that research - the rationale being that if we eventually end up using a model backend with much lower VRAM overhead, then the value of a dedicated mode could be diminished.
Closed by #2038, livepeer/lpms#264, and livepeer/FFmpeg#16.
Is your feature request related to a problem? Please describe.
At the moment, the node only supports scene classification on a single GPU, even on a multi-GPU machine. This is because we only set up detection on GPU 0:
The problems with only setting up detection on GPU 0 are:
Describe the solution you'd like
Add multi-GPU support so that the node can run scene classification on any of the GPUs on a machine. Additionally, it would be preferable to run scene classification on the same GPU that transcodes a segment, since we'll want that anyway once we remove CPU <> GPU data copying.
Describe alternatives you've considered
N/A
Additional context
I think we ran into #1980 when we tried to use multiple GPUs recently; it's unclear why. We should see if that issue is a blocker for multi-GPU support.