Multi GPU support for scene classification #1997

Closed
yondonfu opened this issue Aug 18, 2021 · 7 comments

@yondonfu
Member

Is your feature request related to a problem? Please describe.

At the moment, the node only supports scene classification on a single GPU even on a multi GPU machine, because we only set up detection on GPU 0.

The problems with only setting up detection on GPU 0 are:

  • With CPU <> GPU data copying, all decompressed frames go from the CPU to GPU 0 over the same connection, which might lead to more congestion on that connection given its limited bandwidth
  • Without CPU <> GPU data copying (we eventually want to get there), there will be no way for data to get from another GPU to GPU 0, so we'll need to be able to set up detection on those other GPUs
  • We're running inference for all streams on GPU 0 and not using the CUDA cores on the other GPUs for inference

Describe the solution you'd like

Add multi GPU support so that the node can run scene classification on any of the GPUs on a machine. Additionally, it would be preferable to have scene classification run on the same GPU that transcodes a segment, because we'll want that anyway once we remove CPU <> GPU data copying.

Describe alternatives you've considered

N/A

Additional context

I think we ran into #1980 when we tried to use multiple GPUs recently. Unclear why. We should see if that issue is a blocker for multi GPU support.

@yondonfu
Member Author

For reference:

@oscar-davids shared this comment from the TF docs, which indicates that virtual device configuration is per process and not per session.

@jailuthra
Contributor

jailuthra commented Aug 20, 2021

Ah, I think I misunderstood why the original TF issue around this was closed earlier. IIUC, this means we'd need to spawn separate UNIX processes (i.e. separate orchestrator nodes) to use different GPUs via the visible_devices list.

An alternative is mentioned in this comment, where the user can specify the GPU used while creating the model: tensorflow/tensorflow#18861 (comment)

I believe @oscar-davids did something similar in the original detection PR in our FFmpeg fork, which was subsequently removed in the redesign, as I wanted to minimize changes to the TF backend.

Maybe we can explore that solution again, if spawning different O nodes per GPU is not feasible.

@yondonfu
Member Author

if spawning different O nodes per GPU is not feasible.

I think we should first see if there is a solution that allows a single O to utilize multiple GPUs, as running separate Os for each GPU would be a pain from a UX point of view.

@oscar-davids
Contributor

oscar-davids commented Aug 31, 2021

Given that using different settings in different sessions within the same process is not safe, I think setting the device ID in the session config via protobuf will result in unexpected behavior. Instead, we can do something similar to what is suggested here, which is what I did before the 4.4 upgrade.
The C API equivalent of Python's with tf.device('/gpu:1') is TF_ImportGraphDefOptionsSetDefaultDevice("/gpu:1").
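For illustration, a minimal sketch of using that call to pin a frozen graph to a specific GPU. Only TF_ImportGraphDefOptionsSetDefaultDevice comes from the discussion above; the file-loading helper and function names are assumptions, not the actual code from the FFmpeg TF backend or the linked branch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

/* Read a frozen GraphDef (.pb) from disk into a TF_Buffer. */
static TF_Buffer *read_graph_pb(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    void *data = malloc(len);
    if (fread(data, 1, len, f) != (size_t)len) { free(data); fclose(f); return NULL; }
    fclose(f);
    TF_Buffer *buf = TF_NewBufferFromString(data, len); /* copies the bytes */
    free(data);
    return buf;
}

/* Import the detection model and place its ops on `device`, e.g. "/gpu:1". */
TF_Graph *load_graph_on_device(const char *model_path, const char *device) {
    TF_Buffer *graph_def = read_graph_pb(model_path);
    if (!graph_def) return NULL;

    TF_Graph *graph = TF_NewGraph();
    TF_Status *status = TF_NewStatus();
    TF_ImportGraphDefOptions *opts = TF_NewImportGraphDefOptions();

    /* C-side equivalent of Python's `with tf.device('/gpu:1'):` */
    TF_ImportGraphDefOptionsSetDefaultDevice(opts, device);
    TF_GraphImportGraphDef(graph, graph_def, opts, status);

    TF_DeleteImportGraphDefOptions(opts);
    TF_DeleteBuffer(graph_def);
    if (TF_GetCode(status) != TF_OK) {
        fprintf(stderr, "graph import failed: %s\n", TF_Message(status));
        TF_DeleteGraph(graph);
        graph = NULL;
    }
    TF_DeleteStatus(status);
    return graph;
}
```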

I used this to set the device ID and tested it on the VNO machine and on Coreweave GPUs. Here is the linked branch.
I tested 12 concurrent streams (6 streams on each GPU) and it worked reliably on the VNO machine. However, on Coreweave it crashed once the number of concurrent streams exceeded 3 on each GPU.

The test setups are summarized below:

  • VNO server
    GPU: GTX 1080 x 2
    CUDA version: 11.1 + 11.2
    Driver version: 11.2
    TensorFlow version: 2.5
  • Coreweave
    GPU: GTX 1070 x 6
    CUDA version: 10.1
    Driver version: 11.3
    TensorFlow version: 2.3

I believe the failure on Coreweave could be due to either:

@cyberj0g self-assigned this Sep 20, 2021
@cyberj0g
Contributor

I have implemented per-graph GPU selection in the DNN filter using the suggestions above, and it seems to work fine. Do we want to add a separate set of arguments to livepeer.go for controlling detection GPU selection?
I see at least 2 possible modes:

  • shared - run both transcoding and detection on the same GPU (a specific GPU is picked for each Transcoder instance); a sketch of this wiring follows below. Downside: the TF runtime takes ~2 GB of VRAM per GPU.
  • dedicated - run detection on a dedicated GPU (or GPUs). This limits the VRAM overhead, but such a dedicated GPU may become a bottleneck.
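
A rough sketch of the shared-mode wiring, assuming the helper from the earlier sketch; the function names here are hypothetical, not the actual filter code. Each Transcoder instance already knows its CUDA device index, so the detection graph can be imported onto the matching TF device instead of always GPU 0:

```c
#include <stdio.h>
#include <tensorflow/c/c_api.h>

/* From the earlier sketch: imports the model with a default device string. */
TF_Graph *load_graph_on_device(const char *model_path, const char *device);

/*
 * Hypothetical "shared" mode init: reuse the CUDA device index the
 * transcoder instance was assigned, so decode, inference and encode all
 * stay on the same card.
 */
TF_Graph *init_detection_for_transcoder(const char *model_path, int cuda_device_index) {
    char device[32];
    /* With all GPUs visible to the process, TF's "/gpu:N" lines up with the
     * CUDA index used for NVDEC/NVENC; this mapping is an assumption. */
    snprintf(device, sizeof(device), "/gpu:%d", cuda_device_index);
    return load_graph_on_device(model_path, device);
}
```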

@yondonfu @jailuthra

@yondonfu
Copy link
Member Author

Hm, having different modes could be interesting. But for now I suggest implementing the shared mode first as the default, without the ability to switch between modes. Since we're also researching the possibility of using different model backends with less VRAM overhead as part of livepeer/FFmpeg#15, I think we can wait to consider a dedicated mode until after completing that research - the rationale being that if we eventually end up using a model backend with much lower VRAM overhead, the value of having a dedicated mode would be diminished.

@yondonfu
Copy link
Member Author

Closed by #2038 livepeer/lpms#264 livepeer/FFmpeg#16
