Multi GPU support for scene classification #1997

Closed
yondonfu opened this issue Aug 18, 2021 · 7 comments

@yondonfu
Member

Is your feature request related to a problem? Please describe.

At the moment, the node only supports scene classification on a single GPU even on a multi GPU machine, because we only set up detection on GPU 0.

The problems with only setting up detection on GPU 0 are:

  • With CPU <> GPU data copying, all decompressed frames go from the CPU to GPU 0 over the same connection, which might lead to more congestion on that connection given its limited bandwidth
  • Without CPU <> GPU data copying (we eventually want to get there), there will be no way for data to get from another GPU to GPU 0, so we'll need to be able to set up detection on those other GPUs
  • We're running inference for all streams on GPU 0 and not using the CUDA cores on the other GPUs for inference

Describe the solution you'd like

Add multi GPU support so that the node can run scene classification on any of the GPUs on a machine. Additionally, it would be preferable to have scene classification run on the same GPU that transcodes a segment, because we'll want that anyway once we remove CPU <> GPU data copying.

Describe alternatives you've considered

N/A

Additional context

I think we ran into #1980 when we tried to use multiple GPUs recently. Unclear why. We should see if that issue is a blocker for multi GPU support.

@yondonfu
Member Author

For reference:

@oscar-davids shared this comment from the TF docs, which indicates that virtual device configuration is per process and not per session.

@jailuthra
Contributor

jailuthra commented Aug 20, 2021

Ah, I think I misunderstood why the original TF issue around this was closed earlier. IIUC, this means we'd need to spawn separate UNIX processes (i.e. separate orchestrator nodes) to use different GPUs via the visible_devices list.

An alternative is mentioned in this comment, where the user can specify the GPU used while creating the model: tensorflow/tensorflow#18861 (comment)

I believe @oscar-davids did something similar in the original detection PR in our FFmpeg fork, which was subsequently removed in the redesign, as I wanted to minimize changes to the TF backend.

Maybe we can explore that solution again, if spawning different O nodes per GPU is not feasible.

@yondonfu
Member Author

if spawning different O nodes per GPU is not feasible.

I think we should first see if there is a solution that allows a single O to utilize multiple GPUs, as running separate Os for each GPU would be a pain from a UX point of view.

@oscar-davids
Contributor

oscar-davids commented Aug 31, 2021

Given that using different settings in different sessions within the same process is not safe, I think setting the device ID in the session config via protobuf will result in unexpected behavior. Instead, we can do something similar to what is suggested here, which is what I did before the 4.4 upgrade.
The C API equivalent of Python's with tf.device('/gpu:1') is TF_ImportGraphDefOptionsSetDefaultDevice("/gpu:1").
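For illustration, a minimal sketch of using that call to pin a frozen graph to a specific GPU. Only TF_ImportGraphDefOptionsSetDefaultDevice comes from the discussion above; the file-loading helper and function names are assumptions, not the actual code from the FFmpeg TF backend or the linked branch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

/* Read a frozen GraphDef (.pb) from disk into a TF_Buffer. */
static TF_Buffer *read_graph_pb(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    void *data = malloc(len);
    if (fread(data, 1, len, f) != (size_t)len) { free(data); fclose(f); return NULL; }
    fclose(f);
    TF_Buffer *buf = TF_NewBufferFromString(data, len); /* copies the bytes */
    free(data);
    return buf;
}

/* Import the detection model and place its ops on `device`, e.g. "/gpu:1". */
TF_Graph *load_graph_on_device(const char *model_path, const char *device) {
    TF_Buffer *graph_def = read_graph_pb(model_path);
    if (!graph_def) return NULL;

    TF_Graph *graph = TF_NewGraph();
    TF_Status *status = TF_NewStatus();
    TF_ImportGraphDefOptions *opts = TF_NewImportGraphDefOptions();

    /* C-side equivalent of Python's `with tf.device('/gpu:1'):` */
    TF_ImportGraphDefOptionsSetDefaultDevice(opts, device);
    TF_GraphImportGraphDef(graph, graph_def, opts, status);

    TF_DeleteImportGraphDefOptions(opts);
    TF_DeleteBuffer(graph_def);
    if (TF_GetCode(status) != TF_OK) {
        fprintf(stderr, "graph import failed: %s\n", TF_Message(status));
        TF_DeleteGraph(graph);
        graph = NULL;
    }
    TF_DeleteStatus(status);
    return graph;
}
```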

I used this to set the device ID and tested it on the VNO machine and on Coreweave GPUs. Here is the linked branch.
I tested 12 concurrent streams (6 streams on each GPU) and it worked reliably on the VNO machine. However, on Coreweave it crashed once the number of concurrent streams exceeded 3 on each GPU.

The test setups are summarized below:

  • VNO server
    GPU: GTX 1080 x 2
    CUDA version: 11.1 + 11.2
    Driver version: 11.2
    TensorFlow version: 2.5
  • Coreweave
    GPU: GTX 1070 x 6
    CUDA version: 10.1
    Driver version: 11.3
    TensorFlow version: 2.3

I believe the failure on Coreweave could be due to either:

@cyberj0g self-assigned this Sep 20, 2021
@cyberj0g
Contributor

I have implemented per-graph GPU selection in the DNN filter using the suggestions above, and it seems to work fine. Do we want to add a separate set of arguments to livepeer.go for controlling detection GPU selection?
I see at least 2 possible modes:

  • shared - run both transcoding and detection on the same GPU (a specific GPU is picked for each Transcoder instance); a sketch of this wiring follows below. Downside: the TF runtime takes ~2 GB of VRAM per GPU.
  • dedicated - run detection on a dedicated GPU (or GPUs). This limits the VRAM overhead, but such a dedicated GPU may become a bottleneck.
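
A rough sketch of the shared-mode wiring, assuming the helper from the earlier sketch; the function names here are hypothetical, not the actual filter code. Each Transcoder instance already knows its CUDA device index, so the detection graph can be imported onto the matching TF device instead of always GPU 0:

```c
#include <stdio.h>
#include <tensorflow/c/c_api.h>

/* From the earlier sketch: imports the model with a default device string. */
TF_Graph *load_graph_on_device(const char *model_path, const char *device);

/*
 * Hypothetical "shared" mode init: reuse the CUDA device index the
 * transcoder instance was assigned, so decode, inference and encode all
 * stay on the same card.
 */
TF_Graph *init_detection_for_transcoder(const char *model_path, int cuda_device_index) {
    char device[32];
    /* With all GPUs visible to the process, TF's "/gpu:N" lines up with the
     * CUDA index used for NVDEC/NVENC; this mapping is an assumption. */
    snprintf(device, sizeof(device), "/gpu:%d", cuda_device_index);
    return load_graph_on_device(model_path, device);
}
```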

@yondonfu @jailuthra

@yondonfu
Copy link
Member Author

Hm, having different modes could be interesting. But for now I suggest implementing the shared mode first as the default, without the ability to switch between modes. Since we're also researching the possibility of using different model backends with less VRAM overhead as part of livepeer/FFmpeg#15, I think we can wait to consider a dedicated mode until after completing that research - the rationale being that if we eventually end up using a model backend with much lower VRAM overhead, the value of having a dedicated mode would be diminished.

@yondonfu
Copy link
Member Author

Closed by #2038 livepeer/lpms#264 livepeer/FFmpeg#16
