Do you mean torch.cuda.set_device()? If so, then yes. I also changed torch_ucc to use cudaGetDevice in ProcessGroupUCC::progress_loop instead of hard-coding device 0.
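For reference, a minimal sketch of the per-rank device pinning being discussed, assuming the launcher exposes a LOCAL_RANK environment variable and that the standard env:// rendezvous variables are set (the actual benchmark in the gist is the authoritative version):

```python
import os
import torch
import torch.distributed as dist

# Pin this worker to its own GPU before the process group is created, so that
# collectives (and the backend's progress thread) run on the intended device
# rather than defaulting to device 0.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # LOCAL_RANK is an assumed launcher-provided variable
torch.cuda.set_device(local_rank)

dist.init_process_group(backend=os.environ.get("BACKEND", "nccl"), init_method="env://")
```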
I was running some benchmarks with torch-ucc using xccl for collectives, and I noticed much worse performance than with NCCL. See numbers here: https://gist.github.com/froody/a86a5b2c5d9f46aedba7e930f4b4e08d
It's possible this is due to a misconfiguration: I built xccl with CUDA and UCX support, but without SHARP or VMC support. My question is: is it expected for xccl to properly utilize NVLink when available (in this case, on a DGX-1 doing an all-reduce across all 8 GPUs)?
I also noticed while running the benchmarks that CPU utilization was very high for all workers, which seemed to be due to high-frequency polling.
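For what it's worth, here is a quick sketch of how per-worker CPU utilization can be sampled with psutil while the benchmark runs (the "benchmark" filter below is just a placeholder for the actual script name):

```python
import psutil

# Sample CPU utilization of the benchmark workers over a one-second window.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "benchmark" in cmdline:  # placeholder filter; match the real script name here
        print(proc.info["pid"], proc.cpu_percent(interval=1.0))
```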
Also, as you can see in the output, UCC fails trying to reduce a 2 GB tensor, whereas NCCL fails trying to reduce an 8 GB tensor. This could be indicative of a leak somewhere.
Repro steps:
Run benchmark here: https://gist.github.com/froody/01ed6ce8d6ab72bd868431d793591379
Use BACKEND=ucc or BACKEND=nccl to select the backend
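For context, a minimal sketch along the lines of the linked benchmark, assuming the launcher sets RANK, WORLD_SIZE, MASTER_ADDR/MASTER_PORT, and LOCAL_RANK, and assuming the torch_ucc plugin registers the ucc backend on import (the gist is the authoritative version):

```python
import os
import time
import torch
import torch.distributed as dist

backend = os.environ.get("BACKEND", "nccl")
if backend == "ucc":
    import torch_ucc  # assumption: importing the plugin registers ProcessGroupUCC

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
dist.init_process_group(backend=backend, init_method="env://")

size_mb = 1
while size_mb <= 8192:  # grow the tensor until something breaks, as in the failures above
    t = torch.ones(size_mb * 1024 * 1024 // 4, device="cuda")  # fp32 elements
    torch.cuda.synchronize()
    start = time.time()
    dist.all_reduce(t)
    torch.cuda.synchronize()
    if dist.get_rank() == 0:
        print(f"{size_mb} MB all-reduce: {time.time() - start:.4f} s")
    size_mb *= 2

dist.destroy_process_group()
```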
Environment:
hardware: DGX-1, driver version: 418.116.00
CUDA: 10.1
PyTorch: 1.6.0
UCX: 1.9.0
torch-ucc: a277d7da24ae6e8a40bda658d0f0d4e06fcadb8b
xccl: 2e97986