Issues: microsoft/DeepSpeed
[BUG] Problem using DeepSpeed to start training (#6715). Labels: bug, training. Opened Nov 5, 2024 by sanxiaojijiaben.
[BUG] Issue with ZeRO Optimization for Llama-2-7b Fine-Tuning on Intel GPUs (#6713). Labels: bug, training. Opened Nov 5, 2024 by molang66.
"__nv_bfloat162" has already been defined (#6709). Labels: install, windows. Opened Nov 4, 2024 by wolfljj.
[REQUEST] Some questions about DeepSpeed sequence parallelism (#6708). Labels: enhancement. Opened Nov 4, 2024 by yingtongxiong.
[BUG] NCCL Timeout When Pre-training "ds_train_bert_nvidia_data_bsz32k_seq512" (#6705). Labels: bug, training. Opened Nov 3, 2024 by always-H.
Installation Error on Windows 11 through Command Prompt (#6702). Labels: windows. Opened Nov 2, 2024 by neonvarun.
[REQUEST] Non-element-wise Optimizer Compatibility (#6701). Labels: enhancement. Opened Nov 2, 2024 by Triang-jyed-driung.
How could I convert ZeRO-0 DeepSpeed weights into an fp32 model checkpoint? (#6699). Labels: enhancement. Opened Nov 1, 2024 by liming-ai.
Installing DeepSpeed in WSL (#6692). Labels: install, windows. Opened Oct 30, 2024 by anonymous-user803.
[BUG] Universal Checkpoint Conversion: Resumed Training Behaves as If Model Initialized from Scratch (#6691). Labels: bug, training. Opened Oct 30, 2024 by purefall.
DeepSpeed Windows install errors (#6673). Labels: install, windows. Opened Oct 27, 2024 by xiezhipeng-git.
Error when parsing GPUs on a node when only specifying the node name: --include=node3 vs --include=node3:1,2,4 (#6671). Opened Oct 26, 2024 by stephen-nju.
[BUG] When the submodule forward pass differs across GPUs, loss.backward gets stuck (#6667). Labels: bug, training. Opened Oct 25, 2024 by fuzuoyi.
[BUG] DeepSpeed launcher not picking up virtualenv --system-site-packages (#6664). Labels: bug, build. Opened Oct 24, 2024 by VRehnberg.
[BUG] ZeRO++ sharding small parameter raises IndexError (#6659). Labels: bug, training. Opened Oct 23, 2024 by wuxibin89.
[BUG] Training batch size is not consistent with train_batch_size (#6657). Labels: bug, training. Opened Oct 23, 2024 by tnnandi.
[BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (#6643). Labels: bug, training. Opened Oct 20, 2024 by RickoNoNo3.
[HELP] ZeRO-3 on a partial model to fix the input/output constant constraint (#6642). Opened Oct 19, 2024 by BoyeGuillaume.