Multi-GPU 4090 DeepSpeed ZeRO-3 LoRA fine-tuning of Qwen2.5-14B-Instruct uses far more VRAM than expected #6011

Open · 1 task done
Lanture1064 opened this issue Nov 13, 2024 · 0 comments
Labels: pending (This problem is yet to be addressed)

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-97-generic-x86_64-with-glibc2.35
  • Python version: 3.12.3
  • PyTorch version: 2.3.0+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.44.1

Reproduction

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /root/autodl-tmp/Qwen/Qwen2.5-14B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --enable_liger_kernel True \
    --dataset_dir /root/LLaMA-Factory/data/ \
    --dataset shibing624_roleplay_gpt3_5_tuned_full,shibing624_roleplay_gpt4_tuned_full,shibing624_roleplay_male_gpt3.5_tuned_full,botnow_char_roleplay \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-14B-Instruct/lora/train_2024-11-13-10-21-39 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --loraplus_lr_ratio 8 \
    --lora_target all \
    --deepspeed cache/ds_z3_config.json
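
For context, the exact contents of cache/ds_z3_config.json (generated by the WebUI) are not attached; a representative LLaMA-Factory-style ZeRO-3 config looks roughly like the sketch below, where the "auto" fields are filled in from the Trainer arguments at launch (this is a sketch, not the actual file from this run):

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}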

The command above was built and launched from the WebUI. The model loads normally, but once training starts VRAM usage spikes, and both the 2× and 3× RTX 4090 (24 GB) setups hit OOM:
[screenshot: GPU memory usage at OOM]

With offload enabled, VRAM plus system RAM usage together approach 95 GB:
[screenshots: GPU and system memory usage with offload enabled]
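
For the offload run, only the zero_optimization section differs from the config above; the CPU offload fields follow the standard DeepSpeed ZeRO-3 offload layout (again a sketch of what the WebUI generates, not the exact file used here):

"zero_optimization": {
  "stage": 3,
  "offload_optimizer": { "device": "cpu", "pin_memory": true },
  "offload_param": { "device": "cpu", "pin_memory": true },
  "overlap_comm": true,
  "contiguous_gradients": true,
  "reduce_bucket_size": "auto",
  "stage3_prefetch_bucket_size": "auto",
  "stage3_param_persistence_threshold": "auto",
  "stage3_gather_16bit_weights_on_model_save": true
}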

Expected behavior

According to the table in the README, LoRA fine-tuning of a 14B-class model should need 32 GB of VRAM. Under which fine-tuning settings can that figure be reproduced?
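
(A rough back-of-the-envelope estimate of my own, not a figure taken from the README: the frozen 14B base weights alone are about 14 × 10⁹ parameters × 2 bytes ≈ 28 GB in bf16, which is in the same range as the 32 GB figure once activations and the small LoRA optimizer states are added. With ZeRO-3 those weights should be partitioned across ranks, so I expected per-GPU usage on 2–3 × 24 GB cards to stay well under 24 GB.)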

Others

No response
