Reminder
System Info

llamafactory version: 0.9.1.dev0

Reproduction
```yaml
### model
model_name_or_path: llava-hf/llava-1.5-7b-hf

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: mllm_demo
template: llava
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llava1_5-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
Expected behavior
In the original LLaVA implementation (see, e.g., https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_lora.sh), training (visual instruction tuning) uses `--learning_rate 2e-4` but assigns a separate `--mm_projector_lr 2e-5` specifically to the mm_projector.
However, your implementation seems to assign a single learning rate to all modules. This could be why several issues (e.g., #5890, #5824, #5672) report that the results are not reproducible.
Others
The misalignment may come from a difference in the `create_optimizer` method.
In the LLaVA implementation (see https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llava_trainer.py), the hyperparameters (weight_decay, lr) are assigned separately per parameter group; LLaMA-Factory does not appear to support this yet.
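For reference, the per-group assignment could be sketched as follows. This is a hedged illustration, not LLaMA-Factory's or LLaVA's actual code: the helper name `build_param_groups` is hypothetical, and the 2e-4 / 2e-5 defaults merely mirror the LLaVA script above.

```python
def build_param_groups(named_parameters, lr=2e-4, mm_projector_lr=2e-5,
                       weight_decay=0.0):
    """Split trainable parameters into two optimizer groups so that the
    mm_projector gets its own (lower) learning rate, in the spirit of
    LLaVA's create_optimizer override.

    `named_parameters` is an iterable of (name, param) pairs, as returned
    by `model.named_parameters()` in PyTorch.
    """
    projector, others = [], []
    for name, param in named_parameters:
        if not getattr(param, "requires_grad", True):
            continue  # frozen parameters are excluded from the optimizer
        (projector if "mm_projector" in name else others).append(param)
    return [
        {"params": others, "lr": lr, "weight_decay": weight_decay},
        {"params": projector, "lr": mm_projector_lr, "weight_decay": weight_decay},
    ]

# Usage with PyTorch (illustrative):
#   optimizer = torch.optim.AdamW(build_param_groups(model.named_parameters()))
```

Because `torch.optim` optimizers accept a list of per-group dicts, each group's `lr` overrides the optimizer-wide default, which is exactly the mechanism a fix in `create_optimizer` would need.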
Would you kindly fix this bug? In its current form, it would be inaccurate to claim that "we supported fine-tuning the LLaVA-1.5 multimodal LLMs".