
Training only on 2B tokens (openwebtext) #5

Open
Nandan91 opened this issue Mar 22, 2024 · 3 comments

Comments

Nandan91 commented Mar 22, 2024

Hi!
Interesting work on the role of explicit bias!

I was wondering what training settings got you an eval PPL of ~3.04. The paper mentions that 50K iterations are required to train the GPT-2 model on 2B tokens. What were the batch_size_per_device and block_size for that run? Did you train from scratch or fine-tune the pre-trained model (trained on 300B tokens)?

Thanks!

Eric-mingjie (Collaborator) commented

Hi, thanks for your interest in our work.

The training config is shown here, which I think will be automatically divided by the number of GPUs available (here).

We do not perform any fine-tuning but instead train all the GPT-2 models from scratch.
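For a rough sense of the token budget, here is a minimal sketch in the style of a nanoGPT config, assuming 8 GPUs and a micro-batch of 5 (hypothetical values chosen for illustration, not necessarily the authors' exact settings), showing how 50K iterations can cover ~2B tokens:

```python
# Hypothetical nanoGPT-style settings; the batch split and GPU count
# are assumptions chosen so that 50K iterations cover ~2B tokens.
block_size = 1024                  # context length per sequence
batch_size = 5                     # micro-batch per GPU (assumed)
gradient_accumulation_steps = 8    # e.g. 1 step per GPU across 8 GPUs (assumed)
max_iters = 50_000

tokens_per_iter = batch_size * gradient_accumulation_steps * block_size
total_tokens = tokens_per_iter * max_iters
print(f"{tokens_per_iter=}")    # tokens_per_iter=40960
print(f"{total_tokens=:,}")     # total_tokens=2,048,000,000  (~2B)
```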

Nandan91 closed this as completed May 8, 2024
Nandan91 reopened this May 8, 2024
Nandan91 (Author) commented May 8, 2024

Thanks for your reply.

The training configurations you referred to seem to be configured for 600K training steps.
As mentioned in the paper, you ran for only 50K iterations to train on 2B tokens (reaching an eval PPL of ~3.04). Did you change anything else, such as the learning rate, weight decay, etc.?

I trained for 50K iterations; however, my val loss remained at ~3 (PPL > 30).
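(As a sanity check on units, assuming perplexity is computed as the exponential of the mean cross-entropy loss:

```python
import math

# PPL = exp(mean cross-entropy loss)
print(math.exp(3.4))   # ~29.96: a val loss of ~3.4 gives PPL ~ 30
print(math.log(3.04))  # ~1.11: a PPL of 3.04 would require a loss of ~1.11
```

i.e., a loss near 3 corresponds to a PPL in the 20–30 range, while a PPL of 3.04 would require a loss near 1.1.)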

Eric-mingjie (Collaborator) commented May 9, 2024

No, I did not change anything such as the learning rate or weight decay. I recall that my numbers are close to those reported in the original nanoGPT repo (https://github.com/karpathy/nanoGPT?tab=readme-ov-file#baselines).
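For reference, here is a sketch of the relevant optimizer defaults from nanoGPT's train.py (quoted from memory of the repo; worth verifying against the current source before relying on them):

```python
# Optimizer/schedule defaults in nanoGPT's train.py (verify against the repo).
learning_rate = 6e-4       # max learning rate, cosine-decayed
weight_decay = 1e-1
beta1, beta2 = 0.9, 0.95   # AdamW betas
grad_clip = 1.0            # gradient clipping threshold
warmup_iters = 2000
lr_decay_iters = 600_000   # tied to the default 600K-iteration run
min_lr = 6e-5
```

Note that lr_decay_iters defaults to the full 600K schedule, so over only 50K iterations the cosine decay barely progresses unless it is shortened to match max_iters — a possible contributor to the PPL gap, though that is speculation.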
