Larger Tokenizers #701

Open
dustinwloring1988 opened this issue Jul 19, 2024 · 0 comments

@dustinwloring1988

I would love to train GPT-2 with a larger BPE tokenizer, maybe even Llama 3's tokenizer, since it has a vocab size of 128K. However, this code will not work with a tokenizer that has a large vocab. Is there an easy way to add this?
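
For reference, here is a minimal sketch of pairing a GPT-2 configuration with a ~128K-entry tokenizer. It assumes a HuggingFace `transformers` setup rather than this repository's own training code; the `meta-llama/Meta-Llama-3-8B` checkpoint name (gated on the Hub) and the pad-to-a-multiple-of-128 choice are illustrative assumptions, not anything taken from this repo:

```python
# Sketch: GPT-2 architecture with a large-vocab tokenizer (assumptions noted above).
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# Llama 3's tokenizer has roughly 128K entries (access to this repo on the Hub is gated).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Round the vocab up to a multiple of 128 so the embedding/output matmuls map
# onto GPU kernels more efficiently; unused rows are simply never indexed.
vocab_size = ((len(tokenizer) + 127) // 128) * 128

config = GPT2Config(vocab_size=vocab_size)
model = GPT2LMHeadModel(config)

ids = tokenizer("Hello, world!")["input_ids"]
print(vocab_size, ids[:8])
```

One caveat, hedged because it depends on how the training pipeline serializes data: if token IDs are stored on disk as uint16, anything above 65,535 overflows, so a 128K vocab also requires widening the on-disk token format (e.g. to uint32) in addition to resizing the embedding and output layers.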
