
Add OLMo November 2024 #34551

Open
wants to merge 21 commits into base: main

Conversation

@2015aroras (Contributor) commented Oct 31, 2024

What does this PR do?

An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the existing model in transformers:

  • RMSNorm is used in place of standard layer norm.
  • Norm is applied to the attention queries and keys (QK-norm).
  • Norm is applied after the attention/feedforward blocks rather than before (post-norm instead of pre-norm).
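As an illustrative sketch only (hypothetical function names, not the actual transformers implementation), the changes to the norm style and placement look roughly like this in NumPy:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale features by their root mean square.

    Unlike standard LayerNorm, no mean is subtracted and there is no
    bias term; only a learned per-feature scale `weight` is applied.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def post_norm_block(x, attn, mlp, w_attn_norm, w_mlp_norm):
    """Hypothetical decoder sub-block illustrating the norm placement.

    The norm is applied to each sublayer's *output* (post-norm) before
    the residual add, rather than to its input (pre-norm). Inside the
    attention sublayer, the new model additionally applies RMSNorm to
    the query and key projections (QK-norm), not shown here.
    """
    x = x + rms_norm(attn(x), w_attn_norm)
    x = x + rms_norm(mlp(x), w_mlp_norm)
    return x
```

This is a schematic of the norm placement only; the real model also includes rotary embeddings, attention masking, and weight layouts that are omitted here.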

The original PR #34497 updated the OLMo implementation in transformers to support the November release. This PR instead adds a new model using the modular approach.

@ArthurZucker

Fixes #34496

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@2015aroras (Contributor, Author):

I tested this before we locked in the Olmo1124 naming conventions; will update once it has been re-tested.

@2015aroras 2015aroras marked this pull request as draft October 31, 2024 23:48
@2015aroras 2015aroras marked this pull request as ready for review November 4, 2024 23:42
@2015aroras (Contributor, Author) commented Nov 4, 2024

Tests are passing, including slow ones (except for Olmo1124ModelTest::test_generate_compile_1_end_to_end, but that test also appears to be broken for base OLMo, so I'm considering it a pre-existing problem). I've used a test HF Hub repo (shanearora/OLMo-7B-1124-hf) since the official final model is not ready yet.

@2015aroras (Contributor, Author):

PR checks were passing before I merged main again; the current PR check failures relate to other models.


Successfully merging this pull request may close issue #34496 (Add support for OLMo November release).