
Added a test for the way BytePairTokenizer handles the \n\n sequence, which is important in Llama chat templates #1912

Merged (5 commits, Oct 16, 2024)

Conversation

martin-gorner (Contributor)

This also removes the @pytest.mark.large marker from the BytePairTokenizer test, which takes only about a second to run.
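To illustrate the kind of behavior the new test guards against (a minimal sketch, not the actual keras_hub test): byte-level BPE vocabularies such as Llama's often contain a dedicated token for the double newline used as a turn separator in chat templates, and a tokenizer that splits the two newlines apart would produce different token IDs than the chat template expects. The toy vocabulary and merge list below are hypothetical.

```python
def bpe_encode(text, vocab, merges):
    """Toy greedy byte-pair merging over single characters.

    `merges` is an ordered list of adjacent-pair merges; each merge is
    applied everywhere it occurs before moving to the next one.
    """
    tokens = list(text)
    for pair in merges:  # apply merges in priority order
        merged = "".join(pair)
        i = 0
        while i < len(tokens) - 1:
            if (tokens[i], tokens[i + 1]) == pair:
                tokens[i:i + 2] = [merged]  # collapse the pair in place
            else:
                i += 1
    return [vocab[t] for t in tokens]


# Hypothetical vocabulary with a dedicated "\n\n" token.
vocab = {"\n": 0, "\n\n": 1, "a": 2, "b": 3}
merges = [("\n", "\n")]  # "\n" + "\n" -> "\n\n"

# "\n\n" must come out as one token, not two single newlines.
assert bpe_encode("a\n\nb", vocab, merges) == [2, 1, 3]
# A single "\n" is left alone.
assert bpe_encode("a\nb", vocab, merges) == [2, 0, 3]
```

The real test in the PR exercises `BytePairTokenizer` against an actual vocabulary, but the invariant is the same: the \n\n sequence must map to the merged token.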

@SamanehSaadat (Member) left a comment

Thanks, Martin!
Left some nit comments!

Review comments on keras_hub/src/tokenizers/byte_pair_tokenizer_test.py (3 threads, all resolved)
@SamanehSaadat (Member) left a comment

Thanks, Martin!

@SamanehSaadat SamanehSaadat merged commit 1777eac into keras-team:master Oct 16, 2024
7 checks passed

2 participants