Conditional stemming for 'persian' analyzer #113482

Merged
merged 19 commits into from
Oct 2, 2024

Member

@cbuescher cbuescher commented Sep 24, 2024

In Lucene 10, the 'persian' analyzer includes PersianStemFilter as the last token filter by default. To stay compatible with existing indices, we use the new analyzer only for newly created indices and configure a legacy analyzer with the old behaviour for older index versions.

Closes #113050
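
For illustration only, the version gate described above could be sketched roughly like this. It is a standalone toy, not the code in this PR: the class name, the `PERSIAN_STEMMING_ADDED` constant and its value, and the use of plain strings in place of real Lucene filter instances are all assumptions made for the example. The filter labels are meant to mirror the Lucene class names in the analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models the analyzer's filter chain as a list of labels.
class PersianAnalyzerSketch {

    // Assumed placeholder for "first index version that gets the new analyzer".
    // The real cut-over version id is defined by the PR, not by this value.
    static final int PERSIAN_STEMMING_ADDED = 9_000_000;

    // Filter chain before Lucene 10 (labels only, no real Lucene objects).
    static final List<String> LEGACY_FILTERS = List.of(
        "LowerCaseFilter",
        "DecimalDigitFilter",
        "ArabicNormalizationFilter",
        "PersianNormalizationFilter",
        "StopFilter"
    );

    // Returns the filter chain to use for an index created on the given version.
    static List<String> filterChainFor(int indexCreatedVersion) {
        List<String> chain = new ArrayList<>(LEGACY_FILTERS);
        if (indexCreatedVersion >= PERSIAN_STEMMING_ADDED) {
            // New indices get stemming as the last step of the chain.
            chain.add("PersianStemFilter");
        }
        return List.copyOf(chain);
    }

    public static void main(String[] args) {
        System.out.println(filterChainFor(8_500_000)); // older index: legacy chain
        System.out.println(filterChainFor(9_000_000)); // new index: chain with stemming
    }
}
```

The point of gating on the index's creation version rather than on the node version is that an existing index keeps producing the same tokens at search time as it did at index time, even after the cluster is upgraded.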

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (meta label for the Search Relevance team in Elasticsearch) Sep 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Member

@benwtrent benwtrent left a comment

I like the testing and most of the changes. I do think the node feature we need to add is a single "Lucene 10 Release" feature instead of individual ones for each change required for the Lucene 10 release.

@cbuescher
Member Author

@elasticmachine run elasticsearch-ci/part-1

@cbuescher
Member Author

The CI failures on 8.16-bwc are issues coming from "main", so I'm inclined to merge this and wait for a fix/awaitsFix with the next upstream merge.

@cbuescher
Member Author

I think that even though this change keeps backward compatibility with existing indices, it should be marked as "breaking" and have a changelog entry. Users who don't want the additional stemming need to move away from the default analyzer and build their own.

@cbuescher
Member Author

@benwtrent I added a changelog entry, let me know if that reads alright to you.

@cbuescher cbuescher merged commit 7089ff3 into elastic:lucene_snapshot Oct 2, 2024
15 checks passed
@cbuescher cbuescher deleted the persian-analyzer-l10 branch October 2, 2024 12:06
Labels
>breaking, >enhancement, :Search Relevance/Analysis (how text is split into tokens), Team:Search Relevance (meta label for the Search Relevance team in Elasticsearch), >upgrade, v9.0.0
4 participants