Conditional stemming for 'persian' analyzer #113482
The 'persian' analyzer in Lucene 10 comes with PersianStemFilter as the last token filter by default. To maintain compatibility with old indices, we use the new analyzer only for newly created indices and configure a legacy analyzer with the old behaviour for older index versions.
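As a rough illustration of the version gate described above, here is a minimal sketch. It is not the code in this PR: plain strings stand in for Lucene token filters, the filter names other than the stemmer are illustrative, and `PersianAnalyzerSketch`, `STEMMING_SINCE`, and the version numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: strings stand in for real Lucene token filters.
class PersianAnalyzerSketch {

    // Hypothetical index version id from which the stemming analyzer applies.
    static final int STEMMING_SINCE = 9_000_000;

    // Returns the token filter chain for an index created on the given version.
    static List<String> tokenFilters(int indexCreatedVersion) {
        // Legacy chain (filter names are illustrative, not taken from the PR).
        List<String> filters = new ArrayList<>(List.of(
            "lowercase", "decimal_digit", "arabic_normalization",
            "persian_normalization", "stop"));
        // Only newly created indices get the stemmer, appended as the last filter.
        if (indexCreatedVersion >= STEMMING_SINCE) {
            filters.add("persian_stem");
        }
        return filters;
    }

    public static void main(String[] args) {
        System.out.println("old index: " + tokenFilters(8_500_000));
        System.out.println("new index: " + tokenFilters(9_000_000));
    }
}
```

The point of gating on the index creation version is that tokens already written to an old index keep matching the tokens produced at query time, since both sides continue to use the non-stemming chain.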
Pinging @elastic/es-search-relevance (Team:Search Relevance)
server/src/main/java/org/elasticsearch/index/analysis/Analysis.java
I like the testing and most of the changes. I do think the node feature we need to add is a single "Lucene 10 Release" feature instead of individual ones for each change required for the Lucene 10 release.
server/src/main/java/org/elasticsearch/index/analysis/Analysis.java
server/src/main/java/org/elasticsearch/search/SearchFeatures.java
@elasticmachine run elasticsearch-ci/part-1
The CI failures on 8.16-bwc are issues coming from "main", so I'm inclined to merge this and wait for a fix/awaitsFix with the next upstream merge.
I think that even though this change provides backward compatibility for existing indices, it should be marked as "breaking" and have a changelog entry. Users who don't want the additional stemming need to move away from the default analyzer and build their own.
@benwtrent I added a changelog entry, let me know if that reads alright to you.
server/src/main/java/org/elasticsearch/index/analysis/AnalyzerProvider.java
…st {p0=range/20_synthetic_source/Date range} elastic#113874
Closes #113050