Add an option to disable BMW optimization for benchmarks #265

Open
shubhamvishu opened this issue Apr 29, 2024 · 6 comments
Comments

@shubhamvishu

Description

Looking for more ideas on this!

@jpountz
Collaborator

jpountz commented Apr 29, 2024

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

@mikemccand
Owner

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

+1, that's a nice approach. Though I think even Lucene's count() API has some nice optimizations (sub-linear implementations) that bypass visiting all postings?

@jpountz
Collaborator

jpountz commented Apr 29, 2024

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

@shubhamvishu
Author

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

Do you mean wrapping the clauses in "count( )", as in e.g. https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks, so that we check the performance but avoid BMW? I like this idea, if I understand it correctly. But I'm not sure we could straightforwardly make it an option in the benchmarks.

 

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

I'm not sure what you mean by using some cheap faceting here. Maybe you could elaborate on this idea? Also, since we want to enable it via benchmarks, does this also fit well in that picture?

@mikemccand
Owner

Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?

I'm not sure what you mean by using some cheap faceting here. Maybe you could elaborate on this idea? Also, since we want to enable it via benchmarks, does this also fit well in that picture?

I think @jpountz is referring to enabling faceting on each task. luceneutil's TaskParser supports this with e.g. +facets:Date.sortedset. Because facets require counting all hits, it forces Lucene to disable BMW. The problem is, it also adds some cost (I think that's why @jpountz suggested finding a "cheap" one heh), which is not great because it dilutes what you are trying to measure (a change in postings decode / visit time).

Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

Do you mean wrapping the clauses in "count( )", as in e.g. https://github.com/mikemccand/luceneutil/blob/master/tasks/countOnly.tasks, so that we check the performance but avoid BMW? I like this idea, if I understand it correctly. But I'm not sure we could straightforwardly make it an option in the benchmarks.

luceneutil supports count tasks with syntax like count(+a +b). This is parsed to use IndexSearcher's count API. I think that may be a quick workaround for benchmarking #258
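To make the count(...) workaround concrete, here is a rough sketch of a helper that derives a count-only variant of a task line. It is not part of luceneutil: the class name CountTaskRewriter, the method toCountTask, and the assumed line shape ("Category: query # comment") are all assumptions for illustration, and queries carrying extra directives such as +facets:... would need separate handling.

```java
class CountTaskRewriter {

    /**
     * Wraps the query part of one task line in count( ).
     * ASSUMPTION: lines look like "Category: query text # optional comment".
     * Blank lines, comment-only lines, lines without a category, and
     * already-wrapped queries are returned unchanged.
     */
    static String toCountTask(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return line;
        }
        int colon = line.indexOf(':'); // first colon ends the (assumed) category label
        if (colon < 0) {
            return line;
        }
        String category = line.substring(0, colon);
        String rest = line.substring(colon + 1);
        int hash = rest.indexOf('#'); // trailing comment, if any
        String query = (hash < 0 ? rest : rest.substring(0, hash)).trim();
        String comment = hash < 0 ? "" : " " + rest.substring(hash);
        if (query.isEmpty() || query.startsWith("count(")) {
            return line;
        }
        return category + ": count(" + query + ")" + comment;
    }

    public static void main(String[] args) {
        // Hypothetical task line; the label and frequency are made up.
        System.out.println(toCountTask("AndHighHigh: +a +b # freq=100"));
    }
}
```

Running such a helper over an existing tasks file would give a second file that exercises the same clauses through the counting path, which could then be selected per benchmark run.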

@shubhamvishu
Author

Thanks for the explanation, Mike! I'll try benchmarking the change using count tasks and share the results. Btw, if the above-mentioned approach of maxing out IndexSearcher.TOTAL_HITS_THRESHOLD also makes sense, then I have already shared the results for it over here.
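For anyone following along, here is a toy model of why a very large total-hits threshold (or any task that needs an exact hit count) keeps block-max pruning from kicking in. This is not Lucene's implementation: the class PruningSketch, the block layout, and the simplified skip rule are assumptions made purely to illustrate the idea.

```java
import java.util.PriorityQueue;

class PruningSketch {

    /** Outcome of one toy search. */
    static final class Result {
        final int visited;        // postings actually scored
        final boolean exactCount; // true if no block was skipped
        Result(int visited, boolean exactCount) {
            this.visited = visited;
            this.exactCount = exactCount;
        }
    }

    /**
     * Toy top-k search over blocks of scores. A block is skipped only when
     * (a) at least totalHitsThreshold hits have been counted, (b) the top-k
     * heap is full, and (c) the block's max score cannot beat the k-th best.
     */
    static Result search(float[][] blocks, int k, int totalHitsThreshold) {
        PriorityQueue<Float> topK = new PriorityQueue<>(); // min-heap: head is the k-th best score
        int visited = 0;
        boolean skippedAny = false;
        for (float[] block : blocks) {
            // A real index would store this per block; it is recomputed here for brevity.
            float blockMax = 0f;
            for (float s : block) {
                blockMax = Math.max(blockMax, s);
            }
            boolean mayPrune = visited >= totalHitsThreshold && topK.size() == k;
            if (mayPrune && blockMax <= topK.peek()) {
                skippedAny = true; // hit count is now only a lower bound
                continue;
            }
            for (float s : block) {
                visited++;
                if (topK.size() < k) {
                    topK.add(s);
                } else if (s > topK.peek()) {
                    topK.poll();
                    topK.add(s);
                }
            }
        }
        return new Result(visited, !skippedAny);
    }

    public static void main(String[] args) {
        float[][] blocks = { {9f, 8f, 7f}, {1f, 2f, 1f}, {3f, 1f, 2f}, {10f, 1f, 1f} };
        Result pruned = search(blocks, 2, 3);
        Result exhaustive = search(blocks, 2, Integer.MAX_VALUE);
        System.out.println("threshold=3:   visited=" + pruned.visited + " exact=" + pruned.exactCount);
        System.out.println("threshold=MAX: visited=" + exhaustive.visited + " exact=" + exhaustive.exactCount);
    }
}
```

With these four toy blocks, a threshold of 3 scores 6 of the 12 postings, while Integer.MAX_VALUE scores all 12. The second case is the one a postings-decode benchmark wants, since every posting is visited regardless of score.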

3 participants