-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace need for KnnVectorValues.copy() with a dictionary interface #13831
Comments
I started looking at this and realized that we would really want to wait for another major release for it. Otherwise we'd have a deprecated method and have to carry two parallel implementations on 10.x which would be kind of annoying. This shouldn't be so hard but because we copy our codec code every time we have a minor release of the codec we end up having to do a lot of copies of the same change |
Unless I'm mistaken, then To make this more concrete, we can remove the copy method and add the Dictionary interface in, say 10.1.0, if we choose to do so. |
ah, thanks good point, I'm forging ahead now |
I worked this up and I think it will be an improvement - the basic idea is don't copy the vectorvalues, but copy a smaller wrapper around any shared scratch vector data (and clone any underlying IndexInput slices). I have it all working except for the MemorySegment-based scorers. I get TestVectorScorer.copiesAcrossThreads failures. I've checked carefully that the two scorers are different, each with its own clone of the underlying memory-segment based IndexInput slices and I'm scratching my head trying to figure out what I might have broken. I do see that the MemorySegmentIndexInput clones share their underlying MemorySegments - I hope that's OK? Any surmises would be helpful. If it helps I can post a PR with bugs - it's kind of big though and I still have some doubts about naming of various things |
The scratch arrays also need new instances. You are doing this as well correct? |
yes I had to restructure Lucene99MemorySegmentByteVectorScorerSupplier a bit so it would create the scratch arrays more granularly .. but I think the test is not actually using that class but DefaultFlatVectorScorer.ByteScoringSupplier. Oh in fact I think see my error - basically I made the copies in the ScorerSupplier rather than in the Scorer, so they were being "over shared". Phew, thanks! Will post a PR soon maybe in breaks from Apache CoC |
Description
The idea is to get rid of
copy()
since it does more than is needed. What's needed is an independent vector accessor over the same values source. Adrien had suggested a dictionary() interface that would take over the random-access API from the values source (or in addition to it?)See discussion in #13779 for more details.
The text was updated successfully, but these errors were encountered: