You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This feature succeeds #431 and improves the performance (and likely the quality) by building the HNSW hierarchy on a base CAGRA graph on the GPU.
There are two ways in which we can construct this hierarchy (both of which are used to construct the k-means tree in Google's SCaNN algorithm):
Top-down: nested (hierarchical) kmeans. I suspect we have a lot of this already in our balanced kmeans, but it hides the hierarchy away and extracts flattened clusters like HDBSCAN
Bottom-up: Using agglomerative clustering: Basically it's the single-linkage + condense hierarchy steps from HDBSCAN.
We suspect that building the hierarchy after the base graph is built will improve the insertion capability for CAGRA, but it will also have a positive impact on recall, since the hierarchy will be built by taking all vectors in the index into consideration, rather than being done in a greedy manner like HNSW where the quality of the hierarchy depends on the completely random ordering of the vectors during construction.
The text was updated successfully, but these errors were encountered:
This feature succeeds #431 and improves the performance (and likely the quality) by building the HNSW hierarchy on a base CAGRA graph on the GPU.
There are two ways in which we can construct this hierarchy (both of which are used to construct the k-means tree in Google's SCaNN algorithm):
Top-down: nested (hierarchical) kmeans. I suspect we have a lot of this already in our balanced kmeans, but it hides the hierarchy away and extracts flattened clusters like HDBSCAN
Bottom-up: Using agglomerative clustering: Basically it's the single-linkage + condense hierarchy steps from HDBSCAN.
We suspect that building the hierarchy after the base graph is built will improve the insertion capability for CAGRA, but it will also have a positive impact on recall, since the hierarchy will be built by taking all vectors in the index into consideration, rather than being done in a greedy manner like HNSW where the quality of the hierarchy depends on the completely random ordering of the vectors during construction.
The text was updated successfully, but these errors were encountered: