[FEA] Build HNSW hierarchy from a base CAGRA graph on GPU #432

cjnolet · 2024-10-28T19:25:05Z

This feature follows #431. It improves the performance (and likely the quality) by building the HNSW hierarchy from a base CAGRA graph on the GPU.

There are two ways to construct this hierarchy, both of which are used to build the k-means tree in Google's ScaNN algorithm:

- Top-down: nested (hierarchical) k-means. I suspect we already have much of this in our balanced k-means, but it hides the hierarchy and extracts flattened clusters, as HDBSCAN does.
- Bottom-up: agglomerative clustering. This is essentially the single-linkage and condense-hierarchy steps from HDBSCAN.
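
To make the top-down option concrete, here is a minimal CPU sketch of nested k-means that keeps the hierarchy instead of flattening it. It is only an illustration, not the cuVS implementation: the names `kmeans`, `medoid`, `build_tree`, and `reps_by_depth` are hypothetical, the evenly spaced initialization is a simplification, and using the member nearest each cluster mean as that cluster's representative is an assumption about how tree nodes might map to vectors in upper layers.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_of(points: Sequence[Vector], ids: List[int]) -> List[float]:
    """Coordinate-wise mean of the selected points."""
    dim = len(points[ids[0]])
    return [sum(points[i][d] for i in ids) / len(ids) for d in range(dim)]


def medoid(points: Sequence[Vector], ids: List[int]) -> int:
    """Id of the member closest to the cluster mean (assumed representative)."""
    center = mean_of(points, ids)
    return min(ids, key=lambda i: sq_dist(points[i], center))


def kmeans(points: Sequence[Vector], ids: List[int], k: int, iters: int = 10) -> List[List[int]]:
    """Plain Lloyd iterations over points[ids]; returns the non-empty clusters."""
    # Simplified deterministic init: evenly spaced members (not k-means++).
    step = len(ids) / k
    centroids = [list(points[ids[int(c * step)]]) for c in range(k)]
    clusters: List[List[int]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i in ids:
            nearest = min(range(k), key=lambda c: sq_dist(points[i], centroids[c]))
            clusters[nearest].append(i)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = mean_of(points, members)
    return [members for members in clusters if members]


def build_tree(points: Sequence[Vector], ids: List[int], branching: int, leaf_size: int) -> Dict:
    """Recursively split a cluster, keeping every level of the hierarchy."""
    node = {"rep": medoid(points, ids), "size": len(ids), "children": []}
    if len(ids) > leaf_size:
        parts = kmeans(points, ids, min(branching, len(ids)))
        if len(parts) > 1:  # stop when k-means cannot split any further
            node["children"] = [build_tree(points, p, branching, leaf_size) for p in parts]
    return node


def reps_by_depth(node: Dict, depth: int = 0, out: List[List[int]] = None) -> List[List[int]]:
    """Collect the representative ids found at each depth of the tree."""
    out = [] if out is None else out
    if len(out) <= depth:
        out.append([])
    out[depth].append(node["rep"])
    for child in node["children"]:
        reps_by_depth(child, depth + 1, out)
    return out


if __name__ == "__main__":
    # Two well-separated blobs of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    tree = build_tree(pts, list(range(len(pts))), branching=2, leaf_size=4)
    print(reps_by_depth(tree))  # depth 0 is the root, deeper entries are finer clusters
```

Shallow depths hold few, coarse representatives and deeper depths hold more, which is the shape an HNSW-style hierarchy needs. How the depths would actually be turned into nested HNSW layers is left open here.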

We suspect that building the hierarchy after the base graph is built will improve CAGRA's insertion capability. It should also improve recall, because the hierarchy is built by taking all vectors in the index into consideration. HNSW, by contrast, builds its hierarchy greedily, so its quality depends on the completely random ordering of the vectors during construction.
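
For context on the randomness in standard HNSW, the sketch below shows the usual level draw from the HNSW paper: each inserted vector gets a top layer of `floor(-ln(U) * mL)` with `mL = 1 / ln(M)`, independent of the data. The function names `hnsw_random_level` and `assign_levels` are hypothetical and this is not code from hnswlib or cuVS.

```python
import math
import random
from typing import List


def hnsw_random_level(m: int, rng: random.Random) -> int:
    """Draw a top layer from an exponentially decaying distribution."""
    m_l = 1.0 / math.log(m)      # level normalization factor mL = 1 / ln(M)
    u = 1.0 - rng.random()       # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * m_l)


def assign_levels(n: int, m: int, seed: int) -> List[int]:
    """Top layer for each of n vectors, in insertion order."""
    rng = random.Random(seed)
    return [hnsw_random_level(m, rng) for _ in range(n)]


if __name__ == "__main__":
    # The same 20 vectors get different layers under different seeds,
    # because the draw never looks at the vectors themselves.
    print(assign_levels(20, m=16, seed=0))
    print(assign_levels(20, m=16, seed=1))
```

With `M = 16`, roughly one vector in sixteen lands above layer 0, and which ones do so is unrelated to where they sit in the dataset. A hierarchy derived from the finished base graph would replace this draw with a data-dependent choice.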

cjnolet added the feature request, AlloyDB, VertexAI, and Oracle labels on Oct 28, 2024
