Skip to yearly menu bar Skip to main content


Talk
in
Competition: Practical Vector Search (Big ANN) Challenge 2023

[Filter track] IVF^2 Index: Fusing Classic and Spatial Inverted Indices for Fast Filtered ANNS

Ben Landrum


Abstract:

The rise of metric embeddings as a crucial tool in search, recommendation, and large language model applications has created significant interest in complex search queries over vectors, such as restricted vector search based on per-vector metadata ("filtered ANNS"). The NeurIPS'23 BigANN competition track for filtered ANNS evaluated submissions based on query throughput above a target level of recall on a 10M vector dataset, with binary per-vector metadata (labels), and with query predicates requiring either one or two specified labels to be present for all vectors returned. Existing state of the art approaches for filtered ANNS struggle to perform such 'AND' queries, which require the returned vectors to have all of a set of specified binary labels. Perhaps surprisingly, we find that a more combinatorial view of the problem leads to highly efficient solutions, matching or exceeding the throughput of unfiltered search on the full dataset. We present a novel approach to indexing these queries which leverages classical and inverted file indices in tandem to dramatically reduce the number of vectors needing to be considered before comparing any of them to the query vector. We demonstrate empirically strong results on the competition dataset (11.5x speedup over the baseline), which we believe can serve as a practical solution to the filtered ANNS problem, and as a strong baseline for future work.

Ben Landrum University of Maryland Magdalen Dobson Manohar CMU Mazin Karjikar University of Maryland Laxman Dhulipala University of Maryland

Chat is not available.