Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

Approximate Top-k for Increased Parallelism

Oscar Key · Luka Ribar · Alberto Cattaneo · Luke Hudlass-Galley · Douglas Orr


Abstract:

The nearest neighbour search problem underlies many important machine learning applications, including efficient long-context generation, retrieval-augmented generation, and knowledge graph completion. However, computing top-k exactly suffers from limited parallelism, making it inefficient for highly parallel machine learning accelerators. By relaxing the requirement that the top-k is exact, bucketed algorithms can dramatically increase parallelism by independently computing many smaller top-k operations. We explore the design choices for this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-k to select the most important parameters or activations. We also release a fast bucketed top-k implementation for PyTorch.
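To illustrate the class of algorithms described above, here is a minimal sketch of a bucketed approximate top-k in NumPy. It is not the authors' released PyTorch implementation; the function name and bucketing scheme are assumptions for illustration. The input is split into equal-sized buckets, and each bucket independently contributes its own top elements, so the smaller top-k operations can run in parallel at the cost of exactness.

```python
import numpy as np

def bucketed_topk(x, k, num_buckets):
    """Approximate top-k (illustrative sketch, not the paper's implementation).

    Splits x into num_buckets equal buckets and takes the top k/num_buckets
    elements from each bucket independently. The per-bucket selections do not
    depend on each other, which is what exposes the extra parallelism.
    Returns the selected values (unsorted)."""
    n = x.shape[0]
    assert n % num_buckets == 0 and k % num_buckets == 0
    per_bucket = k // num_buckets
    buckets = x.reshape(num_buckets, n // num_buckets)
    # argpartition finds the top per_bucket entries of each row without a
    # full sort; each row is an independent, smaller top-k problem.
    idx = np.argpartition(buckets, -per_bucket, axis=1)[:, -per_bucket:]
    return np.take_along_axis(buckets, idx, axis=1).ravel()
```

Because each bucket contributes a fixed quota, the result can differ from the exact top-k whenever several of the true top-k values land in the same bucket; for example, on `np.arange(16.0)` with `k=4` and four buckets, the sketch returns `{3, 7, 11, 15}` rather than the exact `{12, 13, 14, 15}`.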