Spotlight Poster
Metric Transforms and Low Rank Representations of Kernels
Timothy Chu · Josh Alman · Gary L. Miller · Shyam Narayanan · Mark Sellke · Zhao Song
[
Abstract
]
Wed 11 Dec 4:30 p.m. PST
— 7:30 p.m. PST
Abstract:
We introduce a new linear-algebraic tool based on Abelian group representation theory, and use it to address three key problems in machine learning.1. Past researchers have suggested fast algorithms for attention by approximating or replacing softmax attention with other functions like polynomials. The key mathematical property enabling fast algorithms is the polynomial method: any low-degree polynomial $f$ applied entry-wise to the matrix $QK^{\top}$ has low rank, when $Q$ and $K$ are $n \times d$ matrices and $n \gg d$. This suggests a natural question: for what functions $f$ is such a low rank decomposition possible? If other functions $f$ exist, they could be used in place of softmax for subquadratic attention. It was previously known that polynomials have this low-rank decomposition property. We show that polynomials are the \textit{only} continuous functions with this property. This suggests that the low rank approach for fast attention computations can \textit{only} work for functions $f$ that are approximable by polynomials.2. We prove the first full classification of all positive definite kernels that are functions of Manhattan or $\ell_1$ distance. Our work generalizes an existing theorem at the heart of all kernel methods in machine learning: the classification of all positive definite kernels that are functions of Euclidean distance. 3. The key problem in metric transforms, a mathematical theory used in geometry and machine learning, asks what functions transform pairwise distances in semi-metric space $M$ to semi-metric space $N$ for specified $M$ and $N$. We provide the first full classification of functions that transform Manhattan distances to Manhattan distances. Our work generalizes the foundational work of Schoenberg, which fully classifies functions that transform Euclidean to Euclidean distances. We additionally prove results about stable-rank preserving functions that are potentially useful in algorithmic design, and more. Our core new tool is called the representation theory of the hyperrectangle.
Live content is unavailable. Log in and register to view live content