

Poster

Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

Yijun Dong · Viet Hoang Phan · Xiang Pan · Qi Lei

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce **Sk**etchy **M**oment **M**atching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
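
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sketch_gradients` and `select_by_moment_matching` are hypothetical, the Gaussian random projection is one common choice of sketch assumed here, and the greedy second-moment matching is a simplified stand-in for whatever moment-matching procedure the paper actually uses. The rows of `G` are assumed to be per-sample gradient features.

```python
import numpy as np

def sketch_gradients(G, m, seed=0):
    """Stage (i), assumed form: project N x r gradient features onto m
    dimensions with a Gaussian random sketch (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    r = G.shape[1]
    S = rng.standard_normal((r, m)) / np.sqrt(m)  # r x m sketching matrix
    return G @ S                                  # N x m sketched features

def select_by_moment_matching(Z, n):
    """Stage (ii), simplified: greedily pick n distinct rows of Z whose
    empirical second moment is close (Frobenius norm) to that of all rows.
    A greedy heuristic swapped in for illustration only."""
    N, m = Z.shape
    target = Z.T @ Z / N                       # second moment of the full set
    outer = np.einsum("ij,ik->ijk", Z, Z)      # per-sample outer products, N x m x m
    running = np.zeros((m, m))                 # sum of outer products chosen so far
    available = np.ones(N, dtype=bool)
    chosen = []
    for k in range(1, n + 1):
        # Second moment of the selection if each candidate were added next.
        candidates = (running + outer) / k
        err = np.linalg.norm(candidates - target, axis=(1, 2))
        err[~available] = np.inf               # never pick the same sample twice
        i = int(np.argmin(err))
        chosen.append(i)
        available[i] = False
        running += outer[i]
    return chosen

if __name__ == "__main__":
    # Synthetic stand-in for gradient features: 200 samples, 1000 parameters.
    G = np.random.default_rng(1).standard_normal((200, 1000))
    Z = sketch_gradients(G, m=8)
    idx = select_by_moment_matching(Z, n=20)
    print(len(idx), Z.shape)
```

The selection step works only in the sketched dimension `m`, so its cost does not grow with the parameter dimension, which mirrors the abstract's point that the guarantee depends on $\dim(\mathcal{S})$ rather than on the full parameter count.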
