Poster
in
Workshop: Workshop on Machine Learning and Compression
Latent Probabilistic Dataset Distillation with Theoretical Guarantees
Progyan Das · Anirban Dasgupta · Shrutimoy Das
Abstract:
Dataset distillation compresses large datasets into smaller \textit{coresets} that sustain performance comparable to the full dataset when downstream models are trained on them -- thus greatly reducing the storage and computation required for training. Current state-of-the-art methods use \textit{Kernel Inducing Points} (KIP), which exploits the link between kernel regression and the Neural Tangent Kernel (NTK) to learn synthetic coresets that mimic the performance of a neural network trained on the full dataset, via a frequentist adaptation of the inducing-point method for Gaussian processes. The frequentist regime forgoes the potential benefits of a Bayesian analysis, such as bounds on the number of inducing points required; the mean-squared loss it employs does not lend itself to a probabilistic interpretation, and the algorithm is computationally intensive because it operates directly in the space of the data. To address these issues, we introduce a new variational Gaussian process-based algorithm for fast, scalable dataset distillation that learns inducing points and soft targets in the latent space of pre-trained autoencoders. Leveraging recent observations on the similarity between the Reproducing Kernel Hilbert Space (RKHS) of the Laplace kernel and that of the NTK, we also develop guarantees on the size and efficacy of coresets for $d$-dimensional datasets normalized to the unit hypersphere $S^{d-1}$, showing that a coreset whose size is polynomially bounded in the size of the data achieves vanishingly small KL divergence. Our method achieves performance competitive with state-of-the-art algorithms in a fraction of the time, often less than one minute.
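To make the idea concrete, below is a minimal, hedged sketch of the general approach the abstract describes: a synthetic coreset of inducing points with soft targets is learned in a latent space under a Gaussian-process model with a Laplace kernel, by fitting the GP posterior conditioned on the coreset to the encoded real data. This is an illustration under stated assumptions, not the authors' exact objective: the random latents standing in for a frozen autoencoder's output, the kernel lengthscale, the noise parameterization, and the Gaussian predictive likelihood used as the loss are all illustrative choices (the paper's method optimizes a variational objective with theoretical guarantees, which this sketch does not reproduce).

```python
# Minimal sketch (assumptions labeled): learn a synthetic latent coreset
# (inducing points Z with soft targets Y_z) under a GP with a Laplace kernel,
# fit to encoded real data. The latents H, lengthscale, noise level, and the
# Gaussian-likelihood loss below are illustrative stand-ins, not the paper's
# exact pipeline or objective.
import torch

torch.manual_seed(0)

# Stand-ins for the real pipeline: n real points, latent dim, classes, coreset size.
n, d_latent, n_classes, m = 512, 32, 10, 50
H = torch.randn(n, d_latent)  # latents of real data (would come from a frozen autoencoder)
Y = torch.nn.functional.one_hot(torch.randint(0, n_classes, (n,)), n_classes).float()

def laplace_kernel(A, B, lengthscale=1.0):
    """Laplace (exponential) kernel k(a, b) = exp(-||a - b|| / lengthscale)."""
    dist = torch.cdist(A, B, p=2)
    return torch.exp(-dist / lengthscale)

# Learnable synthetic coreset: inducing inputs in latent space + soft labels.
Z = torch.randn(m, d_latent, requires_grad=True)
Y_z = torch.randn(m, n_classes, requires_grad=True)
log_noise = torch.tensor(-2.0, requires_grad=True)  # log observation-noise variance

opt = torch.optim.Adam([Z, Y_z, log_noise], lr=1e-2)

for step in range(500):
    noise = log_noise.exp()
    K_zz = laplace_kernel(Z, Z) + noise * torch.eye(m)
    K_hz = laplace_kernel(H, Z)

    # GP posterior conditioned on the coreset (Z, Y_z), evaluated at the real latents H.
    L = torch.linalg.cholesky(K_zz)
    alpha = torch.cholesky_solve(Y_z, L)          # (K_zz + noise I)^{-1} Y_z
    mean = K_hz @ alpha                           # predictive mean, shape (n, n_classes)
    v = torch.cholesky_solve(K_hz.T, L)           # (K_zz + noise I)^{-1} K_zh
    var = (1.0 - (K_hz * v.T).sum(dim=1) + noise).clamp_min(1e-6)  # per-point variance

    # Gaussian negative log-likelihood of the real labels under the predictive
    # distribution -- a probabilistic surrogate for KIP's mean-squared loss.
    nll = 0.5 * (((Y - mean) ** 2).sum(dim=1) / var + n_classes * var.log()).mean()

    opt.zero_grad()
    nll.backward()
    opt.step()

# (Z, Y_z) now form a distilled coreset: a kernel/GP model conditioned on it
# should approximate the predictions of one conditioned on all n encoded points.
```

Because the optimization happens over the $m \times d_{\text{latent}}$ coreset in latent space rather than over full-resolution data, each step costs roughly $O(nm^2 + m^3)$ kernel and Cholesky work, which is what makes sub-minute distillation plausible for modest coreset sizes.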