Poster
A Label is Worth A Thousand Images in Dataset Distillation
Tian Qin · Zhiwei Deng · David Alvarez-Melis
Data quality is a crucial factor in machine learning model performance, a principle that has been exploited by dataset distillation methods to compress training datasets into much smaller counterparts with similar downstream performance. Understanding how and why dataset distillation methods work is vital not only to improve these methods but also to reveal fundamental characteristics of "good" training data. However, a major challenge to this goal is the observation that distillation approaches have little in common with each other, relying on sophisticated but mostly disparate methods to generate synthetic data. In this work, we highlight a largely overlooked aspect that is nevertheless common to most of these methods: the use of soft (probabilistic) labels, whose role in distillation we study in depth through a series of ablation experiments. First, our results show that, surprisingly, the main factor explaining the performance of state-of-the-art distillation methods is not the specific techniques used to generate synthetic data, but rather the use of soft labels. Second, we show that not all soft labels are created equal, i.e., they must contain structured information to be beneficial. Finally, we provide empirical scaling laws that characterize the effectiveness of soft labels as a function of images-per-class in the distilled dataset, and establish an empirical Pareto frontier for data-efficient learning. Combined, our findings challenge conventional wisdom in dataset distillation, underscore the importance of soft labels in learning, and suggest new directions for improving distillation methods.
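To make the distinction between hard and soft (probabilistic) labels concrete, the following is a minimal NumPy sketch, not the authors' pipeline. The abstract does not say how soft labels are produced; this sketch assumes they come from the softened predictions of a hypothetical pretrained "teacher" model. The logits, the `temperature` parameter, and the function names are all illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax; `temperature` is an illustrative knob, not from the source."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(student_logits, targets):
    """Mean cross-entropy between student predictions and target distributions.

    `targets` may be one-hot (hard labels) or full distributions (soft labels).
    """
    log_probs = np.log(softmax(student_logits))
    return float(-(targets * log_probs).sum(axis=-1).mean())

# Made-up logits from a hypothetical teacher: 2 images, 3 classes.
teacher_logits = np.array([[4.0, 2.0, 0.0],
                           [0.5, 3.0, 2.5]])

# Hard labels keep only the top class; soft labels keep the whole distribution,
# including how similar the teacher considers the remaining classes.
hard_labels = np.eye(3)[teacher_logits.argmax(axis=-1)]
soft_labels = softmax(teacher_logits, temperature=2.0)

# An untrained student that outputs uniform predictions.
student_logits = np.zeros((2, 3))
loss_hard = cross_entropy(student_logits, hard_labels)
loss_soft = cross_entropy(student_logits, soft_labels)
```

In this sketch the second image's soft label assigns comparable mass to classes 1 and 2, which is information a one-hot label discards. This is one way to read the abstract's "structured information", offered as an interpretation rather than as the paper's definition.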