Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
Deliberate Practice with Synthetic Data
Reyhane Askari Hemmat · Mohammad Pezeshki · Pietro Astolfi · Melissa Hall · Florian Bordes · Jakob Verbeek · Michal Drozdzal · Adriana Romero
Deliberate practice for humans is the process of improving one’s skills by leveraging external feedback while actively seeking out and correcting mistakes. The current status quo in machine learning is to use static datasets, composed of real or generated data, to train models. While state-of-the-art generative models can serve as an infinite source of synthetic data to train downstream models, prior work has shown that simply increasing the dataset size results in diminishing improvements in model accuracy. In this work, we design a framework that generates synthetic data to improve the performance of a downstream machine learning model. The framework incorporates feedback from the downstream model to refine the generated data used to train the model throughout the training process. In particular, we employ deliberate practice for neural network training to generate challenging synthetic examples tailored to the model’s weaknesses at any stage of training, replacing easier, less informative examples in the dataset. With a fixed-size synthetic dataset throughout training, this approach yields over 14% and 8% accuracy improvement on ImageNet-100 and ImageNet-1000, respectively.
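To illustrate the replacement step described in the abstract, here is a minimal sketch of keeping a fixed-size pool while swapping easy examples for harder ones. It rests on assumptions not stated in the abstract: difficulty is approximated by the downstream model's predictive entropy, and hard examples are picked from a batch of already generated candidates rather than by steering the generator itself. The names `predictive_entropy` and `refresh_pool` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of each row of class probabilities (higher = model less certain)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def refresh_pool(pool_probs, candidate_probs, n_replace):
    """Pick which pool items to drop and which candidates to add.

    Hypothetical helper: drops the n_replace lowest-entropy (easiest) pool
    items and selects the n_replace highest-entropy (hardest) candidates,
    so the pool size stays fixed.
    """
    pool_h = predictive_entropy(pool_probs)
    cand_h = predictive_entropy(candidate_probs)
    drop_idx = np.argsort(pool_h)[:n_replace]        # easiest current examples
    add_idx = np.argsort(cand_h)[::-1][:n_replace]   # hardest new candidates
    return drop_idx, add_idx


# Toy usage: model probabilities for 4 pool examples and 3 new candidates.
pool_probs = np.array([[0.98, 0.02], [0.55, 0.45], [0.90, 0.10], [0.60, 0.40]])
candidate_probs = np.array([[0.50, 0.50], [0.99, 0.01], [0.70, 0.30]])
drop_idx, add_idx = refresh_pool(pool_probs, candidate_probs, n_replace=2)
```

In a training loop, a step like this would presumably run periodically, with the probabilities recomputed from the current model so that the notion of "hard" tracks the model's weaknesses at that stage of training.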