

Poster in Workshop: Workshop on Machine Learning and Compression

On the Relationship Between Model Training Dynamics and Early Pruning Periods

Elvis Nunez · Stefano Soatto


Abstract:

Contemporary deep learning models typically have billions of learnable parameters, requiring vast amounts of compute for training and inference. An established method for improving model efficiency is parameter pruning, whereby extraneous model parameters are discarded while preserving model performance as much as possible. Typically, such model compression is performed after the model has been trained. In this work, we aim to identify when a model becomes amenable to compression during training, in order to realize the computational savings of model compression earlier. We showcase a phenomenon whereby an "early pruning period" occurs: a period during training, prior to convergence, in which the model becomes amenable to pruning. To help understand this behavior, we draw inspiration from recent work showing that model training undergoes two phases, a "memorization" phase and a "forgetting" phase, and we show that the early pruning period often correlates with the transition between these two phases. This suggests that a large model capacity is needed only for the transient period of training, after which the model can be effectively compressed. We ground our study in discriminative computer vision applications and train multiple models across a spectrum of sizes, datasets, learning rate schedules, regularization strengths, and pruning criteria. We additionally propose a gradient-free metric, efficiently computable during training, that also often correlates with the early pruning period. For ResNet models trained on CIFAR, we identify a period early in training at which the model can be pruned by up to 90% without incurring significant accuracy degradation.
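To make the setup concrete, the sketch below shows one way to prune a partially trained ResNet at an assumed early pruning period, using global L1 magnitude pruning at 90% sparsity via PyTorch's `torch.nn.utils.prune`. This is only an illustrative assumption of one possible pruning criterion and checkpointing scheme; the paper's exact criteria, schedules, and identification of the pruning period may differ.

```python
# Illustrative sketch: prune a partially trained ResNet at an assumed
# "early pruning period" using global L1 magnitude pruning (one possible
# criterion); not necessarily the authors' exact procedure.
import torch
import torch.nn.utils.prune as prune
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # e.g., a CIFAR-10-sized output head

# Hypothetical checkpoint saved partway through training, before convergence.
# model.load_state_dict(torch.load("checkpoint_early_period.pt"))

# Collect the weight tensors of all conv/linear layers for global pruning.
parameters_to_prune = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear))
]

# Zero out the 90% of weights with the smallest magnitude across all layers.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.9,
)

# Make the pruning permanent (folds the masks into the weights), then
# continue training the now-sparse model to convergence.
for module, name in parameters_to_prune:
    prune.remove(module, name)
```

In this sketch, compressing at the checkpoint rather than after full training is what would realize the earlier computational savings described in the abstract.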
