
Poster

Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

Don Dennis · Abhishek Shetty · Anish Prasad Sevekari · Kazuhito Koishida · Virginia Smith

Great Hall & Hall B1+B2 (level 1) #2013

Abstract:

Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the problem of progressive ensemble distillation: given a large, pretrained teacher model, we seek to decompose the model into an ensemble of smaller, low-inference-cost student models. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost, which can be useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that allows function-composition-based aggregation rules to construct expressive ensembles with performance similar to the original model, while using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of convergence as well as generalization.
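The abstract describes the approach only at a high level. As a rough illustration of the general idea (not the B-DISTIL algorithm itself), the sketch below shows a simple boosting-style distillation loop in PyTorch: each small student is trained to regress the residual between the teacher's logits and the current ensemble's summed prediction, and inference can stop after any prefix of the ensemble to trade accuracy for cost. All names here (distill_next_student, ensemble_predict, TinyNet, train_loader, etc.) are hypothetical and do not come from the paper.

```python
import torch
import torch.nn.functional as F


def distill_next_student(teacher, ensemble, student, loader,
                         epochs=5, lr=1e-3, device="cpu"):
    """Fit one small student to the residual between the teacher's logits
    and the current ensemble's summed logits (a boosting-style update).
    Illustrative sketch only; the actual B-DISTIL procedure differs."""
    teacher.eval()
    for m in ensemble:
        m.eval()
    student.to(device).train()
    opt = torch.optim.Adam(student.parameters(), lr=lr)

    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            with torch.no_grad():
                target = teacher(x)                      # teacher logits
                if ensemble:                             # remove what the ensemble already explains
                    target = target - sum(m(x) for m in ensemble)
            loss = F.mse_loss(student(x), target)        # regress onto the residual
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student.eval()


def ensemble_predict(ensemble, x, k=None):
    """Early-exit style inference: sum the logits of the first k students,
    so a smaller k gives lower inference cost at some loss in accuracy."""
    members = ensemble if k is None else ensemble[:k]
    return sum(m(x) for m in members)


# Hypothetical usage: build the ensemble one student at a time, then pick
# how many members to evaluate at inference based on the available budget.
#
# teacher = BigPretrainedNet()
# students = [TinyNet() for _ in range(4)]
# ensemble = []
# for s in students:
#     ensemble.append(distill_next_student(teacher, ensemble, s, train_loader))
# logits = ensemble_predict(ensemble, x_batch, k=2)   # use only 2 students
```

This captures the accuracy vs. inference-cost tradeoff the abstract mentions: adding students refines the approximation of the teacher, while truncating the ensemble at inference time reduces cost. The paper's aggregation rules are richer than the plain logit sum used here.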