Poster in Workshop: I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification
The Best Deep Ensembles Sacrifice Predictive Diversity
Taiga Abe · Estefany Kelly Buchanan · Geoff Pleiss · John Cunningham
Ensembling remains a hugely popular method for improving the performance of a given class of models. In the case of deep learning, the benefits of ensembling are often attributed to the diverse predictions of the individual ensemble members. Here we investigate a tradeoff between diversity and individual model performance, and find that, surprisingly, encouraging diversity during training almost always yields worse ensembles. We show that this tradeoff arises from the Jensen gap between the single-model and ensemble losses, and that the Jensen gap is a natural measure of diversity for both the mean squared error and cross-entropy loss functions. Our results suggest that to reduce ensemble error, we should move away from efforts to increase predictive diversity and instead construct ensembles from less diverse (but more accurate) component models.
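As a minimal sketch of how the Jensen gap can be read as a diversity measure in the squared-error case, consider an ensemble of M members f_1, ..., f_M averaged in output space, with \(\bar f(x) = \frac{1}{M}\sum_{m} f_m(x)\). The notation below is ours for illustration, not the poster's, and follows the standard ambiguity decomposition:

\[
\underbrace{\frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(x) - y\bigr)^2}_{\text{avg. single-model loss}}
\;-\;
\underbrace{\bigl(\bar f(x) - y\bigr)^2}_{\text{ensemble loss}}
\;=\;
\underbrace{\frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(x) - \bar f(x)\bigr)^2}_{\text{Jensen gap (predictive diversity)}}
\]

The gap is the spread of member predictions around the ensemble prediction: a larger gap means more diversity but, for a fixed ensemble loss, also worse individual members, which is the tradeoff the abstract describes. An analogous (though differently shaped) gap follows from Jensen's inequality for the cross-entropy loss when member probabilities are averaged.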