Poster in Workshop: Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Trajectory ensembling for fine-tuning - performance gains without modifying training
Louise Anderson-Conway · Vighnesh Birodkar · Saurabh Singh · Hossein Mobahi · Alexander Alemi
In this work, we present a simple algorithm for ensembling checkpoints from a single training trajectory (trajectory ensembling), which yields significant gains on several fine-tuning tasks. We compare against classical ensembles and perform ablation studies showing that the important checkpoints are not necessarily the best-performing models in terms of accuracy; rather, checkpoints with low loss but relatively poor accuracy are vital for the observed gains. We also investigate various mixtures of checkpoints from several independent training trajectories, making the surprising observation that this yields only marginal gains in this setup. Finally, we study how calibrating the constituent models with simple temperature scaling affects results, and find that the most important region of training remains the one with the lowest loss, despite its potentially poor accuracy.
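A minimal sketch of what trajectory ensembling with per-checkpoint temperature scaling could look like in practice. The abstract does not specify implementation details, so this is one plausible instantiation, not the authors' code: it averages softmax outputs over saved checkpoints from a single run, and `model`, `checkpoint_paths`, and `inputs` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F


def trajectory_ensemble_probs(model, checkpoint_paths, inputs, temperatures=None):
    """Average softmax predictions over checkpoints from one training trajectory.

    checkpoint_paths: paths to states saved along a single run (hypothetical).
    temperatures: optional per-checkpoint temperature for simple calibration;
        defaults to 1.0 (no scaling) for every checkpoint.
    """
    if temperatures is None:
        temperatures = [1.0] * len(checkpoint_paths)
    probs_sum = None
    with torch.no_grad():
        for path, temp in zip(checkpoint_paths, temperatures):
            # Reload the weights from this point along the trajectory.
            model.load_state_dict(torch.load(path))
            model.eval()
            # Temperature-scale the logits before averaging probabilities.
            probs = F.softmax(model(inputs) / temp, dim=-1)
            probs_sum = probs if probs_sum is None else probs_sum + probs
    return probs_sum / len(checkpoint_paths)
```

Averaging probabilities rather than logits is an assumption here; either choice fits the description in the abstract, and the per-checkpoint temperatures correspond to the simple temperature-scaling calibration the authors study.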