Poster in Workshop: Optimization for ML Workshop
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Louis Fournier · Adel Nabli · Masih Aminbeidokhti · Marco Pedersoli · Eugene Belilovsky · Edouard Oyallon
Abstract:
The performance of deep neural networks is enhanced by ensemble methods, which average the outputs of several models at an increased inference cost. Weight averaging methods aim to avoid this cost by merging the models into one, but naive averaging yields poor performance when the models lie in different loss basins. Distributed training methods like DART and PAPA have been proposed to train several models in parallel within the same basin, but at the cost of reduced ensembling accuracy and significant communication between models. We introduce WASH, a novel distributed method that outperforms previous approaches by randomly shuffling a small percentage of model weights across models during training, at a much lower communication cost.
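
The shuffle-then-average idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that, at a given training step, each scalar weight is independently permuted across the K models with a small probability p, and that the final model is a uniform average of the ensemble. The function names, the probability p, and the permutation schedule are illustrative placeholders.

```python
# Minimal sketch of per-weight shuffling across an ensemble, then averaging.
# Assumption: with probability p, a weight's K values are permuted across models.
import numpy as np

def shuffle_weights(models, p=0.01, rng=None):
    """models: list of K dicts mapping parameter name -> np.ndarray (same shapes).
    With probability p per scalar weight, permute its K values across the models."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(models)
    for name in models[0]:
        stacked = np.stack([m[name] for m in models])   # shape (K, ...)
        flat = stacked.reshape(K, -1)
        mask = rng.random(flat.shape[1]) < p            # which positions to shuffle
        for j in np.where(mask)[0]:
            flat[:, j] = flat[rng.permutation(K), j]    # permute across models
        shuffled = flat.reshape(stacked.shape)
        for k, m in enumerate(models):
            m[name] = shuffled[k].copy()

def average_models(models):
    """Uniformly average the K models' parameters into a single merged model."""
    return {name: np.mean([m[name] for m in models], axis=0)
            for name in models[0]}

# Usage: shuffle a toy 3-model ensemble once, then merge by averaging.
rng = np.random.default_rng(0)
models = [{"w": rng.normal(size=(4, 4))} for _ in range(3)]
shuffle_weights(models, p=0.1, rng=rng)
merged = average_models(models)
print(merged["w"].shape)  # (4, 4)
```

In a distributed setting, the communication saving would come from exchanging only the small fraction of shuffled weights at each step rather than full model copies; in this single-process sketch the exchange is simulated in memory.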