Poster in Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa · Shreyas Saxena · Abhay Gupta · Sean Lie
Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t. training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs, but training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency less clear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPs as the dense model, and show training efficiency gains through higher accuracy. In this work, we introduce Sparse-IFT, a family of Sparse Iso-FLOP Transformations which are used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single hyperparameter (sparsity level) and provides a larger search space to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with Sparse-IFT leads to significant improvements across computer vision and natural language processing tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants that use 2x or more FLOPs. To our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models via a simple set of sparse transformations.
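To make the iso-FLOP idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' code) of one such transformation: widen the hidden dimension of an MLP block by 1/(1-s) while keeping a fraction s of the weights at zero, so the theoretical multiply-accumulate count matches the dense block. The class and argument names (SparseWideMLP, d_model, d_hidden) are illustrative assumptions; the actual Sparse-IFT family and its mask-selection procedure differ, and unstructured sparsity only yields wall-clock savings on hardware that accelerates it.

```python
# Hypothetical sketch of an iso-FLOP "sparse wide" MLP block (illustration only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class SparseWideMLP(nn.Module):
    """Two-layer MLP whose hidden width is scaled by 1 / (1 - sparsity).

    With a fraction `sparsity` of weights held at zero, the theoretical FLOPs
    of the widened block match the dense baseline:
        2 * d * (h / (1 - s)) * (1 - s) == 2 * d * h
    """

    def __init__(self, d_model: int, d_hidden: int, sparsity: float):
        super().__init__()
        scaled_hidden = int(round(d_hidden / (1.0 - sparsity)))
        self.fc1 = nn.Linear(d_model, scaled_hidden)
        self.fc2 = nn.Linear(scaled_hidden, d_model)
        self.act = nn.GELU()
        # Static random masks applied at initialization; the paper instead
        # searches over sparse masks during training.
        for layer in (self.fc1, self.fc2):
            prune.random_unstructured(layer, name="weight", amount=sparsity)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


# Usage: drop-in replacement for a dense MLP block with the same input/output dims.
block = SparseWideMLP(d_model=768, d_hidden=3072, sparsity=0.75)
y = block(torch.randn(2, 16, 768))
```

Because the input and output dimensions are unchanged, the block can replace a dense MLP without touching the rest of the network or the training hyperparameters, which is the drop-in property the abstract describes.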