

Poster in Workshop: Workshop on Machine Learning and Compression

Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling

Esha Singh · Shane Bergsma · Nolan Dey · Joel Hestness · Gavia Gray


Abstract:

Sparsity in deep neural networks promises two gains in computational efficiency: fewer FLOPs spent training the network and fewer FLOPs spent performing inference. We find that both are best quantified with a compute-efficient scaling law, a tool that lets us compare existing methods for training networks with unstructured sparse regularization and parametrization. In this setting, it is natural to focus on the proportion of weights in the network whose magnitude falls below a given threshold and to assume that those weights do not affect the network's output. However, the appropriate threshold may not be known in advance, so we aim to decouple our analysis from any specific choice of threshold. By evaluating network sparsity at many possible thresholds, we characterize an empirical upper bound on the advantage of sparsity for pre-training large language models. Testing this bound against existing sparse regularization methods, we find a 15% reduction in pre-training FLOPs or a 30-40% reduction in inference FLOPs, and we further identify decoupled proximal methods as a promising direction.
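
A minimal sketch, not the authors' code, of the threshold-sweep measurement described in the abstract: for a grid of magnitude thresholds, compute the fraction of weights whose absolute value falls below each threshold. The toy model and the logarithmic threshold grid are illustrative assumptions, not details from the paper.

# Sketch: fraction of weights below each magnitude threshold (assumed setup)
import torch
import torch.nn as nn

# Placeholder model standing in for a pre-trained language model.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# Flatten all parameter magnitudes into one vector.
weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])

# Hypothetical grid of candidate thresholds, swept on a log scale.
thresholds = torch.logspace(-6, 0, steps=50)

# sparsity(t) = fraction of weights with |w| < t, evaluated for every t at once.
sparsity = (weights.unsqueeze(0) < thresholds.unsqueeze(1)).float().mean(dim=1)

for t, s in zip(thresholds.tolist(), sparsity.tolist()):
    print(f"threshold={t:.2e}  fraction below={s:.3f}")

Plotting this curve for differently regularized models is one way to compare how much effective sparsity each method induces without committing to a single pruning threshold.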
