Poster in Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao · Yanshuai Cao · Lili Mou
Keywords: [ Efficient Training ]
The performance of neural networks improves as more parameters are used. However, model sizes are constrained by the available on-device memory during training and inference. Although techniques such as quantization can alleviate this constraint, they trade memory savings for performance loss. In this work, we introduce NeuZip, a new compression scheme for neural network weights that exploits the entropy structure of the different components of the floating-point numbers appearing in typical neural network weights. With NeuZip, we achieve memory-efficient training without any performance loss. In addition, our method reduces memory requirements during both the forward and backward passes, making it applicable to both model training and inference. Our empirical evaluation across various models and datasets demonstrates that NeuZip reduces memory without sacrificing model performance.
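To make the "entropy structure of floating-point components" concrete, the following minimal Python sketch (not the authors' implementation) estimates the empirical entropy of the sign, exponent, and mantissa bits of a synthetic weight tensor; the Gaussian stand-in for trained weights and all names here are illustrative assumptions. It shows why the exponent bits are a good target for lossless entropy coding while the mantissa bits are not.

# Illustrative sketch only: measure how compressible each component of
# float32 weights is. The Gaussian weights below are a stand-in for a
# trained layer, not data from the paper.
import numpy as np

def empirical_entropy(symbols, num_symbols):
    # Empirical entropy in bits per symbol of an integer-valued array.
    counts = np.bincount(symbols.astype(np.int64), minlength=num_symbols).astype(np.float64)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

weights = np.random.normal(0.0, 0.02, size=1_000_000).astype(np.float32)
bits = weights.view(np.uint32)

sign = (bits >> 31) & 0x1            # 1 sign bit
exponent = (bits >> 23) & 0xFF       # 8 exponent bits
mantissa_top = (bits >> 15) & 0xFF   # top 8 of the 23 mantissa bits

print("sign entropy     :", empirical_entropy(sign, 2), "of 1 bit")
print("exponent entropy :", empirical_entropy(exponent, 256), "of 8 bits")
print("mantissa entropy :", empirical_entropy(mantissa_top, 256), "of 8 bits")

# Typical outcome: because weights cluster near zero, only a handful of
# exponent values occur, so the exponent byte has far less than 8 bits of
# entropy and compresses well losslessly; the mantissa bits look nearly
# uniform and compress poorly.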