Poster in Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
Approximations may be all you need: Towards Pre-training LLMs with Low-Rank Decomposition and Optimizers
Namrata Shivagunde · Mayank Kulkarni · Giannis Karamanolakis · Jack FitzGerald · Yannick Versley · Saleh Soltan · Volkan Cevher · Jianhua Lu · Anna Rumshisky
Keywords: [ Evaluation and Benchmarking of Efficient Models ] [ Efficient Training ]
Large language models (LLMs) have achieved remarkable performance on various natural language processing tasks, but training LLMs at scale is extremely resource-intensive, requiring substantial computational power, memory, and energy. This has motivated research into efficient training methods, particularly for the pre-training phase. Two main approaches to approximating full-rank training have emerged to address this challenge: low-rank model decomposition (e.g., ReLoRA) and memory-efficient optimizers (e.g., GaLore). In this work, we systematically evaluate both lines of research on a range of metrics, including validation perplexity, memory usage, and throughput. In addition, we propose improvements to both families of methods: for low-rank decomposition, we improve the decomposition and initialization of the low-rank matrices; for memory-efficient optimizers, we introduce error feedback and dynamic update steps. Our comprehensive evaluation under the same experimental setting shows that our proposed optimizations outperform all previous methods, achieving nearly the same throughput as full-rank training and saving 9% of memory at the cost of a 1.5% increase in validation perplexity.
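For readers unfamiliar with the second family of methods mentioned above, the snippet below is a loose, generic sketch of low-rank gradient projection combined with an error-feedback buffer, written in PyTorch. All names, shapes, ranks, and the SVD-based projection are illustrative assumptions; this is not GaLore's algorithm nor the implementation evaluated in this work.

```python
# Hypothetical sketch: project a 2-D gradient onto a low-rank subspace and
# carry the discarded component forward as error feedback. Illustrative only;
# not the paper's method.
import torch

def project_gradient(grad: torch.Tensor, rank: int, residual: torch.Tensor):
    """Return a rank-`rank` approximation of `grad` (via truncated SVD) plus
    the updated error-feedback residual holding what the projection dropped."""
    corrected = grad + residual                       # re-inject previously discarded signal
    U, S, Vh = torch.linalg.svd(corrected, full_matrices=False)
    low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
    new_residual = corrected - low_rank               # error fed back at the next step
    return low_rank, new_residual

# Toy usage: one weight matrix, one training step.
W = torch.randn(256, 128, requires_grad=True)
loss = (W @ torch.randn(128, 16)).pow(2).mean()
loss.backward()

residual = torch.zeros_like(W)
low_rank_grad, residual = project_gradient(W.grad, rank=8, residual=residual)
with torch.no_grad():
    W -= 1e-3 * low_rank_grad                         # plain SGD step on the projected gradient
```

In a memory-efficient optimizer the point of such a projection is that optimizer state (e.g., Adam moments) can be kept in the low-rank subspace rather than at full parameter size; the error-feedback buffer is one way to keep the discarded gradient components from being lost entirely.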