Poster in Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)
Compressing Pre-trained Language Models using Progressive Low Rank Decomposition
Habib Hajimolahoseini · Mehdi Rezagholizadeh · Vahid Partovi Nia · Marzieh Tahaei · Omar Mohamed Awad · Yang Liu
In this paper, a progressive low rank decomposition method is used to compress large-scale pre-trained transformer-based language models. To this end, each fully-connected layer of the transformer modules is decomposed into two consecutive smaller ones using a progressive Singular Value Decomposition (SVD) technique. In contrast to many state-of-the-art compression methods, where intensive pre-training of the compressed model is necessary, progressive LRD can provide promising performance by compressing the model during the fine-tuning stage. Furthermore, current state-of-the-art model compression techniques usually face a limitation in their compression ratio, as the accuracy gap becomes significant at compression ratios higher than 2×. We show that in the later steps of the iterative compression, where the decomposed model becomes much smaller than the original (compression factors larger than 8×), Knowledge Distillation can also be used to improve the performance.
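To make the decomposition step concrete, the sketch below shows how a single fully-connected layer could be split into two consecutive smaller layers via a truncated SVD of its weight matrix in PyTorch. This is a minimal illustration, not the authors' implementation: the helper name decompose_linear, the chosen rank, and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Hypothetical helper: replace one FC layer with two smaller consecutive
    layers obtained from a rank-truncated SVD of its weight matrix."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Keep only the top-`rank` singular values and vectors.
    U_r = U[:, :rank]                          # (out_features, rank)
    S_r = S[:rank]
    Vh_r = Vh[:rank, :]                        # (rank, in_features)

    # First (smaller) layer: projects the input down to `rank` dimensions.
    first = nn.Linear(layer.in_features, rank, bias=False)
    first.weight.data = torch.diag(S_r.sqrt()) @ Vh_r

    # Second (smaller) layer: projects back up and carries the original bias.
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    second.weight.data = U_r @ torch.diag(S_r.sqrt())
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()

    return nn.Sequential(first, second)

# Example: one feed-forward layer of a BERT-base-sized transformer (768 -> 3072).
fc = nn.Linear(768, 3072)
compressed = decompose_linear(fc, rank=128)    # rank chosen for illustration only

x = torch.randn(4, 768)
rel_err = (fc(x) - compressed(x)).norm() / fc(x).norm()
print(f"relative output error: {rel_err:.3f}")
```

With these illustrative sizes, the original layer has 768 × 3072 ≈ 2.36M weights, while the two decomposed layers together have 768 × 128 + 128 × 3072 ≈ 0.49M, i.e. roughly a 4.8× reduction for that layer; in the progressive setting this step would be applied repeatedly with the rank reduced gradually rather than in one shot.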