Poster
ESPACE: Dimensionality Reduction of Activations for Model Compression
Charbel Sakr · Brucek Khailany
We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior work on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. Because the approach is activation-centric, LLMs can be retrained with no loss of expressivity, while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find that ESPACE enables 50% compression of GPT3 and Llama2 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B. At lower compression rates of 20% to 40%, ESPACE drives GPT3 models to outperform their baselines, by up to a 0.38 perplexity decrease for GPT3-8B. Comparison with related work on compressing the Llama2-7B model via alternative matrix factorization techniques shows that ESPACE is a first step in advancing the state of the art in tensor decomposition compression of LLMs.
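To make the associativity point concrete, the sketch below shows the general idea in NumPy: a linear layer y = xW is approximated by projecting activations onto a calibrated basis P of principal components, and by associativity (xP)(PᵀW) = x(PPᵀW), so the compressed weight factor PᵀW can be precomputed offline. This is an illustrative sketch under stated assumptions, not the authors' implementation; all names and dimensions (calibrate_projection, d, k, m) are hypothetical.

```python
import numpy as np

def calibrate_projection(activations: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical calibration step: top-k principal components of activations.

    activations: (num_samples, d) matrix of sampled layer inputs.
    Returns P of shape (d, k), whose columns are the leading eigenvectors
    of the (uncentered) activation second-moment matrix.
    """
    # Eigen-decompose E[x xᵀ]; np.linalg.eigh returns eigenvalues ascending.
    second_moment = activations.T @ activations / activations.shape[0]
    _, eigvecs = np.linalg.eigh(second_moment)
    return eigvecs[:, -k:]  # keep the top-k components

# Toy dimensions (hypothetical): hidden size d, projection rank k, weight W: (d, m).
d, k, m = 64, 16, 32
rng = np.random.default_rng(0)
calib_acts = rng.standard_normal((1024, d))  # stand-in for calibration activations
W = rng.standard_normal((d, m))

P = calibrate_projection(calib_acts, k)  # (d, k), pre-calibrated
W_compressed = P.T @ W                   # (k, m), precomputed offline

x = rng.standard_normal((8, d))          # a batch of activations
# At inference: project activations, then multiply by the compressed weights.
# By associativity, (x @ P) @ (P.T @ W) == x @ (P @ P.T @ W), a rank-k
# approximation of x @ W.
y_approx = (x @ P) @ W_compressed
y_exact = x @ W
print("relative error:", np.linalg.norm(y_approx - y_exact) / np.linalg.norm(y_exact))
```

Note the design consequence this illustrates: only the projection P is calibrated from activation statistics, while the factored weight PᵀW falls out of ordinary matrix algebra, which is why retraining can proceed on the original (unfactored) weights without losing expressivity.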