Poster in Workshop: Machine Learning and Compression
AdaQuantLM: LLM Quantization with Adaptive Bit-Widths
Shuangyi Chen · Ashish Khisti
Current LLM quantization methods focus on single-bit-width quantization, requiring time-consuming fine-tuning and benchmarking for each bit-width version, which limits their adaptability to different deployment scenarios. To address these challenges, we propose AdaQuantLM, a method for LLM quantization with adaptive bit-widths. Inspired by techniques such as AdaBits and Additive Quantization for Language Models (AQLM), AdaQuantLM exploits the additivity of codewords in the quantized model: converting between bit-widths only requires adding or removing specific codewords, eliminating the need to store full-precision weights. Our approach jointly quantizes and fine-tunes LLMs across multiple bit-widths, enabling the model to adapt to devices with varying computational resources while maintaining performance. We demonstrate the effectiveness of AdaQuantLM through experiments on the Gemma-2b model, highlighting its potential for broad applicability in the efficient deployment of LLMs.
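To make the codeword-additivity idea concrete, below is a minimal NumPy sketch (not the authors' implementation): each weight group is approximated as a sum of codewords drawn from a small set of codebooks, and a lower bit-width is obtained simply by dropping trailing codebooks at dequantization time. The group size, codebook sizes, and function names are illustrative assumptions.

```python
# Minimal sketch of additive (AQLM-style) quantization with bit-width switching.
# All sizes and names here are illustrative assumptions, not the AdaQuantLM code.
import numpy as np

GROUP_SIZE = 8        # weights quantized jointly per group (assumption)
CODEBOOK_SIZE = 256   # 2^8 entries -> 8 index bits per codebook (assumption)

def quantize_group(w, codebooks):
    """Greedy residual assignment: pick one codeword per codebook so that
    their sum approximates the weight group w."""
    residual = w.copy()
    indices = []
    for cb in codebooks:                      # cb shape: (CODEBOOK_SIZE, GROUP_SIZE)
        errs = np.linalg.norm(residual[None, :] - cb, axis=1)
        idx = int(np.argmin(errs))
        indices.append(idx)
        residual = residual - cb[idx]
    return indices

def dequantize_group(indices, codebooks, num_active):
    """Reconstruct using only the first `num_active` codebooks.
    Fewer active codebooks -> fewer stored indices -> lower bit-width,
    with no access to the full-precision weights required."""
    out = np.zeros(codebooks[0].shape[1])
    for cb, idx in list(zip(codebooks, indices))[:num_active]:
        out += cb[idx]
    return out

# Toy usage: 2 codebooks ~ 16 index bits per group; keeping 1 ~ 8 bits per group.
rng = np.random.default_rng(0)
codebooks = [rng.normal(scale=0.1, size=(CODEBOOK_SIZE, GROUP_SIZE)) for _ in range(2)]
w = rng.normal(size=GROUP_SIZE)
idx = quantize_group(w, codebooks)
w_hi = dequantize_group(idx, codebooks, num_active=2)  # higher bit-width
w_lo = dequantize_group(idx, codebooks, num_active=1)  # lower bit-width
print(np.linalg.norm(w - w_hi), np.linalg.norm(w - w_lo))
```

Because the reconstruction is a plain sum of codewords, the high-bit-width and low-bit-width models share the same stored indices, which is what lets a single quantized checkpoint serve devices with different compute budgets.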