Poster in Workshop: Machine Learning for Systems
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
Zhuohan Gu · Jiayi Yao · Kuntai Du · Junchen Jiang
Abstract:
While large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and incur high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces runtime delay by up to 4.8× compared to recent attention steering methods.
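The abstract does not describe LLMSteer's mechanism in detail. The sketch below is only a minimal illustration of one plausible reading of query-independent attention steering, in which the same fixed bias is added to the attention logits of selected positions in a reused context for every query token; the function and parameter names (steered_attention, steered_positions, steer_bias) are illustrative assumptions, not the authors' implementation.

# Minimal sketch of query-independent attention steering (illustrative only).
import torch
import torch.nn.functional as F

def steered_attention(q, k, v, steered_positions, steer_bias=1.0):
    """Scaled dot-product attention with extra weight on chosen key positions.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim)
    steered_positions: 1-D LongTensor of key/value indices to emphasize
    steer_bias: scalar added to the attention logits of those positions
    """
    d = q.size(-1)
    # Standard scaled dot-product attention logits.
    logits = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    # Query-independent steering: the same bias is applied to the selected
    # key positions regardless of which query token is attending.
    bias = torch.zeros(k.size(-2), device=q.device)
    bias[steered_positions] = steer_bias
    logits = logits + bias  # broadcasts over batch, heads, and queries
    weights = F.softmax(logits, dim=-1)
    return torch.matmul(weights, v)

# Example: emphasize the first 16 tokens of a reused (cached) context.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = steered_attention(q, k, v, steered_positions=torch.arange(16))
print(out.shape)  # torch.Size([1, 8, 128, 64])

Because the bias depends only on the key positions and not on the query, it can be computed once for a reused context and applied to all subsequent requests, which is consistent with the abstract's emphasis on reused contexts and avoiding fine-tuning.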