

Poster in Workshop: Machine Learning for Systems

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Zhuohan Gu · Jiayi Yao · Kuntai Du · Junchen Jiang


Abstract:

Although large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and incur high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces runtime delay by up to 4.8× compared to recent attention steering methods.
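To make the idea of attention steering concrete, here is a minimal, illustrative sketch of one generic form of it: biasing attention weights toward tokens belonging to a reused context span before the softmax. This is not LLMSteer's actual method (the abstract does not specify it); the function name, the `boost` parameter, and the single-head NumPy formulation are all assumptions for illustration.

```python
import numpy as np

def steered_attention(q, K, V, reused_mask, boost=2.0):
    """Toy single-head attention with query-independent steering.

    Logits of tokens flagged in `reused_mask` (1 = token lies in a
    reused/cached context span) are biased upward by log(boost) before
    the softmax, multiplying their attention mass by roughly `boost`.
    `boost` is a hypothetical steering strength, not a value from the paper.
    """
    d = q.shape[-1]
    logits = K @ q / np.sqrt(d)                    # scaled dot-product scores
    logits = logits + np.log(boost) * reused_mask  # steer toward reused span
    w = np.exp(logits - logits.max())              # numerically stable softmax
    w /= w.sum()
    return w @ V, w                                # attended output, weights
```

With `boost=1.0` the bias term is zero and this reduces to plain attention; with `boost>1` the total attention mass on the reused span strictly increases, which is the steering effect the abstract alludes to.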
