Poster in Workshop: Machine Learning for Systems
Eagle: Efficient Training-Free Router for Multi-LLM Inference
Zesen Zhao · Shuowei Jin · Zhuoqing Morley Mao
The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection in AI systems. LLM routers address this need by dynamically choosing the most suitable model for a given query based on task requirements and budget constraints. However, existing routers face challenges in scalability and real-time adaptation, particularly in high-volume online environments. We present Eagle, a novel LLM routing approach that combines global and local Elo ranking modules to overcome these limitations. By evaluating both general and specialized LLM abilities, Eagle provides a scalable, training-free solution that improves model selection quality while reducing computational overhead. Our experiments across multiple datasets show that Eagle consistently outperforms baseline methods, with improvements of up to 23.52% in Area Under the Curve (AUC) scores. Moreover, Eagle is highly efficient, requiring only 1/20 of the baselines' initialization time and performing incremental updates 100-200x faster in online scenarios, making it well suited for dynamic, high-volume serving environments.
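The abstract does not spell out Eagle's exact update rules, but the underlying Elo mechanics of such a router can be sketched. The snippet below is a minimal illustration only, assuming standard Elo updates from pairwise response preferences and a hypothetical 50/50 blend of a global rating table with a per-category ("local") one; the function names, the blend weight, and the budget filter are illustrative assumptions, not Eagle's actual implementation.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a model rated r_a beats a model rated r_b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0) -> tuple[float, float]:
    """Update two ratings after one pairwise comparison.

    outcome is 1.0 if model A's response was preferred, 0.0 if B's, 0.5 for a tie.
    This is the standard Elo rule; it is training-free and O(1) per comparison,
    which is what makes fast incremental updates plausible.
    """
    e_a = expected_score(r_a, r_b)
    return r_a + k * (outcome - e_a), r_b + k * ((1.0 - outcome) - (1.0 - e_a))

def route(category: str, global_elo: dict, local_elo: dict,
          costs: dict, budget: float) -> str:
    """Pick the highest-rated affordable model for a query category.

    The 0.5/0.5 blend of global and local ratings is an illustrative assumption;
    local ratings fall back to the global rating when a category is unseen.
    """
    scores = {
        m: 0.5 * global_elo[m]
           + 0.5 * local_elo.get(category, {}).get(m, global_elo[m])
        for m in global_elo
    }
    affordable = [m for m in scores if costs[m] <= budget]
    if not affordable:
        # Fall back to the cheapest model when nothing fits the budget (illustrative choice).
        return min(costs, key=costs.get)
    return max(affordable, key=scores.get)

if __name__ == "__main__":
    # Hypothetical model names, ratings, and costs for demonstration.
    global_elo = {"small-model": 1450.0, "large-model": 1620.0}
    local_elo = {"coding": {"small-model": 1500.0, "large-model": 1580.0}}
    costs = {"small-model": 0.2, "large-model": 1.0}
    print(route("coding", global_elo, local_elo, costs, budget=0.5))  # -> small-model
    # After observing that large-model's answer was preferred over small-model's:
    global_elo["large-model"], global_elo["small-model"] = elo_update(
        global_elo["large-model"], global_elo["small-model"], outcome=1.0
    )
```

Because each preference observation touches only the two ratings involved, an online router in this style avoids any retraining step, consistent with the incremental-update speedups the abstract reports.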