

Poster

A Rank-Based Metric for Evaluating Large Language Models

Lai Wei · Zhiquan Tan · Chenghai Li · Jindong Wang · Weiran Huang

[ Project Page ]
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Large Language Models (LLMs) have revolutionized the field of natural language processing and have extended their strong capabilities into multi-modal domains. Defining proper and diversified metrics for evaluating LLMs is therefore vital. In this paper, we introduce a rank-based metric called rank difference, rooted in information-theoretic and geometric principles. Rank difference evaluates LLMs by examining their hidden representations to quantify how much redundant information a model discards through training. We demonstrate its applicability in both single-modal (language) and multi-modal settings. For language models, our findings reveal that the rank difference increases as the model scales up and exhibits a consistent relationship with traditional metrics such as loss and accuracy. For multi-modal models, we further propose a rank-difference-based evaluation method for assessing alignment quality, and we find that modern multi-modal large language models exhibit good alignment performance.
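The abstract does not specify how the rank of hidden representations is computed, so the sketch below is only an illustrative assumption: it takes "rank" to be the effective rank (entropy of the normalized singular-value spectrum) of a matrix of hidden states, and "rank difference" to be the change in that quantity between two models (e.g., before and after training) on the same inputs. All function names and the toy data are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a rank-difference style computation (not the paper's
# exact definition). "Rank" here is approximated by the effective rank, i.e.
# exp(entropy) of the normalized singular values of a hidden-state matrix.
import numpy as np

def effective_rank(hidden_states: np.ndarray) -> float:
    """Effective rank of an (n_tokens, d_model) matrix of hidden states."""
    s = np.linalg.svd(hidden_states, compute_uv=False)
    p = s / s.sum()                      # normalize the singular values
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()     # Shannon entropy of the spectrum
    return float(np.exp(entropy))

def rank_difference(h_before: np.ndarray, h_after: np.ndarray) -> float:
    """Change in effective rank between two sets of hidden states,
    e.g. from an untrained vs. a trained model on the same inputs."""
    return effective_rank(h_before) - effective_rank(h_after)

# Toy usage: random matrices stand in for real hidden representations.
rng = np.random.default_rng(0)
h_untrained = rng.normal(size=(512, 768))
h_trained = rng.normal(size=(512, 64)) @ rng.normal(size=(64, 768))  # lower-rank
print(rank_difference(h_untrained, h_trained))
```

Under this reading, a larger positive rank difference indicates that the trained model concentrates its hidden representations in fewer effective directions, i.e., discards more redundant information; whether the paper uses this exact estimator is not confirmed by the abstract.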
