Poster in Workshop: Foundation Model Interventions
Uncovering Uncertainty in Transformer Inference
Greyson Brothers · Willa Mannering · John Winder · Amber Tien
Keywords: [ Uncertainty ] [ Mechanistic Interpretability ] [ LLM ] [ Large Language Model ] [ Transformer ] [ Residual Stream ] [ Convergence ] [ Iterative Inference Hypothesis ] [ Natural Language Processing ]
We explore the Iterative Inference Hypothesis (IIH) in the context of transformer-based language models, aiming to understand how a model's latent representations are progressively refined and whether observable differences exist between correct and incorrect generations. Our findings provide empirical support for the IIH, showing that the n-th token embedding in the residual stream follows a trajectory of decreasing loss as it passes through successive layers. We further observe that the rate at which residual embeddings converge to a stable output representation reflects uncertainty in the token generation process. Finally, we introduce a cross-entropy-based method for detecting this uncertainty and demonstrate its potential to distinguish correct from incorrect token generations on a dataset of idioms.
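As a concrete illustration of the convergence measurement described above, the sketch below applies a logit-lens-style readout to the residual stream of GPT-2: each layer's hidden state is projected through the model's final layer norm and unembedding matrix, and the cross-entropy between that layer's next-token distribution and the final output distribution is recorded. The choice of GPT-2, the logit-lens projection, the example prompt, and this particular convergence metric are assumptions made for illustration, not the authors' exact method.

```python
# Minimal logit-lens-style sketch: measure how quickly each layer's
# next-token distribution converges to the model's final output
# distribution. GPT-2 and this metric are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# A hypothetical idiom prompt, echoing the paper's idiom dataset.
inputs = tokenizer("The early bird catches the", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the residual stream: element 0 is the embedding
# output, element i is the stream after block i-1, and the last element
# already has the final layer norm applied (hence the [:-1] slice below).
ln_f = model.transformer.ln_f   # final layer norm
unembed = model.lm_head         # unembedding matrix (tied to input embeddings)

# Reference distribution: the model's actual next-token output.
final_probs = F.softmax(out.logits[:, -1, :], dim=-1)

# Cross-entropy H(p_final, p_layer) at the last sequence position,
# tracked layer by layer through the residual stream.
for layer, h in enumerate(out.hidden_states[:-1]):
    layer_log_probs = F.log_softmax(unembed(ln_f(h[:, -1, :])), dim=-1)
    ce = -(final_probs * layer_log_probs).sum(dim=-1).item()
    print(f"layer {layer:2d}: H(p_final, p_layer) = {ce:.4f}")
```

Under the paper's framing, a rapid drop in this cross-entropy across early layers would indicate early convergence (low uncertainty), while a distribution that stabilizes only in the last few layers would flag an uncertain, and potentially incorrect, generation.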