

Poster in Workshop: Foundation Model Interventions

Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs

Jiatong Han · Jannik Kossen · Muhammed Razzak · Yarin Gal

Keywords: [ uncertainty estimation; large language models; linear probing; interpretability ]


Abstract: Uncertainty estimation in Large Language Models (LLMs) is challenging because token-level uncertainty includes uncertainty over lexical and syntactic variation, and thus fails to accurately capture uncertainty over the semantic meaning of the generation. To address this, Farquhar et al. recently introduced semantic entropy (SE), which quantifies uncertainty in semantic meaning by aggregating the token-level probabilities of generations that are semantically equivalent. Kossen et al. further demonstrated that SE can be cheaply and reliably captured using linear probes on the model's hidden states. In this work, we build on these results and show that semantic uncertainty in LLMs can be predicted from only a very small set of neurons. We find these neurons by training linear probes with $L_1$ regularization. Our approach matches the performance of full-neuron probes in predicting SE. An intervention study further shows that these neurons causally affect the semantic uncertainty of model generations. Our findings reveal how hidden-state neurons encode semantic uncertainty, present a method for manipulating this uncertainty, and contribute insights to interpretability research.
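As an illustration of the probing setup described above, the following is a minimal sketch, not the authors' code, of an $L_1$-regularized linear probe that selects a sparse set of hidden-state neurons predictive of SE. The arrays `hidden_states` and `se_labels` are hypothetical placeholders for LLM hidden states and binarized high/low-SE labels produced by an upstream semantic-entropy pipeline; the sizes, the regularization strength `C`, and the use of scikit-learn's `LogisticRegression` are all assumptions.

```python
# Sketch: L1-regularized linear probe for semantic entropy (SE) on LLM hidden states.
# `hidden_states` and `se_labels` are hypothetical stand-ins for real probe data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, d_model = 2000, 4096                     # placeholder sizes
hidden_states = rng.normal(size=(n_samples, d_model))
se_labels = rng.integers(0, 2, size=n_samples)      # stand-in for binarized high/low SE

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, se_labels, test_size=0.2, random_state=0
)

# The L1 penalty drives most probe weights to exactly zero, leaving a small
# set of candidate "semantic entropy neurons".
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
probe.fit(X_train, y_train)

auc = roc_auc_score(y_test, probe.decision_function(X_test))
active_neurons = np.flatnonzero(probe.coef_[0])
print(f"test AUROC: {auc:.3f}, non-zero neurons: {active_neurons.size} / {d_model}")
```

The sparsity induced by the $L_1$ penalty is what isolates the small neuron set; in practice one would sweep the regularization strength and then validate the selected neurons causally, as in the intervention study mentioned in the abstract.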
