Spotlight in Workshop on Robustness of Zero/Few-shot Learning in Foundation Models (R0-FoMo)
Uncertainty In Natural Language Explanations Of Large Language Models
Sree Harsha Tanneru · Chirag Agarwal · Himabindu Lakkaraju
Abstract:
Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent works on prompting claim to elicit intermediate reasoning steps and important tokens in LLMs that serve as proxy explanations for their predictions. However, there is no guarantee that these explanations are reliable or that they reflect the LLM's true behavior. In this work, we introduce the first definitions of uncertainty in natural language explanations of LLMs and propose a novel approach, $\textit{Probing Uncertainty}$, to quantify the confidence of the generated explanations. Our approach probes a neighbourhood of the LLM's explanations to estimate their uncertainty. While verbalized uncertainty involves prompting the LLM to express its confidence level in the generated explanations, we show that it is not a reliable estimate of explanation confidence. Our empirical analysis reveals two key insights about uncertainty in generated natural language explanations: i) verbalized uncertainty estimates from LLMs are often highly overconfident, raising questions about the trustworthiness of the resulting explanations, and ii) explanation confidence computed with the proposed metric is correlated with the faithfulness of an explanation, with lower explanation confidence corresponding to less faithful explanations. Our study provides insights into the challenges and opportunities in quantifying uncertainty in explanations of LLMs, contributing to the broader discussion of explainability and trustworthiness in machine learning applications.
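The abstract describes the two estimators only at a high level. The sketch below is a minimal illustration, not the authors' implementation: it assumes a hypothetical `query_llm` placeholder for whatever model API is used, and it stands in a simple lexical-overlap agreement measure for whatever similarity the paper actually employs. It shows the general shape of (a) verbalized confidence, obtained by asking the model to state its own confidence, and (b) a neighbourhood-probing confidence score, obtained by sampling several explanations and measuring how much they agree.

```python
# Minimal sketch (not the authors' code): two ways to attach a confidence
# score to a natural language explanation produced by an LLM.
# `query_llm` is a hypothetical stand-in for your model provider's client.
import re
from itertools import combinations


def query_llm(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM call; replace with an actual API client."""
    raise NotImplementedError


def verbalized_confidence(question: str) -> float:
    """Ask the model to state its own confidence in its explanation (0-1)."""
    prompt = (f"{question}\nExplain your answer, then state your confidence "
              "in the explanation as a number between 0 and 1.")
    reply = query_llm(prompt)
    match = re.search(r"\d*\.?\d+", reply)
    value = float(match.group()) if match else 0.0
    return min(max(value, 0.0), 1.0)  # clamp to [0, 1]


def lexical_agreement(a: str, b: str) -> float:
    """Crude token-overlap similarity between two explanations."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)


def probing_confidence(question: str, n_samples: int = 5) -> float:
    """Probe a neighbourhood of explanations: sample several explanations
    for the same input and treat their mutual agreement as confidence."""
    explanations = [
        query_llm(f"{question}\nExplain your answer step by step.")
        for _ in range(n_samples)
    ]
    pairs = list(combinations(explanations, 2))
    return sum(lexical_agreement(a, b) for a, b in pairs) / max(len(pairs), 1)
```

In this toy version, high agreement across sampled explanations is read as high confidence; the paper's metric may define the neighbourhood and the agreement measure differently.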