Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models
Conformal Language Model Reasoning with Coherent Factuality
Maya Gambhir · Maxon Rubin-Toles · Keshav Ramji · Aaron Roth · Surbhi Goel
Keywords: [ factuality ] [ conformal prediction ] [ reasoning ] [ coherence ] [ language models ] [ graph representation ]
Language models are increasingly used in important decision pipelines, so ensuring the correctness of their outputs is crucial. Recent work has proposed evaluating the “factuality” of subclaims decomposed from a language model generation and applying conformal prediction techniques to filter out those subclaims that are not factual. This can be effective for tasks such as information retrieval, where constituent claims may be evaluated in isolation for factuality, but it is not appropriate for reasoning tasks, as the steps of a logical argument can be evaluated for correctness only in the context of the claims that precede them. We call this “coherent factuality” and develop a conformal-prediction-based method to guarantee the coherent factuality of language model outputs. Our approach applies split conformal prediction to subgraphs within a dependency graph that we construct to represent the steps of a reasoning problem. We evaluate our method on mathematical reasoning problems from the MATH dataset and find that our algorithm achieves coherent factuality across target coverage levels, consistently producing orderings of correct claims, each substantiated by those that precede it. Moreover, we achieve 90% factuality under our stricter definition while retaining 80% or more of the original subclaims, highlighting the utility of our dependency-graph-guided approach.
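To make the pipeline concrete, below is a minimal Python sketch of the two ingredients the abstract describes: split conformal calibration of a filtering threshold over annotated subclaims, and pruning a claim dependency graph so that every retained claim is supported by retained predecessors. The claim scores, calibration data, and `parents` map are illustrative assumptions for this sketch, not the authors' released code.

```python
"""Sketch: split conformal filtering over a claim dependency graph.

All inputs (scores, calibration annotations, `parents`) are toy
assumptions used only to illustrate the mechanics.
"""
import math


def calibrate_threshold(cal_examples, alpha):
    """Split conformal calibration of a score threshold.

    Each calibration example is a list of (score, is_correct) pairs for
    its decomposed subclaims. The nonconformity score of an example is
    the highest confidence assigned to any *incorrect* subclaim, so
    filtering strictly above the calibrated threshold retains only
    correct claims on at least a 1 - alpha fraction of new examples.
    """
    noncon = []
    for claims in cal_examples:
        bad = [s for s, ok in claims if not ok]
        noncon.append(max(bad) if bad else 0.0)
    noncon.sort()
    n = len(noncon)
    # Conformal quantile index: ceil((n + 1) * (1 - alpha)) of n scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: retain nothing
    return noncon[k - 1]


def coherent_filter(scores, parents, tau):
    """Retain subclaims scoring above tau, closed under dependencies.

    `parents[i]` lists the claims that claim i relies on; claims are
    assumed indexed in generation (topological) order. A claim is kept
    only if every claim it depends on is also kept, so the output is a
    coherent subgraph of the reasoning rather than isolated claims.
    """
    kept = set()
    for i in range(len(scores)):
        if scores[i] > tau and all(p in kept for p in parents[i]):
            kept.add(i)
    return kept


if __name__ == "__main__":
    # Toy calibration set: generations with annotated subclaims.
    cal = [
        [(0.9, True), (0.4, False), (0.8, True)],
        [(0.7, True), (0.6, True), (0.3, False)],
        [(0.95, True), (0.2, False)],
        [(0.85, True), (0.5, True)],
    ]
    tau = calibrate_threshold(cal, alpha=0.25)  # tau = 0.4 here
    # Test generation: claim 2 depends on claims 0 and 1.
    scores = [0.95, 0.35, 0.85]
    parents = {0: [], 1: [], 2: [0, 1]}
    # Prints {0}: claim 1 falls below tau, and claim 2 is dropped
    # despite its high score because a claim it depends on was pruned.
    print(coherent_filter(scores, parents, tau))
```

The final print illustrates the distinction the abstract draws: under independent filtering, claim 2 would survive on its score alone; under coherent filtering it is removed because a step it relies on cannot be substantiated.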