

Poster in Workshop: Safe Generative AI

Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy

Benedict Aaron Tjandra · Muhammed Razzak · Jannik Kossen · Yarin Gal


Abstract:

Large Language Models (LLMs) are known to hallucinate, whereby they generate plausible but inaccurate text. This phenomenon poses significant risks in critical applications, such as medicine or law, necessitating robust hallucination mitigation strategies. While recent works have proposed fine-tuning methods to teach LLMs to abstain from answering questions beyond their knowledge or capabilities, these methods rely on the existence of ground-truth labels or are limited to short-form responses. To address these limitations, we propose fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels. We demonstrate that our approach matches or outperforms models fine-tuned using prior work and achieves strong performance for both short- and long-form generations on a range of datasets.
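
As background, semantic entropy is typically estimated by sampling several answers to the same prompt, grouping them into clusters of equivalent meaning, and taking the entropy of the resulting cluster distribution. The Python sketch below is a minimal illustration of that estimator, not the paper's implementation: naive_equivalent is a hypothetical stand-in for the bidirectional-entailment (NLI) check usually used to decide semantic equivalence, and cluster probabilities are approximated by sample frequencies rather than length-normalised sequence probabilities.

    import math

    def cluster_by_meaning(answers, equivalent):
        """Greedy clustering: an answer joins the first cluster whose
        representative it is semantically equivalent to."""
        clusters = []  # each cluster is a list of answer indices
        for i, answer in enumerate(answers):
            for cluster in clusters:
                if equivalent(answers[cluster[0]], answer):
                    cluster.append(i)
                    break
            else:
                clusters.append([i])
        return clusters

    def semantic_entropy(answers, equivalent):
        """Entropy over meaning clusters, with cluster probabilities
        approximated by the fraction of samples in each cluster."""
        clusters = cluster_by_meaning(answers, equivalent)
        n = len(answers)
        probs = [len(c) / n for c in clusters]
        return -sum(p * math.log(p) for p in probs)

    # Placeholder equivalence check (normalised exact match); a real
    # pipeline would use a bidirectional-entailment judge instead.
    def naive_equivalent(a, b):
        return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")

    samples = ["Paris", "paris", "It is Paris.", "Lyon", "Paris"]
    print(semantic_entropy(samples, naive_equivalent))

High semantic entropy indicates that the sampled answers disagree in meaning, which is the signal the paper uses as a label-free fine-tuning target for abstention.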
