Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models

Mind the Gap: A Surgical Study on the Self-improvement Capabilities of LLMs

Yuda Song · Hanlin Zhang · Udaya Ghai · Carson Eisenach · Sham Kakade · Dean Foster

Keywords: [ LLM ] [ test-time inference ] [ self-improvement ] [ synthetic data ]

Sat 14 Dec, 12:00–12:45 p.m. PST

Abstract:

Self-improvement is a promising mechanism in Large Language Model (LLM) pre-training, post-training, and test-time inference. We explore a framework in which the model verifies its own outputs, filters or reweights data based on this verification, and distills the filtered data. Despite several empirical successes, a fundamental understanding of this process is still lacking. In this work, we initiate a comprehensive, modular, and controlled study of LLM self-improvement. We provide a mathematical formulation of self-improvement, which is largely governed by a quantity we coin the generation-verification gap. Through experiments with various model families and tasks, we examine scaling properties of this gap, an iterative self-improvement procedure, and ways to improve its performance. We believe our results have several empirical implications, and our study opens exciting directions for future work on understanding the potential and limits of LLM self-improvement.
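To make the framework concrete, below is a minimal Python sketch of one generate-verify-filter-distill round as the abstract describes it. The three callables (generate, verify, distill) and the threshold are hypothetical placeholders, not the authors' implementation or API, and the closing comment gives one plausible reading of the generation-verification gap rather than the paper's formal definition.

# Minimal sketch of one verify-filter-distill round, assuming three
# user-supplied callables (all hypothetical, not the authors' API):
#   generate(prompt) -> str          sample an answer from the current model
#   verify(prompt, answer) -> float  the model's self-assessed score in [0, 1]
#   distill(pairs) -> None           fine-tune the model on (prompt, answer) pairs

def self_improvement_round(prompts, generate, verify, distill, threshold=0.5):
    # 1. Generation: sample a candidate answer for each prompt.
    candidates = [(p, generate(p)) for p in prompts]
    # 2. Verification: the model scores its own outputs.
    scored = [(p, a, verify(p, a)) for p, a in candidates]
    # 3. Filtering: keep only the outputs the verifier accepts
    #    (a reweighting variant would weight by the score instead).
    kept = [(p, a) for p, a, s in scored if s >= threshold]
    # 4. Distillation: train on the filtered data.
    distill(kept)
    return kept

# One reading of the generation-verification gap: the quality of the filtered
# set minus the quality of the raw generations. The loop can only help when
# this gap is positive, i.e. when the model verifies better than it generates.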
