Poster in Workshop: System-2 Reasoning at Scale
Distilling System 2 into System 1
Ping Yu · Jing Xu · Jason Weston · Ilia Kulikov
Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps them produce better final responses. Since Chain-of-Thought \citep{CoT}, many such {\em System 2} techniques have been proposed, such as Rephrase and Respond \citep{RaR}, System 2 Attention \citep{S2A} and Branch-Solve-Merge \citep{BSM}. In this work we investigate self-supervised methods to ``compile'' (distill) the higher quality outputs of System 2 techniques back into LLM generations {\em without} intermediate reasoning token sequences, as this reasoning has been distilled into {\em System 1}. We show that several such techniques can be successfully distilled, yielding improved results over the original System 1 performance at a lower inference cost than System 2. We posit that System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
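As a rough illustration of the idea (a sketch, not the paper's exact recipe), the snippet below builds a distillation dataset using a hypothetical `llm_generate(prompt, temperature)` helper: a Rephrase-and-Respond style System 2 pass produces candidate answers, a self-consistency (majority-vote) filter serves as the unsupervised quality check, and the retained (input, final answer) pairs, with all intermediate tokens discarded, become fine-tuning targets for the System 1 model.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical generation interface (an assumption, not an API from the paper):
# takes a prompt and a sampling temperature, returns one completion string.
LLMGenerate = Callable[[str, float], str]


def system2_answer(llm_generate: LLMGenerate, question: str, temperature: float) -> str:
    """One System 2 pass in the Rephrase-and-Respond style: spend extra
    inference compute on intermediate text, then keep only the final answer."""
    rephrased = llm_generate(
        f"Rephrase and expand this question to make it clearer:\n{question}",
        temperature,
    )
    final = llm_generate(
        f"Question: {question}\nRephrased question: {rephrased}\nAnswer concisely:",
        temperature,
    )
    return final.strip()


def build_distillation_data(
    llm_generate: LLMGenerate,
    unlabeled_inputs: List[str],
    num_samples: int = 8,
    min_agreement: float = 0.75,
) -> List[Tuple[str, str]]:
    """Self-supervised curation: sample several System 2 answers per input,
    keep inputs whose answers agree (a self-consistency filter), and pair the
    original input with the majority answer, dropping all intermediate
    reasoning tokens. Fine-tuning on these pairs distills System 2 into
    the model's direct System 1 generations."""
    distilled = []
    for x in unlabeled_inputs:
        answers = [
            system2_answer(llm_generate, x, temperature=0.7)
            for _ in range(num_samples)
        ]
        majority, count = Counter(answers).most_common(1)[0]
        if count / num_samples >= min_agreement:
            distilled.append((x, majority))  # target carries no reasoning trace
    return distilled
```

The helper names, prompts, and the 0.75 agreement threshold are illustrative choices; the key design point is that the quality filter requires no labels, so the distillation targets are produced entirely self-supervised.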