Poster in Workshop: System-2 Reasoning at Scale
Distilling System 2 into System 1
Ping Yu · Jing Xu · Jason Weston · Ilia Kulikov
Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps them produce better final responses. Since Chain-of-Thought \citep{CoT}, many such {\em System 2} techniques have been proposed, such as Rephrase and Respond \citep{RaR}, System 2 Attention \citep{S2A} and Branch-Solve-Merge \citep{BSM}. In this work we investigate self-supervised methods to ``compile'' (distill) the higher quality outputs of System 2 techniques back into LLM generations {\em without} intermediate reasoning token sequences, as this reasoning has been distilled into {\em System 1}. We show that several such techniques can be successfully distilled, yielding improved results over the original System 1 performance at a lower inference cost than System 2. We posit that System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
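As a rough illustration of the idea (a sketch, not the paper's exact recipe), the snippet below builds a distillation dataset using a hypothetical `llm_generate(prompt, temperature)` helper: a Rephrase-and-Respond style System 2 pass produces candidate answers, a self-consistency (majority-vote) filter serves as the unsupervised quality check, and the retained (input, final answer) pairs, with all intermediate tokens discarded, become fine-tuning targets for the System 1 model.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical generation interface (an assumption, not an API from the paper):
# takes a prompt and a sampling temperature, returns one completion string.
LLMGenerate = Callable[[str, float], str]


def system2_answer(llm_generate: LLMGenerate, question: str, temperature: float) -> str:
    """One System 2 pass in the Rephrase-and-Respond style: spend extra
    inference compute on intermediate text, then keep only the final answer."""
    rephrased = llm_generate(
        f"Rephrase and expand this question to make it clearer:\n{question}",
        temperature,
    )
    final = llm_generate(
        f"Question: {question}\nRephrased question: {rephrased}\nAnswer concisely:",
        temperature,
    )
    return final.strip()


def build_distillation_data(
    llm_generate: LLMGenerate,
    unlabeled_inputs: List[str],
    num_samples: int = 8,
    min_agreement: float = 0.75,
) -> List[Tuple[str, str]]:
    """Self-supervised curation: sample several System 2 answers per input,
    keep inputs whose answers agree (a self-consistency filter), and pair the
    original input with the majority answer, dropping all intermediate
    reasoning tokens. Fine-tuning on these pairs distills System 2 into
    the model's direct System 1 generations."""
    distilled = []
    for x in unlabeled_inputs:
        answers = [
            system2_answer(llm_generate, x, temperature=0.7)
            for _ in range(num_samples)
        ]
        majority, count = Counter(answers).most_common(1)[0]
        if count / num_samples >= min_agreement:
            distilled.append((x, majority))  # target carries no reasoning trace
    return distilled
```

The helper names, prompts, and the 0.75 agreement threshold are illustrative choices; the key design point is that the quality filter requires no labels, so the distillation targets are produced entirely self-supervised.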