

Poster in Workshop: Fine-Tuning in Modern Machine Learning: Principles and Scalability

On Efficient Distillation from LLMs to SLMs

Metod Jazbec · Menglin Xia · Ankur Mallick · Daniel Madrigal · Dongge Han · Samuel Kessler · Victor Ruehle


Abstract: Finetuning small language models (SLMs) on data generated by large language models (LLMs), a form of knowledge distillation, has recently been shown to significantly enhance the capabilities of small models across various domains (e.g., mathematical reasoning). However, current approaches typically require synthesizing a large number of new examples ($>100\textrm{K}$), which increases the resources and training time needed for finetuning. To address this issue, we investigate principles for making the distillation process more efficient by reducing the amount of synthetic data required. Specifically, we explore (i) incorporating the SLM's feedback into the LLM's data generation process and (ii) including the LLM's rationales (i.e., step-by-step solutions) in the distilled data. In our experiments using Mistral-7B as the SLM on math reasoning tasks (GSM8K, MATH), we find that both feedback and rationales make finetuning with distillation more efficient, requiring up to $\sim2\text{x}$ less synthetic data.
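The poster does not include code; the sketch below illustrates only the basic distillation step the abstract describes: supervised finetuning of an SLM on LLM-generated rationales. The dataset fields, prompt format, and hyperparameters are illustrative assumptions, not the authors' setup; only the model name (Mistral-7B) and the rationale-style targets come from the abstract.

```python
# Minimal sketch (assumptions noted above): finetune an SLM on distilled
# question/rationale pairs with the Hugging Face Trainer.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"  # SLM named in the abstract
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical distilled examples: each pairs a question with an
# LLM-written rationale (step-by-step solution) ending in the final answer.
distilled = [
    {"question": "Natalia sold clips to 48 of her friends in April, and half as many in May. How many clips did she sell altogether?",
     "rationale": "In May she sold 48 / 2 = 24 clips. In total she sold 48 + 24 = 72 clips. The answer is 72."},
]

def format_example(ex):
    # Illustrative prompt template; the actual format used in the poster is not specified.
    text = f"Question: {ex['question']}\nSolution: {ex['rationale']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

train_ds = Dataset.from_list(distilled).map(
    format_example, remove_columns=["question", "rationale"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-distilled",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           bf16=True),
    train_dataset=train_ds,
    # Causal-LM collator: pads batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The abstract's efficiency levers would sit around this loop: the SLM's errors on held-out problems could steer which examples the LLM synthesizes next, and the rationale field is what distinguishes rationale-augmented targets from answer-only distillation.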
