

Oral in Workshop: Compositional Learning: Perspectives, Methods, and Paths Forward

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur · Simon Park · Anirudh Goyal · Sanjeev Arora

Keywords: [ instruction tuning ] [ diverse synthetic data ] [ high quality synthetic data ]


Abstract: We introduce Instruct-SkillMix, an automated approach for creating diverse, high-quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets (Didolkar et al., 2024) or by directly prompting the model; (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct.
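The data-generation stage described above can be illustrated with a minimal sketch. The code below assumes a hypothetical `call_llm` helper standing in for any powerful LLM API, and the prompt wording is illustrative rather than the authors' exact template; only the overall structure (sample a random pair of extracted skills, then ask the LLM for an (instruction, response) example combining them) follows the abstract.

```python
# Minimal sketch of the Instruct-SkillMix data-generation stage (stage 2).
# `call_llm` is a hypothetical stand-in for a powerful LLM API client;
# the prompt text is illustrative, not the authors' exact template.
import json
import random


def call_llm(prompt: str) -> str:
    """Placeholder for a call to a powerful LLM (plug in your API client here)."""
    raise NotImplementedError("Replace with an actual LLM call.")


def generate_sft_pair(skill_a: str, skill_b: str) -> dict:
    """Ask the LLM for one (instruction, response) example combining two skills."""
    prompt = (
        f"Write a challenging user instruction that requires both of these "
        f"instruction-following skills: '{skill_a}' and '{skill_b}'. "
        f"Then write a high-quality response. "
        f'Return JSON with keys "instruction" and "response".'
    )
    return json.loads(call_llm(prompt))


def build_dataset(skills: list[str], n_examples: int, seed: int = 0) -> list[dict]:
    """Sample random skill pairs to promote diversity and difficulty."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_examples):
        skill_a, skill_b = rng.sample(skills, 2)  # random pair of distinct skills
        dataset.append(generate_sft_pair(skill_a, skill_b))
    return dataset


# Example usage with a toy skill list (real skills come from stage 1, skill extraction):
# skills = ["persuasive writing", "step-by-step reasoning", "code explanation"]
# sft_data = build_dataset(skills, n_examples=4000)
```

The resulting (instruction, response) pairs would then be used directly for vanilla SFT, with no preference-optimization or RL step.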
