Expo Workshop
East Ballroom A, B

Post-training LLMs is a critical step to make LLMs follow instructions, align with human values, reduce hallucinations, and more. Besides standard Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) are the most commonly used post-training methods. Test-time scaling has also become increasingly popular. In this training, we present a comprehensive introduction to various post-training methods, show how they are implemented in practice, and give a high-level overview of inference-time scaling methods. Attendees will gain a basic understanding of: i) the necessity and formal problem formulation of post-training; ii) commonly used post-training methods and their theory; iii) a live demo showing how to use existing training infrastructures to build post-training pipelines; and iv) a live demo showing how to use Monte Carlo Tree Search (MCTS) to boost inference-time performance.
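As a taste of one of the post-training methods covered, here is a minimal sketch of the per-example DPO objective. It assumes the caller supplies summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model; the function name, argument names, and the `beta=0.1` default are illustrative, not taken from any specific library.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from log-probs of a preference pair.

    Each argument is log p(response | prompt), summed over response
    tokens, under either the trained policy or the frozen reference.
    """
    # Implicit rewards: how much more (in log space) the policy likes
    # each response than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin: minimized when the
    # policy prefers the chosen response more than the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2); raising the chosen response's policy log-prob
# lowers the loss.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))  # log(2) ≈ 0.6931
print(dpo_loss(-0.5, -2.0, -1.0, -2.0))  # smaller loss
```

In practice this scalar loss is averaged over a batch of preference pairs and minimized with a standard optimizer; the reference model stays frozen throughout.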
