Poster in Workshop: Pluralistic Alignment Workshop
Conditioned Language Policy: A General Framework For Steerable Multi-Objective Finetuning
Kaiwen Wang · Rahul Kidambi · Ryan Sullivan · Alekh Agarwal · Christoph Dann · Andrea Michi · Marco Gelmi · Yunxuan Li · Raghav Gupta · Kumar Avinava Dubey · Alexandre Rame · Johan Ferret · Geoffrey Cideron · Le Hou · Hongkun Yu · Amr Ahmed · Aranyak Mehta · Leonard Hussenot · Olivier Bachem · Edouard Leurent
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policies (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP helps train steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework enables learning steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.
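As a rough illustration of the conditioning idea described in the abstract (not the paper's implementation), the sketch below shows linear scalarization of per-objective rewards under a sampled preference vector, and a hypothetical way of exposing that vector to a single policy so it can be steered to different trade-offs at inference time. The function names, the Dirichlet sampling choice, and the `<w_i=...>` prompt tags are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sample_weights(num_objectives: int, rng: np.random.Generator) -> np.ndarray:
    # Draw a preference vector from the probability simplex (Dirichlet);
    # a common choice for multi-objective scalarization, used here only
    # as a stand-in for whatever sampling scheme the paper uses.
    return rng.dirichlet(np.ones(num_objectives))

def scalarized_reward(rewards: np.ndarray, weights: np.ndarray) -> float:
    # Linear scalarization: combine per-objective rewards into a single
    # scalar used for finetuning, given the sampled trade-off weights.
    return float(np.dot(rewards, weights))

def conditioned_prompt(prompt: str, weights: np.ndarray) -> str:
    # Hypothetical conditioning: prepend the weight vector as tags so the
    # same policy sees which trade-off it is being asked to satisfy.
    tags = " ".join(f"<w{i}={w:.2f}>" for i, w in enumerate(weights))
    return f"{tags} {prompt}"

rng = np.random.default_rng(0)
w = sample_weights(num_objectives=2, rng=rng)
print(conditioned_prompt("Write a short story.", w))
print(scalarized_reward(rewards=np.array([0.7, 0.3]), weights=w))
```

At inference time, the same single model can be steered simply by changing the weight vector in the conditioning input, which is what removes the need to train or maintain one model per trade-off.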