NeurIPS Poster Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

Xudong Yu · Chenjia Bai · Haoran He · Changhong Wang · Xuelong Li

West Ballroom A-D #6609

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Sequential decision-making can be formulated as a conditional generation process whose targets are alignment with human intent and versatility across tasks. Previous return-conditioned diffusion models achieve comparable performance but rely on well-defined reward functions, which require considerable human effort and face challenges in multi-task settings. Preferences serve as an alternative, but recent work rarely considers preference learning across multiple tasks. To facilitate alignment and versatility in multi-task preference learning, we adopt multi-task preferences as a unified framework. In this work, we propose to learn preference representations aligned with preference labels, which are then used as conditions to guide the conditional generation process of diffusion models. The traditional classifier-free guidance paradigm suffers from inconsistency between the conditions and the generated trajectories. We thus introduce an auxiliary regularization objective to maximize the mutual info
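
For readers unfamiliar with classifier-free guidance, the following is a minimal sketch of the commonly used blending step, not the authors' implementation. It assumes the denoiser has already produced one noise prediction without the condition and one with it. In the setting of the abstract, the condition would be a learned preference representation. The function name `classifier_free_guidance` and the parameter `guidance_scale` are hypothetical names chosen for illustration, and the sketch does not cover the auxiliary regularization objective.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    and larger values extrapolate further toward the condition.
    (Illustrative helper; names are assumptions, not from the paper.)
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # eps_hat = eps_uncond + w * (eps_cond - eps_uncond), element by element
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

Because the condition only enters through this blended prediction, nothing in the sampling step itself checks that the generated trajectory actually matches the condition, which is consistent with the inconsistency issue the abstract raises.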
