Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding

Yangfan He · Jianhui Wang · Haoyuan Li · Sida Li · Li Sun · Tianyu Shi


Abstract:

Generative AI has revolutionized industries by enabling text-driven image generation, yet challenges remain in producing high-resolution outputs that align with nuanced user preferences. Multi-round interaction is therefore needed to ensure that generated images meet users' expectations. Previous methods have focused on refining prompts with reward feedback so that generated images better match user needs, but they have not considered optimization over multi-round dialogue datasets. In this work, we present a Visual Co-Adaptation (VCA) framework that incorporates human-in-the-loop feedback, using a well-trained reward model designed to align closely with human preferences. Leveraging a diverse multi-turn dialogue dataset, the framework applies multiple reward functions (diversity, consistency, and preference feedback) while fine-tuning the diffusion model with LoRA, optimizing image generation based on user input. We also construct multi-round dialogue datasets of prompts and image pairs that closely reflect user intent. Extensive experiments demonstrate the effectiveness of the proposed method over state-of-the-art baselines, with significant improvements in image consistency and alignment with user intent. Our approach consistently surpasses competing models in user satisfaction, particularly in multi-turn dialogue scenarios.
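The abstract describes combining several reward signals and using them to fine-tune only LoRA parameters across dialogue rounds. The sketch below illustrates one way such a reward-weighted update could look; it is a minimal, hypothetical illustration, not the authors' implementation. All function names (`diversity_reward`, `consistency_reward`, `preference_reward`, `lora_step`), the reward weights, and the toy embeddings are assumptions introduced here for clarity.

```python
# Hypothetical sketch of a multi-round, reward-weighted LoRA update.
# Embeddings stand in for real CLIP/diffusion outputs; the reward model,
# weights, and LoRA parameters are placeholder assumptions.
import torch
import torch.nn.functional as F


def diversity_reward(emb: torch.Tensor, prev_emb: torch.Tensor) -> torch.Tensor:
    # Higher when the new image embedding moves away from the previous round's image.
    return 1.0 - F.cosine_similarity(emb, prev_emb, dim=-1)


def consistency_reward(emb: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
    # Higher when the image embedding stays aligned with the current prompt.
    return F.cosine_similarity(emb, prompt_emb, dim=-1)


def preference_reward(emb: torch.Tensor, reward_model: torch.nn.Module) -> torch.Tensor:
    # Scalar score from a reward model trained on human preference feedback.
    return reward_model(emb).squeeze(-1)


def combined_reward(emb, prev_emb, prompt_emb, reward_model,
                    w_div=0.3, w_cons=0.3, w_pref=0.4):
    # Weighted sum of the three reward signals named in the abstract.
    return (w_div * diversity_reward(emb, prev_emb)
            + w_cons * consistency_reward(emb, prompt_emb)
            + w_pref * preference_reward(emb, reward_model))


def lora_step(log_prob, reward, optimizer):
    # REINFORCE-style step: scale the trajectory log-probability by the
    # (detached) reward and update only the LoRA parameters in `optimizer`.
    loss = -(reward.detach() * log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with random embeddings in place of real model outputs.
    dim = 16
    reward_model = torch.nn.Linear(dim, 1)
    lora_params = [torch.nn.Parameter(torch.zeros(dim))]   # stand-in for LoRA weights
    optimizer = torch.optim.AdamW(lora_params, lr=1e-4)

    emb = torch.randn(4, dim) + lora_params[0]             # pretend LoRA shifts the output
    prev_emb, prompt_emb = torch.randn(4, dim), torch.randn(4, dim)
    log_prob = -((emb - prompt_emb) ** 2).sum(dim=-1)      # stand-in for trajectory log-prob
    r = combined_reward(emb, prev_emb, prompt_emb, reward_model)
    print("loss:", lora_step(log_prob, r, optimizer))
```

In practice the embeddings would come from the diffusion model's outputs at each dialogue round, and the log-probability term from the sampled denoising trajectory; the weighting of the three rewards is a tunable design choice.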
