

Full Presentation in Session: Creative AI Session 4

Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance

Joonseok Lee · Judith Yue Li · Xuchan Bao · Timo I. Denk · Kun Su · Fei Sha · Zhong Yi Wan · Dima Kuzmin

East Ballroom C
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Current music retrieval systems often rely on a deterministic seed embedding to represent user preference, limiting their ability to capture users' diverse and uncertain retrieval needs. To address this, we propose Diff4Steer, a novel generative retrieval framework that leverages generative models to synthesize potential directions for exploration, represented by "oracle" seed embeddings. These embeddings capture the distribution of user preferences given retrieval queries, enabling more flexible and creative music discovery. Diff4Steer's lightweight diffusion-based generative models provide a statistical prior on the target modality (audio), which can be steered by image or text inputs to generate samples in the audio embedding space. These samples are then used to retrieve candidates via nearest neighbor search. Our framework outperforms deterministic regression methods and an LLM-based generative retrieval baseline on retrieval and ranking metrics, demonstrating its effectiveness in capturing user preferences and providing diverse and relevant recommendations. We include an appendix and a demonstration website in the supplementary materials.
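
The sample-then-retrieve step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the diffusion prior is replaced by a hypothetical stand-in (`sample_seed_embeddings`, which simply adds Gaussian noise around a conditioning vector), and the toy catalog, track names, and merge-by-best-score rule are all assumptions made for illustration. The only part taken from the abstract is the overall flow: draw several seed embeddings in the audio embedding space, then retrieve candidates for each by nearest neighbor search.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def sample_seed_embeddings(cond_mean, num_samples, noise_scale, rng):
    """Hypothetical stand-in for a steered diffusion sampler.

    A real system would run a conditional diffusion model here; this
    placeholder only perturbs a conditioning vector with Gaussian noise
    so that several distinct seed embeddings are produced.
    """
    return [
        [m + rng.gauss(0.0, noise_scale) for m in cond_mean]
        for _ in range(num_samples)
    ]


def retrieve(seeds, catalog, k):
    """Top-k nearest neighbor search per seed, merged by best similarity."""
    best = {}
    for seed in seeds:
        scored = sorted(
            ((cosine(seed, emb), track_id) for track_id, emb in catalog.items()),
            reverse=True,
        )[:k]
        for score, track_id in scored:
            best[track_id] = max(best.get(track_id, -1.0), score)
    # Rank the merged candidates by their best score across all seeds.
    return sorted(best, key=best.get, reverse=True)


# Toy audio-embedding catalog (assumed data, three dimensions for readability).
catalog = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 1.0, 0.0],
    "track_d": [0.0, 0.0, 1.0],
}

rng = random.Random(0)
seeds = sample_seed_embeddings([1.0, 0.2, 0.0], num_samples=8, noise_scale=0.05, rng=rng)
results = retrieve(seeds, catalog, k=2)
print(results)
```

Drawing several seeds instead of one is what separates this flow from a deterministic regression baseline: a larger `noise_scale` in the stand-in (or, in a real system, a broader learned distribution) spreads the seeds out and widens the set of retrieved candidates.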
