Skip to yearly menu bar Skip to main content


Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

Generating Vocals from Lyrics and Musical Accompaniment

Georg Streich · Luca Lanzendörfer · Florian Grötschla · Roger Wattenhofer

[ ] [ Project Page ]
Sat 14 Dec 4:15 p.m. PST — 5:30 p.m. PST

Abstract:

In this work, we introduce AutoSing, a novel framework designed to generate diverse and high-quality singing voices from provided lyrics and musical accompaniment. AutoSing extends an existing semantic token-based text-to-speech approach by incorporating musical accompaniment as an additional conditioning input. This enables AutoSing to synchronize its vocal output with the rhythm and melodic nuances of the accompaniment while adhering to the provided lyrics. Our contributions include a novel training scheme for autoregressive audio models applied to singing voice synthesis, as well as ablation studies to identify the best way to condition generation on musical accompaniment. We measure AutoSing's performance with subjective listening tests, demonstrating its capability to generate coherent and creative singing voices. Furthermore, we open-source our codebase to foster further research in the field of singing voice synthesis.

Chat is not available.