

Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

Latent Diffusion Model for Audio: Generation, Quality Enhancement, and Neural Audio Codec

Haohe Liu · Wenwu Wang · Mark Plumbley

Sat 14 Dec 10:30 a.m. PST — noon PST

Abstract:

In this demo, we explore the versatile application of Latent Diffusion Models (LDMs) in audio tasks, showcasing their capabilities across three state-of-the-art systems: AudioLDM-2 for text-to-audio generation, AudioSR for audio quality enhancement, and SemantiCodec for ultra-low-bitrate neural audio coding. AudioLDM-2 employs an LDM to decode high-quality audio from intermediate Audio Masked Autoencoder (AudioMAE) features, which are generated by a continuous language model conditioned on textual input. AudioSR leverages an LDM to perform robust audio super-resolution, enhancing the quality of low-resolution audio across various types and bandwidths, from speech and music to general sounds. SemantiCodec utilizes an LDM to efficiently decode audio from semantically rich, low-bitrate representations, demonstrating effective audio compression. Together, these systems illustrate the broad utility of LDMs as audio decoders for diverse audio generation, enhancement, and neural audio codec tasks. This report highlights the significance of these innovations and outlines our demo objectives.
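All three systems share the same backbone idea: a conditioning signal (AudioMAE features for AudioLDM-2, a low-resolution input for AudioSR, semantic tokens for SemantiCodec) steers an iterative reverse-diffusion process that denoises a latent, which is then decoded to audio. The following is a minimal, runnable sketch of that shared sampling loop, not the actual trained models: the denoiser is a hypothetical stub, the noise schedule is illustrative, and `cond` is a placeholder for whichever conditioning representation a given system uses.

```python
import numpy as np

def sample_latent(cond, steps=50, seed=0):
    """Toy DDIM-style (eta=0) reverse diffusion in latent space.

    `cond` stands in for the conditioning signal (AudioMAE features,
    a low-resolution spectrogram, or codec tokens). A real system uses
    a trained U-Net noise predictor; here a simple stub replaces it.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(cond.shape)        # start from pure noise
    alphas = np.linspace(0.999, 0.01, steps)   # toy noise schedule
    for t in range(steps):
        a = alphas[t]
        # Stub denoiser: a real model predicts noise from (z, t, cond).
        eps_pred = 0.5 * (z - cond)
        # Estimate the clean latent, then step to the next noise level.
        z0 = (z - np.sqrt(1.0 - a) * eps_pred) / np.sqrt(a)
        a_next = alphas[t + 1] if t + 1 < steps else 1.0
        z = np.sqrt(a_next) * z0 + np.sqrt(1.0 - a_next) * eps_pred
    return z  # in the real systems, a VAE decoder maps this to audio
```

In the actual pipelines, the returned latent would be passed through a pretrained VAE decoder (and a vocoder, where applicable) to produce a waveform; only the conditioning input differs between the generation, super-resolution, and codec use cases.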
