Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

Neural Audio Codec for Latent Music Representations

Luca Lanzendörfer · Florian Grötschla · Amir Dellali · Roger Wattenhofer

Sat 14 Dec 10:30 a.m. PST — noon PST

Abstract:

Neural audio codecs have become increasingly important for audio compression and, more recently, for creating tokenized representations for various generative downstream tasks. Consequently, the performance of neural audio codecs plays a crucial role in many applications. In this work, we introduce DisCodec, a high-fidelity neural audio codec for compressing 44.1 kHz music into discrete or continuous latent representations. DisCodec leverages ConvNeXt and attention layers, an affine re-parametrization of the code vectors, and an improved commitment loss for better alignment between codebooks and model embeddings. We compare DisCodec with existing state-of-the-art neural audio codecs and perform a comprehensive ablation of the proposed architecture. We make the DisCodec codebase and model checkpoints available at https://github.com/ETH-DISCO/discodec.
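
For readers unfamiliar with the quantization terms in the abstract, the following is a minimal generic sketch, not the DisCodec implementation. It assumes the affine re-parametrization is a shared per-dimension scale and shift applied to every code vector, and it uses the standard VQ-VAE commitment term; the "improved" commitment loss is not described in the abstract and is not reproduced here. All function names, the `beta` weight, and the toy values are hypothetical.

```python
# Illustrative sketch only: NOT the DisCodec code. All names are hypothetical.

def affine_codebook(codebook, scale, shift):
    """Re-parametrize each code vector c as scale * c + shift.

    Assumption: scale and shift are shared by all codes (one value per dimension).
    """
    return [[s * c + b for c, s, b in zip(code, scale, shift)] for code in codebook]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(z, codebook):
    """Return (index, code) of the code vector nearest to the encoder output z."""
    index = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
    return index, codebook[index]


def commitment_loss(z, code, beta=0.25):
    """Standard VQ-VAE commitment term: beta * ||z - code||^2.

    In an autodiff framework the code would sit behind a stop-gradient,
    so that this term only updates the encoder.
    """
    return beta * squared_distance(z, code)


# Toy example: three 2-D code vectors with a shared scale and shift.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
scale, shift = [2.0, 0.5], [0.0, 1.0]
shifted = affine_codebook(codebook, scale, shift)

z = [1.8, 1.4]  # hypothetical encoder output for one frame
index, code = quantize(z, shifted)
loss = commitment_loss(z, code)
print(index, code, loss)
```

In this toy setup, updating `scale` and `shift` moves every code vector at once, including codes that are rarely selected; this is the usual motivation for such a re-parametrization.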
