Poster+Demo Session in Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation
Neural Audio Codec for Latent Music Representations
Luca Lanzendörfer · Florian Grötschla · Amir Dellali · Roger Wattenhofer
Neural audio codecs have become increasingly important for audio compression and, more recently, for producing tokenized representations for generative downstream tasks. Consequently, codec performance plays a crucial role in many applications. In this work, we introduce DisCodec, a high-fidelity neural audio codec that compresses 44.1 kHz music into discrete or continuous latent representations. DisCodec combines ConvNeXt and attention layers, an affine re-parametrization of the code vectors, and an improved commitment loss that better aligns the codebooks with the model embeddings. We compare DisCodec against existing state-of-the-art neural audio codecs and perform a comprehensive ablation of the proposed architecture. We make the DisCodec codebase and model checkpoints available at https://github.com/ETH-DISCO/discodec.
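The abstract names two quantizer components, an affine re-parametrization of the code vectors and a commitment loss, without giving their exact form. The following is a minimal sketch under stated assumptions, not DisCodec's implementation. It assumes the effective code vectors are computed as `scale * raw_k + shift` with one per-dimension scale and shift shared by all codes. One common motivation for sharing these parameters is that every code, including rarely selected ones, moves when they are updated. It uses the standard VQ-VAE commitment term ||z - sg(q)||², not the paper's "improved" variant, which the abstract does not specify. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

Vector = List[float]


def affine_codebook(raw: List[Vector], scale: Vector, shift: Vector) -> List[Vector]:
    """Effective codes c_k = scale * raw_k + shift, elementwise.

    Assumption: scale and shift are per-dimension and shared by all codes.
    In training they would be learned parameters; here they are fixed inputs.
    """
    return [[s * r + b for r, s, b in zip(code, scale, shift)] for code in raw]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Nearest-neighbour lookup: return the index and vector of the closest code."""
    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, codebook[idx]


def commitment_loss(z: Vector, q: Vector) -> float:
    """Standard VQ-VAE commitment term ||z - sg(q)||^2, averaged over dimensions.

    In an autodiff framework, q would be wrapped in a stop-gradient so that
    this loss only pulls the encoder output z toward its assigned code.
    """
    return sq_dist(z, q) / len(z)


if __name__ == "__main__":
    raw = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    codebook = affine_codebook(raw, scale=[2.0, 0.5], shift=[0.1, -0.1])
    idx, q = quantize([1.9, 0.5], codebook)
    print(idx)  # selects index 1, the code [2.1, 0.4]
    print(commitment_loss([1.9, 0.5], q))
```

Because the quantizer looks up the transformed codes, changing `scale` or `shift` shifts the whole codebook at once. A per-code update would only move the codes that the encoder actually selects.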