Skip to yearly menu bar Skip to main content


Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

Contrastive Lyrics Alignment with a Timestamp-Informed Loss

Timon Kick · Florian Grötschla · Luca Lanzendörfer · Roger Wattenhofer

[ ] [ Project Page ]
Sat 14 Dec 4:15 p.m. PST — 5:30 p.m. PST

Abstract:

Recent multimodal methods for lyrics alignment have relied on large datasets. Our approach introduces a box loss that directly incorporates timestamp information into the loss function, enabling precise alignment and competitive results even with limited training data. We also address the noise present in the public DALI dataset, conducting a thorough cleaning process to improve the quality of training data. Finally, we propose JamendoLyrics++, a substantial extension of the common JamendoLyrics evaluation dataset, offering improved genre diversity for better evaluation of lyrics alignment systems.

Chat is not available.