Poster
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
Ruibin Yuan · Yinghao Ma · Yizhi Li · Ge Zhang · Xingran Chen · Hanzhi Yin · zhuo le · Yiqi Liu · Jiawen Huang · Zeyue Tian · Binyue Deng · Ningzhi Wang · Chenghua Lin · Emmanouil Benetos · Anton Ragni · Norbert Gyenge · Roger Dannenberg · Wenhu Chen · Gus Xia · Wei Xue · Si Liu · Shi Wang · Ruibo Liu · Yike Guo · Jie Fu
Great Hall & Hall B1+B2 (level 1) #1106
In an era of extensive intersection between art and artificial intelligence (AI), as seen in image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal, community-driven benchmark. To address this gap, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels: acoustic, performance, score, and high-level description. We then establish a unified protocol based on 18 tasks on 12 publicly available datasets. This protocol provides a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings, which serve as baselines. In addition, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on the copyright status of its datasets. Results suggest that recently proposed large-scale pre-trained musical language models perform best on most tasks, with room for further improvement. The leaderboard and toolkit repository are published to promote future music AI research.