Poster+Demo Session in Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation

Do music LLMs learn symbolic concepts? A pilot study using probing and intervention

Wenye Ma · Xinyue Li · Gus Xia

Sat 14 Dec 10:30 a.m. PST — noon PST

Abstract:

Music large language models (LLMs), trained on raw audio or token sequences, have shown impressive capabilities in generating long-term, high-quality music. However, their underlying mechanisms remain largely unexplored. Do these models generate music simply by relying on shallow contextual dependencies, or do they learn symbolic concepts, such as pitch and chord, similar to how the human mind processes music? To address this question, we conducted a pilot study that probes and manipulates the hidden states of MERT and MusicGen, two state-of-the-art Transformer-based music LLMs. Experiments show that these models indeed acquire the concepts of pitch and chord root, with representational strength improving notably in deeper layers. Additionally, the models show a strong preference for retaining pitch content over its stylistic counterpart, instrument timbre, and a similar relationship holds between chord root and chord quality. These observations offer valuable insights into the inner workings of music LLMs.
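The abstract does not spell out the probing and intervention protocol, so the following is only a minimal illustrative sketch of the general technique: fit a linear probe per layer on extracted hidden states to measure how strongly a concept (e.g., pitch class) is linearly represented, then use the probe's weights as an edit direction. All names, shapes, and the step size here are hypothetical placeholders, and the synthetic data stands in for real MERT/MusicGen activations.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: hidden states assumed already extracted from a music
# LLM and pooled over time, shaped (n_examples, n_layers, d_model), with an
# integer pitch-class label per example. Synthetic data for illustration.
rng = np.random.default_rng(0)
n, n_layers, d = 500, 12, 768
hidden = rng.normal(size=(n, n_layers, d))
pitch = rng.integers(0, 12, size=n)  # 12 pitch classes

# Layer-wise linear probing: one logistic-regression probe per layer.
# Higher held-out accuracy suggests a stronger linear representation of
# the concept at that depth.
for layer in range(n_layers):
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden[:, layer, :], pitch, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X_te, y_te):.3f}")

# One simple form of intervention: nudge a hidden state along the probe's
# decision direction for a target class before feeding it back into the
# model. The step size 2.0 is an arbitrary assumption.
h = hidden[0, -1, :]
target = 5
direction = probe.coef_[target] / np.linalg.norm(probe.coef_[target])
h_edited = h + 2.0 * direction

With real models, the probe accuracies would be compared across layers (the paper reports notable improvement in deeper layers), and the intervention step would be judged by whether the model's output changes in the intended symbolic attribute while leaving others, such as timbre, intact.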
