Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop on AI-Driven Speech, Music, and Sound Generation
Do music LLMs learn symbolic concepts? A pilot study using probing and intervention
Wenye Ma · Xinyue Li · Gus Xia
Music large language models (LLMs), trained on raw audio or token sequences, have shown impressive capabilities in generating long-term, high-quality music. However, their underlying mechanisms remain largely unexplored. Do these models generate music by simply relying on shallow contextual dependencies, or do they learn symbolic concepts, such as pitch and chord, similar to how the human mind processes music? To address this question, we conducted a pilot study that probes and manipulates the hidden states of MERT and MusicGen, two state-of-the-art Transformer-based music LLMs. Experiments show that these models indeed acquire the concepts of pitch and chord root, with representational strength improving notably in deeper layers. Additionally, we observe a strong preference for retaining pitch content over its stylistic counterpart, instrument timbre, and a similar relationship between chord root and chord quality. These observations offer valuable insights into the inner workings of music LLMs.
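The abstract does not spell out the probing procedure, but a standard linear-probing recipe for this kind of study is to fit a linear classifier from a layer's hidden states to a symbolic label (e.g., pitch class) and read decoding accuracy as representational strength. The sketch below illustrates that recipe on synthetic stand-in activations; the data, dimensions, and noise level are all assumptions for illustration, not outputs of MERT or MusicGen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 12 pitch classes, hidden size 64. We synthesize
# "hidden states" in which pitch is linearly encoded plus noise, standing
# in for real layer activations extracted from a music LLM.
n_classes, hidden, n = 12, 64, 2400
directions = rng.normal(size=(n_classes, hidden))  # one direction per pitch class
labels = rng.integers(0, n_classes, size=n)
states = directions[labels] + 0.5 * rng.normal(size=(n, hidden))

# Train/test split of (activation, label) pairs.
split = n // 2
Xtr, Xte, ytr, yte = states[:split], states[split:], labels[:split], labels[split:]

# Linear probe: least-squares regression onto one-hot targets, then
# argmax over class scores at test time.
Y = np.eye(n_classes)[ytr]
W, *_ = np.linalg.lstsq(Xtr, Y, rcond=None)
pred = (Xte @ W).argmax(axis=1)
accuracy = (pred == yte).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Repeating this fit per layer and comparing accuracies is what would reveal the deeper-layer improvement the abstract reports; an intervention study would then edit the activations along the probe's directions and check the effect on generation.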