Poster+Demo Session in Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation
Decoding Musical Perception: Music Stimuli Reconstruction from Brain Activity
Matteo Ciferri · Matteo Ferrante · Nicola Toschi
This study explores the feasibility of reconstructing musical stimuli from functional MRI (fMRI) data using generative models. Specifically, we employ MusicLDM, a latent diffusion model that generates music from text descriptions, to decode musical stimuli from fMRI signals. We first identify music-responsive brain regions by correlating neural activity with representations derived from the CLAP (Contrastive Language-Audio Pretraining) model. We then map the fMRI data from these music-responsive regions to the latent embeddings of MusicLDM using regression models, without relying on empirical descriptions of the musical stimuli. To enhance between-subject consistency, we apply functional alignment to the neural data across participants. Our evaluation, based on Identification Accuracy, shows a high correspondence between the reconstructed embeddings and the original musical stimuli in the MusicLDM space, reaching 91.4% accuracy and surpassing previous methods. In addition, a human evaluation experiment showed that participants identified the correct decoded stimulus with an average accuracy of 84.1%, further demonstrating the perceptual similarity between the original and reconstructed music. Future work will aim to improve temporal resolution and investigate applications in music cognition.
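To make the decoding and evaluation steps concrete, the sketch below illustrates the general idea with synthetic arrays standing in for real fMRI responses and stimulus embeddings. It assumes ridge regression as the fMRI-to-embedding mapping and a pairwise (two-alternative) identification metric; neither choice is specified in the abstract, so this is an illustrative approximation rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data standing in for the real inputs (assumed shapes):
# X: fMRI responses from music-responsive voxels (n_trials x n_voxels)
# Y: latent embeddings of the music stimuli in the MusicLDM/CLAP space (n_trials x emb_dim)
n_trials, n_voxels, emb_dim = 200, 500, 512
X = rng.standard_normal((n_trials, n_voxels))
Y = rng.standard_normal((n_trials, emb_dim))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# Linear mapping from brain activity to the stimulus embedding space.
# Ridge regularization is an assumption; the abstract only says "regression models".
decoder = Ridge(alpha=1.0)
decoder.fit(X_tr, Y_tr)
Y_pred = decoder.predict(X_te)

def identification_accuracy(pred, true):
    """Pairwise (two-alternative) identification: a prediction wins a comparison
    against a distractor if it is more similar (cosine) to its own ground-truth
    embedding than to the distractor's embedding."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true_n = true / np.linalg.norm(true, axis=1, keepdims=True)
    sim = pred_n @ true_n.T                # similarity of every prediction to every target
    correct = np.diag(sim)[:, None] > sim  # matched similarity vs. each distractor
    np.fill_diagonal(correct, False)       # exclude self-comparisons
    n = len(pred)
    return correct.sum() / (n * (n - 1))   # fraction of pairwise comparisons won

print(f"Identification accuracy: {identification_accuracy(Y_pred, Y_te):.3f}")
```

With random placeholder data the score hovers around chance (0.5); on real decoded embeddings the same metric would be expected to rise toward the levels reported in the abstract.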