Poster in Workshop on Behavioral Machine Learning
Principled probing of foundation models in the auditory modality
Etienne Bost · Mitsuko Aramaki · Richard Kronland-Martinet · Sølvi Ystad · Thierry Artières · Thomas Schatz
We leverage ecological theories of sound perception in humans and a carefully designed dataset of perceptually calibrated sounds to develop and carry out principled, fine-grained probing of foundation models in the auditory modality. We show that internal activations of the state-of-the-art audio foundation model BEATs correlate better with perceptual dimensions than those of a supervised audio classification model or a text-audio multimodal model, and that all models fail to represent at least one perceptual dimension. We also report preliminary evidence suggesting that directions aligning invariantly with a perceptual dimension can be identified within the representation space at inner layers of the BEATs model. We briefly discuss future work and potential applications.
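To make the probing idea concrete, below is a minimal sketch (not the authors' code) of one plausible way to test whether a direction in a model's representation space aligns with a perceptual dimension: fit a linear probe from pooled layer activations to per-stimulus perceptual ratings and measure the cross-validated correlation. The activation and rating arrays here are synthetic placeholders standing in for the calibrated sound dataset and BEATs layer activations; the paper's actual probing procedure may differ.

```python
# Hypothetical linear-probe sketch: does one layer's activation space contain a
# direction that tracks a perceptual dimension? Data below is synthetic.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_sounds, d_model = 200, 768  # assumed: 200 calibrated sounds, 768-d activations

# Stand-ins for pooled activations at one inner layer and for each sound's
# value along one perceptual dimension (both would come from real data).
activations = rng.normal(size=(n_sounds, d_model))
ratings = activations @ rng.normal(size=d_model) + rng.normal(scale=0.5, size=n_sounds)

# The ridge probe's weight vector defines a candidate direction in the
# representation space aligned with the perceptual dimension at this layer.
probe = RidgeCV(alphas=np.logspace(-3, 3, 13))
predicted = cross_val_predict(probe, activations, ratings, cv=5)
r, _ = pearsonr(predicted, ratings)
print(f"cross-validated correlation with the perceptual dimension: {r:.2f}")
```

Repeating this per layer and per perceptual dimension, and comparing the resulting correlations across models, is one way such fine-grained probing could be operationalized.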