Poster in Affinity Workshop: Muslims in ML
Towards Understanding Speaker Identity Coding in Data-driven Speech Models
Gasser Elbanna · Fabio Catania · Satra Ghosh
Keywords: [ Self-supervised learning ] [ representational similarity ] [ speaker perception ] [ speaker identity coding ]
Speaker identity plays a significant role in human communication and is increasingly used in societal applications, many of them enabled by advances in machine learning. The representational spaces of current deep learning models, self-supervised models in particular, have shown strong performance on various speech-related tasks. In this work, we demonstrate that these representations are significantly better for speaker identification than acoustic representations. We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in the different layers of these powerful networks. By evaluating speaker identification accuracy across acoustic, phonemic, prosodic, and linguistic variants, we report similarities between model performance and human identity perception. These empirical findings both enhance the interpretability of these representational spaces and support using this family of models as candidates for studying speaker identity perception in humans.
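The idea of using speaker identification to examine individual layers can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the models, the classifier, or the data, so everything below is an assumption. The "layers" are synthetic embeddings produced by a hypothetical `make_synthetic_layer` helper, and the probe is a simple nearest-centroid classifier chosen only for brevity. In practice, the synthetic features would be replaced by utterance-level embeddings extracted from each layer of a real speech model.

```python
import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test vector by its closest speaker centroid (Euclidean)."""
    speakers = np.unique(train_y)
    # One mean vector per speaker, computed from the training utterances.
    centroids = np.stack([train_x[train_y == s].mean(axis=0) for s in speakers])
    # Squared distances with shape (n_test, n_speakers).
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = speakers[dists.argmin(axis=1)]
    return float((predictions == test_y).mean())


def make_synthetic_layer(rng, n_speakers, n_utts, dim, speaker_strength):
    """Hypothetical stand-in for one layer's utterance-level embeddings.

    speaker_strength scales how far apart the speaker means are, i.e. how
    much speaker information this pretend layer carries (an assumption).
    """
    speaker_means = rng.normal(size=(n_speakers, dim)) * speaker_strength
    labels = np.repeat(np.arange(n_speakers), n_utts)
    features = speaker_means[labels] + rng.normal(size=(labels.size, dim))
    return features, labels


def probe_layers(strengths, seed=0):
    """Return one speaker-identification accuracy per synthetic layer."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for strength in strengths:
        x, y = make_synthetic_layer(
            rng, n_speakers=10, n_utts=20, dim=16, speaker_strength=strength
        )
        # Even-indexed utterances train the probe, odd-indexed ones test it.
        train = np.arange(y.size) % 2 == 0
        accuracies.append(
            nearest_centroid_accuracy(x[train], y[train], x[~train], y[~train])
        )
    return accuracies


if __name__ == "__main__":
    for layer, acc in enumerate(probe_layers([0.0, 0.5, 1.0, 5.0])):
        print(f"synthetic layer {layer}: accuracy = {acc:.2f}")
```

Comparing the per-layer accuracies is what makes the probe informative: a layer whose embeddings separate speakers well yields high accuracy, while a layer that carries little speaker information stays near chance.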