Poster in Workshop: UniReps: Unifying Representations in Neural Models
Artificial Neural Networks Explain Continuous Speech Perception in Humans
Gasser Elbanna · Josh McDermott
Keywords: [ Artificial Neural Networks ] [ Behavioral Experiment ] [ NeuroAI ] [ Speech Perception ] [ Phoneme Recognition ]
Humans have a remarkable ability to convert acoustic signals into linguistic representations. To advance toward the goal of building biologically plausible models that replicate this process, we developed an artificial neural network trained to generate sequences of American English phonemes from audio processed by a simulated cochlea. We trained the model with phoneme transcriptions inferred from text annotations of speech corpora. To compare the model to humans, we ran a behavioral experiment in which humans transcribed non-words, and we evaluated the model on the same stimuli. While humans slightly outperformed the model, the model exhibited human-like patterns of phoneme confusions for both consonants (r=0.91) and vowels (r=0.87). Additionally, the recognizability of individual phonemes was highly correlated between humans and the model (r=0.93). These results suggest that human-like speech perception emerges from optimizing for phoneme recognition from cochlear representations.
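As a rough illustration of the behavioral comparison described above, the sketch below shows one way to correlate human and model phoneme-confusion patterns from a transcription task. The matrix construction, row normalization, and use of Pearson's r over off-diagonal (confusion) entries are assumptions made for this example, not details taken from the poster.

```python
# Minimal sketch (assumed analysis, not the authors' exact pipeline):
# build per-phoneme confusion matrices for humans and model, then correlate
# their off-diagonal entries.
import numpy as np
from scipy.stats import pearsonr

def confusion_matrix(presented, responded, inventory):
    """Count how often each presented phoneme (row) was reported as each
    response phoneme (column), then normalize rows to response proportions."""
    index = {p: i for i, p in enumerate(inventory)}
    counts = np.zeros((len(inventory), len(inventory)))
    for true_p, resp_p in zip(presented, responded):
        counts[index[true_p], index[resp_p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)  # avoid division by zero

def confusion_similarity(human_cm, model_cm):
    """Pearson correlation between the off-diagonal entries of two
    equally-sized, row-normalized confusion matrices."""
    mask = ~np.eye(human_cm.shape[0], dtype=bool)  # drop correct responses
    r, _ = pearsonr(human_cm[mask], model_cm[mask])
    return r
```

In practice, `presented` and `responded` would hold the aligned target and reported phonemes from the non-word transcription trials, computed separately for human listeners and for the model on the same stimuli.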