Poster in Workshop: UniReps: Unifying Representations in Neural Models
Artificial Neural Networks Explain Continuous Speech Perception in Humans
Gasser Elbanna · Josh McDermott
Keywords: [ Artificial Neural Networks ] [ Behavioral Experiment ] [ NeuroAI ] [ Speech Perception ] [ Phoneme Recognition ]
Humans have a remarkable ability to convert acoustic signals into linguistic representations. To advance toward the goal of building biologically plausible models that replicate this process, we developed an artificial neural network trained to generate sequences of American English phonemes from audio processed by a simulated cochlea. We trained the model with phoneme transcriptions inferred from text annotations of speech corpora. To compare the model to humans, we ran a behavioral experiment in which humans transcribed non-words, and we evaluated the model on the same stimuli. While humans slightly outperformed the model, the model exhibited human-like patterns of phoneme confusions for both consonants (r=0.91) and vowels (r=0.87). Additionally, the recognizability of individual phonemes was highly correlated between humans and the model (r=0.93). These results suggest that human-like speech perception emerges from optimizing for phoneme recognition from cochlear representations.
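As a rough illustration of the behavioral comparison described above, the sketch below shows one way to correlate human and model phoneme-confusion patterns from a transcription task. The matrix construction, row normalization, and use of Pearson's r over off-diagonal (confusion) entries are assumptions made for this example, not details taken from the poster.

```python
# Minimal sketch (assumed analysis, not the authors' exact pipeline):
# build per-phoneme confusion matrices for humans and model, then correlate
# their off-diagonal entries.
import numpy as np
from scipy.stats import pearsonr

def confusion_matrix(presented, responded, inventory):
    """Count how often each presented phoneme (row) was reported as each
    response phoneme (column), then normalize rows to response proportions."""
    index = {p: i for i, p in enumerate(inventory)}
    counts = np.zeros((len(inventory), len(inventory)))
    for true_p, resp_p in zip(presented, responded):
        counts[index[true_p], index[resp_p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)  # avoid division by zero

def confusion_similarity(human_cm, model_cm):
    """Pearson correlation between the off-diagonal entries of two
    equally-sized, row-normalized confusion matrices."""
    mask = ~np.eye(human_cm.shape[0], dtype=bool)  # drop correct responses
    r, _ = pearsonr(human_cm[mask], model_cm[mask])
    return r
```

In practice, `presented` and `responded` would hold the aligned target and reported phonemes from the non-word transcription trials, computed separately for human listeners and for the model on the same stimuli.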