Poster
in
Workshop: Workshop on Behavioral Machine Learning

Multimodal Integration in Audio-Visual Speech Recognition --- How Far Are We From Human-Level Robustness?

Marianne Schweitzer · Anna Montagnini · Abdellah Fourtassi · Thomas Schatz


Abstract:

This paper introduces a novel evaluation framework, inspired by methods from human psychophysics, to systematically assess the robustness of multimodal integration in audiovisual speech recognition (AVSR) models relative to human abilities. We present preliminary results on AV-HuBERT suggesting that multimodal integration in state-of-the-art (SOTA) AVSR models remains mediocre compared to human performance, and we discuss avenues for improvement.