

Poster

SSDM: Scalable Speech Dysfluency Modeling

Jiachen Lian · Xuanru Zhou · Zoe Ezzes · Jet Vonk · Brittany Morin · David Paul Baquirin · Zachary Miller · Maria Luisa Gorno Tempini · Gopala Anumanchipalli

Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, it faces three challenges. First, current state-of-the-art solutions \cite{lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm} suffer from poor scalability. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In other words, we are at a LeNet \cite{lecun1998gradient-lenet} moment. In this paper, we propose \textit{SSDM: Scalable Speech Dysfluency Modeling}, which (1) adopts articulatory gestures as a scalable forced alignment; (2) introduces the connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at \url{https://eureka235.github.io}.
