

Poster

A Full-duplex Speech Dialogue Scheme Based On Large Language Model

Peng Wang · Songshuo Lu · Yaohua Tang · Sijie Yan · Yuanjun Xiong · Wei Xia

Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called a neural FSM) with two states. The perception and motor function modules operate in tandem, allowing the system to speak and listen to the user simultaneously. The LLM generates textual tokens for inquiry responses and makes autonomous decisions to start responding to, wait for, or interrupt the user by emitting control tokens to the neural FSM. All of these LLM tasks can then be carried out by simply performing the inherent next-token prediction task on a serialized view of the dialogue in real time. In automatic quality evaluations simulating real-life interaction, the system reduces conversation response latency by more than threefold compared with LLM-based half-duplex dialogue systems. With a model of only 8 billion parameters, our system exhibits 8% higher interruption precision than the best available commercial LLM for voice-based dialogue.
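To make the two-state neural FSM idea concrete, below is a minimal, hypothetical Python sketch of such a decoding loop. The control-token spellings, the model interface (next_token), and the TTS hook are illustrative assumptions, not the authors' implementation: the abstract only states that the LLM emits control tokens to a two-state FSM while performing ordinary next-token prediction over a serialized view of the dialogue.

from enum import Enum, auto


class DialogueState(Enum):
    SPEAK = auto()   # the motor function (TTS) module is voicing system output
    LISTEN = auto()  # the system is waiting on / attending to the user


# Hypothetical control-token spellings (assumed for illustration).
CTRL_SPEAK = "<|speak|>"    # start responding, or interrupt the user and respond
CTRL_LISTEN = "<|listen|>"  # stop speaking / keep waiting for the user


def emit_to_tts(token: str) -> None:
    """Stand-in for handing a text token to the motor function module."""
    print(token, end="", flush=True)


def full_duplex_step(llm, serialized_dialogue, state):
    """One decoding step: the LLM reads a serialized view of the dialogue
    (interleaved user-speech transcripts and system tokens) and performs
    ordinary next-token prediction. Control tokens drive the FSM; text
    tokens generated while in SPEAK go to the motor function module."""
    token = llm.next_token(serialized_dialogue)  # assumed model interface

    if token == CTRL_SPEAK:
        state = DialogueState.SPEAK
    elif token == CTRL_LISTEN:
        state = DialogueState.LISTEN
    elif state is DialogueState.SPEAK:
        emit_to_tts(token)

    serialized_dialogue.append(token)
    return state


if __name__ == "__main__":
    # Toy stand-in for the aligned LLM: it replays a fixed token stream,
    # which is enough to exercise the two-state loop above.
    class ToyLLM:
        def __init__(self, stream):
            self._it = iter(stream)

        def next_token(self, _context):
            return next(self._it)

    llm = ToyLLM([CTRL_SPEAK, "Hello,", " how", " can", " I", " help?", CTRL_LISTEN])
    state, dialogue = DialogueState.LISTEN, ["[user] What can you do?"]
    for _ in range(7):
        state = full_duplex_step(llm, dialogue, state)
    print()  # final state is LISTEN: the system has yielded the floor to the user

In this sketch the decision to speak, wait, or interrupt is just another token choice, which is how a single next-token prediction pass can cover both response generation and turn-taking control.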
