Poster in Affinity Event: Black in AI

Self-Supervised Amharic Text-to-Speech Using Unified Encoder-Decoder Pre-Training

Rahel Mekonen Tamiru · ABEL ALEMU


Abstract:

This study presents the development of a Text-to-Speech (TTS) system for Amharic, the second most widely spoken Semitic language in the world. Using the SpeechT5 framework and 50 hours of audio data from a female and a male speaker, the fine-tuned model showed promising results: low loss values, high intelligibility, and good prosody. The system achieved a 98.5% correct rate in word listening tests and an average Mean Opinion Score of 4.2 in sentence listening tests, indicating effective communication and moderate naturalness. Future work aims to improve the TTS system by recording additional audio data and incorporating gemination information, paving the way for broader language applications.
