At Apple, we believe accessibility is a human right, and on-device ML model training is a key research area for us. In this talk, we will share how we applied text-to-speech model adaptation technology on Apple devices to build Personal Voice from a limited number of recordings, so that people at risk of losing their voice can preserve it and use it in live speech when they are no longer able to speak.
This talk will cover how we pre-train and fine-tune text-to-speech models, and how we preprocess user speech recordings to achieve the best voice quality and similarity. We will also explain how we deploy the entire pipeline to Apple devices.