Poster in Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Mahmoud Salem · Jiayu Ye · Frederick Liu · Chu-Cheng Lin
Abstract:
Recent advances in Transformer-based Large Language Models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which may be a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, albeit largely with dedicated architectures evaluated on supervised benchmarks. In this work, we studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising and show SoTA results in downstream generation tasks such as SQuAD question generation and XSum.
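To make the sequentiality contrast concrete, here is a minimal sketch (not the paper's implementation) comparing autoregressive decoding, which spends one forward pass per token, with non-autoregressive iterative denoising, which refines all positions in parallel over a small fixed number of steps. The model is a random stand-in (toy_logits), and the vocabulary size, mask id, and step count are illustrative assumptions only.

import numpy as np

VOCAB, K, MASK = 100, 8, 0  # hypothetical vocab size, target length, mask token id
rng = np.random.default_rng(0)

def toy_logits(tokens):
    """Stand-in for a decoder forward pass: random logits per position (not the UT5 model)."""
    return rng.standard_normal((len(tokens), VOCAB))

def autoregressive_decode(k=K):
    """Decoding K tokens takes K sequential forward passes (the bottleneck noted in the abstract)."""
    out = []
    for _ in range(k):                          # one forward pass per generated token
        logits = toy_logits(out + [MASK])
        out.append(int(logits[-1].argmax()))
    return out

def nar_iterative_denoise(k=K, steps=3):
    """Start from an all-mask sequence and refine every position in parallel;
    the pass count is `steps` (here 3), independent of the sequence length k."""
    tokens = [MASK] * k
    for _ in range(steps):
        logits = toy_logits(tokens)
        tokens = [int(row.argmax()) for row in logits]  # parallel update of all K positions
    return tokens

print(autoregressive_decode())    # 8 forward passes for 8 tokens
print(nar_iterative_denoise())    # 3 forward passes regardless of length

With a real decoder in place of toy_logits, the wall-clock advantage of the NAR path comes from the forward-pass count scaling with the number of refinement steps rather than with the output length.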