Poster in Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Mahmoud Salem · Jiayu Ye · Frederick Liu · Chu-Cheng Lin
Abstract:
Recent advances in Transformer-based Large Language Models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which may be a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, albeit largely with dedicated architectures evaluated on supervised benchmarks. In this work, we studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising and show SoTA results in downstream generation tasks such as SQuAD question generation and XSum.
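To make the sequentiality contrast concrete, here is a minimal sketch (not the paper's implementation) comparing autoregressive decoding, which spends one forward pass per token, with non-autoregressive iterative denoising, which refines all positions in parallel over a small fixed number of steps. The model is a random stand-in (toy_logits), and the vocabulary size, mask id, and step count are illustrative assumptions only.

import numpy as np

VOCAB, K, MASK = 100, 8, 0  # hypothetical vocab size, target length, mask token id
rng = np.random.default_rng(0)

def toy_logits(tokens):
    """Stand-in for a decoder forward pass: random logits per position (not the UT5 model)."""
    return rng.standard_normal((len(tokens), VOCAB))

def autoregressive_decode(k=K):
    """Decoding K tokens takes K sequential forward passes (the bottleneck noted in the abstract)."""
    out = []
    for _ in range(k):                          # one forward pass per generated token
        logits = toy_logits(out + [MASK])
        out.append(int(logits[-1].argmax()))
    return out

def nar_iterative_denoise(k=K, steps=3):
    """Start from an all-mask sequence and refine every position in parallel;
    the pass count is `steps` (here 3), independent of the sequence length k."""
    tokens = [MASK] * k
    for _ in range(steps):
        logits = toy_logits(tokens)
        tokens = [int(row.argmax()) for row in logits]  # parallel update of all K positions
    return tokens

print(autoregressive_decode())    # 8 forward passes for 8 tokens
print(nar_iterative_denoise())    # 3 forward passes regardless of length

With a real decoder in place of toy_logits, the wall-clock advantage of the NAR path comes from the forward-pass count scaling with the number of refinement steps rather than with the output length.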