Poster in Workshop: NeurIPS 2023 Workshop on Diffusion Models
Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou · Chenlin Meng · Stefano Ermon
Abstract:
Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel discrete score matching loss that is more stable than existing methods, forms an ELBO for maximum likelihood training, and can be efficiently optimized with a denoising variant. Combined with architectural improvements, we scale to GPT-2 language modeling experiments, achieving, for the first time, highly competitive performance with a non-autoregressive model. When comparing similarly sized architectures to the GPT-2 baseline, our score entropy discrete diffusion (SEDD) model attains comparable zero-shot perplexities despite reporting only an upper bound (within $+15$ percent of, and sometimes outperforming, the baseline), can generate better samples faster ($4\times$ lower generative perplexity when matching function evaluations and $16\times$ fewer function evaluations when matching generative perplexity, compared to analytic sampling), and enables arbitrary infilling beyond standard autoregressive left-to-right prompting.
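For context, a sketch of the score entropy objective in the paper's notation (summarized here, not quoted from this page; the exact weighting and conditioning follow the full paper): the model $s_\theta(x)_y$ is trained to estimate the ratios $p(y)/p(x)$ of the data distribution via
$$\mathcal{L}_{\mathrm{SE}} = \mathbb{E}_{x \sim p}\Bigg[\sum_{y \neq x} w_{xy}\Big(s_\theta(x)_y - \frac{p(y)}{p(x)}\log s_\theta(x)_y + K\Big(\frac{p(y)}{p(x)}\Big)\Big)\Bigg], \qquad K(a) = a(\log a - 1),$$
where the weights $w_{xy} \ge 0$ are fixed. The normalizing term $K$ keeps the loss nonnegative and minimized exactly when the estimated ratios match the true ones, and the denoising variant mentioned in the abstract replaces the intractable ratios with quantities computable from the forward noising process.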