

Poster in Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

Measuring and Improving Recall in Convolutional Language Models

Evan Sabri Eyuboglu · Simran Arora · Aman Timalsina · Isys Johnson · Michael Poli · James Zou · Atri Rudra · Christopher Ré


Abstract:

Convolution-based language models are asymptotically more efficient than Transformers as sequence length grows and are increasingly competitive in quality. To better understand the quality differences between these architectures, we pre-train a suite of 14 language models across attention and convolution-based architectures, finding that the SoTA gated convolution architectures still underperform Transformers by up to 2.1 perplexity points on the Pile. Our analysis shows that a single language modeling capability, termed associative recall (AR), accounts for 76% of the perplexity gap on average. The task requires recalling an association from earlier in the context, e.g. Hakuna Matata means no worries ... Hakuna Matata means no → ??. We show via experiments and theory that the associative recall solution encoded by convolution-based models is less parameter-efficient than the one encoded by attention. The issue arises because convolution-based models process sequences using fixed filters that do not depend on the input data. Finally, we provide evidence that convolutional models with input-dependent filters can solve AR with improved parameter-efficiency.
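To make the associative recall setup concrete, below is a minimal Python sketch of a synthetic AR example generator. It is an illustrative assumption, not the authors' implementation: the function name `make_ar_example` and the exact token layout are hypothetical. The model observes a sequence of key-value pairs and is then queried with a key it has already seen; the correct output is the value that followed that key earlier in the context.

```python
# Minimal sketch (not the authors' code) of the synthetic associative
# recall (AR) task: the context contains key-value pairs, then a query
# key that appeared earlier; the target is that key's value.
import random


def make_ar_example(num_pairs=8, vocab_size=64, seed=None):
    """Return (sequence, target), where tokens are integer ids."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)
    values = [rng.randrange(vocab_size) for _ in keys]

    sequence = []
    for k, v in zip(keys, values):
        sequence.extend([k, v])          # context: k1 v1 k2 v2 ...

    query = rng.choice(keys)             # re-ask one of the seen keys
    sequence.append(query)
    target = values[keys.index(query)]   # value the model should recall
    return sequence, target


if __name__ == "__main__":
    seq, tgt = make_ar_example(seed=0)
    print("input :", seq)
    print("target:", tgt)
```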
