Spotlight in Workshop: Information-Theoretic Principles in Cognitive Systems
The more human-like the language model, the more surprisal is the best predictor of N400 amplitude
James Michaelov · Benjamin Bergen
Under information-theoretic accounts of language comprehension, the effort required to process a word is correlated with its surprisal, the negative log-probability of that word given its context. Equivalently, this effort can be viewed as proportional to the amount of information conveyed by the word (Frank et al., 2015), or to the amount of effort required to update our incremental predictions about upcoming words (Levy, 2008; Aurnhammer and Frank, 2019). In contrast, others (e.g., Brothers and Kuperberg, 2021) have argued that processing difficulty is proportional to the contextual probability of a word, thus positing a linear (rather than logarithmic) relationship between word probability and processing difficulty. We investigate which of these two accounts better explains the N400, a neural response that provides some of the best evidence for prediction in language comprehension (Kutas et al., 2011; Van Petten and Luka, 2012; Kuperberg et al., 2020). To do this, we extend previous work by comparing how well the probability and the surprisal calculated by 43 transformer language models predict N400 amplitude. We thus investigate both which models’ predictions best predict the N400 and, for each model, whether surprisal or probability is more closely correlated with N400 amplitude. We find that, of the models tested, OPT-6.7B and GPT-J are reliably the best at predicting N400 amplitude, and that for these models, surprisal is the better predictor. In fact, the more highly correlated a language model's predictions are with N400 amplitude, the greater the advantage of surprisal over probability as a predictor. Since language models that more closely mirror human statistical knowledge are more likely to be informative about the human predictive system, these results support the information-theoretic account of language comprehension.