Poster
in
Affinity Event: LatinX in AI
Testing Memory Capabilities in Large Language Models with the Sequence Order Recall Task
Mathis Pink · Vy Vo · Qinyuan Wu · Jianing Mu · Javier Turek · Uri Hasson · Kenneth Norman · Sebastian Michelmann · Alexander Huth · Mariya Toneva
Many benchmarks evaluate Large Language Models (LLMs) on facts and semantic relations, primarily assessing their semantic memory. However, some memories in language are tied to their contexts, such as time and place, akin to human episodic memory. To address this gap in evaluating memory in LLMs, we introduce the Sequence Order Recall Task (SORT). SORT requires LLMs to recall the correct order of text segments from a text excerpt. We present an initial evaluation dataset, Book-SORT, comprising 36,000 samples extracted from 9 books recently added to the public domain. When the text is given to models in-context, we find that instruction-tuned LLMs can perform this task. However, when models must rely on memory stored in their weights, i.e., when they are not presented with the text excerpts, their accuracy drops below 60%, near or at chance level. We hope that SORT will drive the development of memory-augmented LLMs.
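To make the task format concrete, here is a minimal sketch of how a SORT-style item could be constructed and scored. This is an illustrative assumption, not the authors' actual pipeline: the segment length, sampling scheme, and label names are hypothetical choices for exposition.

```python
import random

def make_sort_sample(text, seg_len=60, rng=None):
    """Build one hypothetical SORT-style item: two non-overlapping
    segments drawn from `text`, presented in random order. The model
    must say which presented segment occurred earlier in the text."""
    rng = rng or random.Random(0)
    words = text.split()
    # Pick two non-overlapping segment start positions, i before j.
    i = rng.randrange(0, len(words) - 2 * seg_len)
    j = rng.randrange(i + seg_len, len(words) - seg_len)
    seg_a = " ".join(words[i:i + seg_len])   # earlier in the text
    seg_b = " ".join(words[j:j + seg_len])   # later in the text
    # Randomly swap presentation order; the label marks which
    # presented position holds the segment that came first.
    if rng.random() < 0.5:
        return {"first": seg_a, "second": seg_b, "label": "first"}
    return {"first": seg_b, "second": seg_a, "label": "second"}

def accuracy(predictions, labels):
    """Fraction of items where the model picked the earlier segment;
    chance level for this binary task is 50%."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)
```

With in-context evaluation, the source excerpt would be placed in the prompt before the two segments; the below-60% weight-memory condition in the abstract corresponds to asking the same question without the excerpt.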