Oral in Workshop: Synthetic Data Generation with Generative AI
Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
Yujin Kim · Jaehong Yoon · Seonghyeon Ye · Sung Ju Hwang · Se-Young Yun
Keywords: [ question answering ] [ natural language processing ] [ continual learning ] [ language model ]
In an ever-evolving world, the dynamic nature of knowledge presents challenges for language models trained on static data, whose encoded information becomes outdated. Real-world scenarios require models not only to acquire new knowledge but also to overwrite outdated information with updated facts. To address this, we introduce EvolvingQA, a temporally evolving question-answering benchmark for training and evaluating LMs on an evolving Wikipedia database; its construction is automated by our pipeline using large language models. The benchmark incorporates question answering as a downstream task to emulate real-world applications. Through EvolvingQA, we uncover that existing continual learning baselines struggle to update and to forget outdated knowledge. Our findings suggest that models fail to learn properly when acquiring updated knowledge because the corresponding weight gradients are small. Furthermore, we show that models struggle most with providing numerical or temporal answers to questions about updated knowledge. Our work models the dynamic nature of real-world information, offering a robust measure of language models' adaptability to evolving knowledge. Our data construction code and dataset files are available at https://anonymous.4open.science/r/EvolvingQA/.
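To make the gradient-based finding concrete, below is a minimal sketch (not the authors' code) of how one might probe it: compare the gradient norm of a single fine-tuning step on a fact that contradicts the model's existing knowledge against one on an entirely new fact. The model name and example sentences are placeholders, and the paper's actual backbone and probing procedure may differ.

```python
# Sketch: measure per-step gradient norms when fine-tuning on
# "updated" (contradicting) vs. "new" facts. Assumes torch + transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder backbone, not necessarily the paper's
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

def gradient_norm(text: str) -> float:
    """Global L2 norm of gradients from one LM loss step on `text`."""
    model.zero_grad()
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])  # causal LM loss
    out.loss.backward()
    norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)).item()

# Hypothetical probes: an update to a likely-known fact vs. a novel fact.
updated_fact = "The capital of Australia is Sydney."       # contradicts prior
new_fact = "The Zorvex telescope was launched in 2023."    # fabricated, novel
print("updated-fact grad norm:", gradient_norm(updated_fact))
print("new-fact grad norm:", gradient_norm(new_fact))
```

Under the paper's finding, one would expect systematically smaller gradient norms on updates to already-encoded knowledge, which is consistent with models under-adjusting when old facts must be overwritten.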