Keynote Talk
in
Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

Knowledge Consolidation and Utilization (In)Ability of Large Language Models

Sarath Chandar


Abstract:

Large language models (LLMs) are increasingly used in downstream applications, not only in natural language processing but also in other domains such as computer vision, reinforcement learning, and scientific discovery. This talk will focus on some of the fundamental limitations of using LLMs as task solvers. In the first half of the talk, I will show that LLMs cannot consolidate knowledge that is spread across training documents. In the second half, I will show that while LLMs can acquire simple facts from the training data, they cannot utilize all of the acquired facts when solving a new task, and this utilization gap worsens when the task distribution differs substantially from the training data distribution. I will also show that scaling will not solve either of these issues and argue for better pre-training procedures.
