

Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning

Jiajun Chai · Sicheng Li · Yuqian Fu · Dongbin Zhao · Yuanheng Zhu


Abstract:

Current large language model (LLM) agents succeed at making zero-shot decisions but struggle to make optimal ones, as they rely on pre-trained probabilities rather than maximizing expected future rewards. In contrast, agents trained via reinforcement learning (RL) can make optimal decisions but require extensive data. We develop an algorithm that combines the zero-shot capabilities of LLMs with the optimization of RL, referred to as the Model-based LLM Agent with Q-Learning (MLAQ). MLAQ employs Q-learning to derive optimal policies from transitions stored in memory. Unlike RL agents, MLAQ constructs an LLM-based imagination space, in which a UCB variant generates imaginary data through interactions with the LLM-based world model to derive zero-shot policies. This approach achieves a sub-linear regret bound, as guaranteed by our theorem. Moreover, MLAQ employs a mixed-examination mechanism to further enhance the quality of the imaginary data. We evaluate MLAQ on benchmarks that present significant challenges for existing LLM agents. Results show that MLAQ achieves an optimal rate of over 90% in tasks where other methods struggle to succeed. Additional experiments indicate that introducing model-based RL into LLM agents holds significant potential for optimal decision-making. Our website is available at http://mlaq.site/.
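The core loop described above can be pictured as Q-learning over transitions that are imagined by querying an LLM-based world model under a UCB-style exploration rule. The sketch below is only an illustration of that idea under assumed details: the toy `llm_world_model` stub, action set, and hyperparameters are hypothetical placeholders, not the paper's actual implementation.

```python
# Sketch: Q-learning over transitions imagined by an LLM-based world model,
# with a UCB-style exploration bonus. All names and values are hypothetical.
import math
import random
from collections import defaultdict

ACTIONS = ["left", "right", "pick", "drop"]   # hypothetical action set
GAMMA, ALPHA, UCB_C = 0.95, 0.5, 1.0          # discount, step size, UCB bonus weight

Q = defaultdict(float)    # Q[(state, action)] -> value estimate
N = defaultdict(int)      # visit counts used by the UCB bonus
memory = []               # memory of imagined transitions


def llm_world_model(state, action):
    """Stand-in for querying an LLM world model for (next_state, reward, done).
    In MLAQ this would be an LLM prompted to predict the transition."""
    next_state = hash((state, action)) % 10   # toy deterministic dynamics
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward, next_state == 0


def ucb_action(state):
    """Pick the action maximizing Q plus an exploration bonus (a UCB variant)."""
    total = sum(N[(state, a)] for a in ACTIONS) + 1
    return max(
        ACTIONS,
        key=lambda a: Q[(state, a)]
        + UCB_C * math.sqrt(math.log(total) / (N[(state, a)] + 1)),
    )


def imagine_episode(start_state, horizon=10):
    """Roll out imagined interactions with the world model, storing transitions."""
    state = start_state
    for _ in range(horizon):
        action = ucb_action(state)
        next_state, reward, done = llm_world_model(state, action)
        memory.append((state, action, reward, next_state, done))
        N[(state, action)] += 1
        state = next_state
        if done:
            break


def q_learning_update():
    """One Q-learning pass over the imagined transitions in memory."""
    for state, action, reward, next_state, done in memory:
        target = reward if done else reward + GAMMA * max(
            Q[(next_state, a)] for a in ACTIONS
        )
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])


for _ in range(50):
    imagine_episode(start_state=random.randint(1, 9))
    q_learning_update()

print("Greedy action from state 3:", max(ACTIONS, key=lambda a: Q[(3, a)]))
```

The point of the sketch is the separation of roles: the LLM supplies zero-shot transition data, while Q-learning extracts an optimal policy from that imagined data rather than from real-environment interaction.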
