Oral in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
Fine-tuning LLM Agents with Retrospective In-Context Online Learning
Wen-Tse Chen · Jiayu Chen · Fahim Tajwar · Hao Zhu · Xintong Duan · Ruslan Salakhutdinov · Jeff Schneider
Fine-tuning large language models (LLMs) using online learning, where models learn from self-sampled data and environmental feedback, presents a promising but challenging research direction due to the typically sparse nature of rewards. Traditional methods for addressing this challenge often involve training domain-specific Q-functions to convert sparse rewards into dense signals. However, these methods suffer from poor sample efficiency and limited generalizability.

In this work, we propose a novel framework that leverages the pre-trained knowledge of LLMs to transform sparse rewards into dense supervised signals through in-context learning. Specifically, we introduce a retrospective in-context learning approach, where LLMs assign temporal credit to past actions based on feedback. Unlike previous approaches, which rely heavily on extensive feedback data or intricate prompt engineering, our method uses online learning to iteratively update the policy by combining in-context learning with gradient-based fine-tuning.

We empirically demonstrate the effectiveness of our approach on the BabyAI benchmark, showing that it is significantly more sample-efficient than traditional online reinforcement learning (RL) algorithms while achieving comparable performance to imitation learning. Our findings suggest that LLM-based agents can refine their policies using sparse feedback in an online manner, making them more adaptive to dynamic environments.
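To make the loop described in the abstract concrete, the following is a minimal Python sketch of how one iteration of retrospective in-context online learning could be organized: sample trajectories with the current policy, let the LLM retrospectively label which past actions were helpful given the sparse episode feedback, then fine-tune on the credited steps. The callables `sample_trajectory`, `assign_credit`, and `fine_tune`, along with all default values, are illustrative assumptions, not the authors' actual implementation.

```python
from typing import Callable, List, Tuple

# A trajectory is a list of (observation, action) pairs; the episode ends with
# a single sparse reward from the environment.
Step = Tuple[str, str]


def retrospective_online_learning(
    sample_trajectory: Callable[[], Tuple[List[Step], float]],
    assign_credit: Callable[[List[Step], float], List[bool]],
    fine_tune: Callable[[List[Step]], None],
    num_iterations: int = 10,
    rollouts_per_iter: int = 8,
) -> None:
    """Hypothetical outer loop: roll out, retrospectively assign credit
    in-context, then update the policy with gradient-based fine-tuning."""
    for _ in range(num_iterations):
        dense_data: List[Step] = []
        for _ in range(rollouts_per_iter):
            # 1. Self-sample a trajectory and its sparse episode-level reward.
            trajectory, sparse_reward = sample_trajectory()

            # 2. Retrospective in-context credit assignment: the LLM reads the
            #    full trajectory plus the final feedback and marks each action
            #    as helpful or not, turning one sparse reward into dense labels.
            credits = assign_credit(trajectory, sparse_reward)

            # 3. Keep only the (observation, action) pairs credited as helpful.
            dense_data.extend(step for step, ok in zip(trajectory, credits) if ok)

        # 4. Gradient-based fine-tuning (e.g. a supervised next-token loss) on
        #    the densely supervised data, yielding the policy for the next round.
        fine_tune(dense_data)
```

In this sketch, `assign_credit` would typically prompt a frozen LLM with the whole trajectory and the environment feedback and parse per-step helpful/unhelpful judgments; the exact prompting and fine-tuning objective used in the paper are not specified in the abstract.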