Poster Session
in
Workshop: Scientific Methods for Understanding Neural Networks

Transformers can reinforcement learn to approximate Gittins Index

Vladimir Petrov · Nikhil Vyas · Lucas Janson

Sun 15 Dec 11:20 a.m. PST — 12:20 p.m. PST

Abstract:

Transformers have demonstrated the ability to approximate in-context a rich class of functions in supervised learning and, more recently, in reinforcement learning (RL) settings. In this work, we investigate the transformer's ability to learn in-context the Gittins index, an online RL algorithm computed via dynamic programming (DP) and known to be optimal for Bayesian Bernoulli bandits. Our experiments show that the transformer can learn to approximate this strategy very well in a purely RL manner, without expert demonstrations, especially after we account for the problem's underlying symmetries. Our results therefore serve as empirical evidence that the class of RL algorithms transformers can learn in context extends to include certain DP-based algorithms.
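To make the target of the in-context learning concrete, below is a minimal sketch of how a Gittins index can be computed by dynamic programming for a Bayesian Bernoulli bandit arm. This is not code from the paper: it assumes a Beta(a, b) posterior over the arm's success probability, a discount factor `gamma`, and uses the standard retirement (calibration) formulation, binary-searching the per-step retirement reward at which playing the arm and retiring are equally valuable. The horizon truncation `depth` is an approximation choice.

```python
import numpy as np

def gittins_index(a, b, gamma=0.9, depth=80, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with Beta(a, b)
    posterior via the retirement formulation: binary-search the retirement
    reward lam at which playing and retiring have equal value.
    (Illustrative sketch; `depth` truncates the DP horizon.)"""

    def value_of_play(lam):
        retire = lam / (1.0 - gamma)  # value of retiring forever on reward lam
        # After d pulls with s successes the posterior is Beta(a+s, b+d-s),
        # so states at depth d are indexed by s = 0..d.
        V = np.full(depth + 1, retire)  # beyond the horizon, assume retirement
        for d in range(depth - 1, -1, -1):
            newV = np.empty(d + 1)
            for s in range(d + 1):
                aa, bb = a + s, b + (d - s)
                p = aa / (aa + bb)  # posterior mean success probability
                cont = p * (1.0 + gamma * V[s + 1]) + (1.0 - p) * gamma * V[s]
                newV[s] = max(retire, cont)  # play on, or retire now
            V = newV
        return V[0]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if value_of_play(lam) > lam / (1.0 - gamma) + 1e-12:
            lo = lam  # arm still worth playing, so the index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

Because exploration has option value, the resulting index exceeds the posterior mean for an uncertain arm (e.g. `gittins_index(1, 1)` is above 0.5), which is exactly the kind of DP-derived quantity the transformer is trained to approximate in context.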
