Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Transformers can reinforcement learn to approximate Gittins Index
Vladimir Petrov · Nikhil Vyas · Lucas Janson
Transformers have demonstrated the ability to approximate a rich class of functions in-context, first in supervised learning and more recently in reinforcement learning (RL) settings. In this work, we investigate a transformer's ability to learn in-context the Gittins index, an online RL algorithm that is computed via dynamic programming (DP) and known to be optimal for Bayesian Bernoulli bandits. Our experiments show that the transformer can learn to approximate this strategy closely in a purely RL manner, without expert demonstrations, especially once we account for the problem's underlying symmetries. Our results therefore serve as empirical evidence that the class of RL algorithms transformers can learn in context extends to include certain DP-based algorithms.
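For context, the abstract does not spell out how the Gittins index is computed; the following is a minimal, hedged sketch of one standard approach (not necessarily the authors' implementation): the "retirement" or calibration formulation, which bisects on the reward λ of a known standard arm until the agent is indifferent between retiring and continuing to pull a Bernoulli arm with a Beta(a, b) posterior. The backward induction is truncated at a finite horizon, so the result is an approximation; all parameter names and defaults here are illustrative assumptions.

```python
def gittins_index(a, b, gamma=0.9, horizon=100, tol=1e-6):
    """Approximate the Gittins index of a Bernoulli arm with Beta(a, b)
    posterior and discount gamma, via the retirement formulation:
    bisect on the standard-arm reward lambda until retiring and
    continuing have equal value. Finite-horizon truncation of the
    dynamic program makes this an approximation (sketch only)."""
    def value_with_retirement(lam):
        # Backward induction over the Beta-posterior lattice.
        # At depth d, V[s] is the value of posterior Beta(a+s, b+d-s).
        # Terminal values: crude truncation at depth == horizon.
        V = [max(lam, (a + s) / (a + b + horizon)) / (1 - gamma)
             for s in range(horizon + 1)]
        for d in range(horizon - 1, -1, -1):
            newV = []
            for s in range(d + 1):
                p = (a + s) / (a + b + d)  # posterior mean success prob
                cont = p * (1 + gamma * V[s + 1]) + (1 - p) * gamma * V[s]
                newV.append(max(lam / (1 - gamma), cont))
            V = newV
        return V[0]

    # Bisect on lambda: the index is the lambda at which continuing
    # stops being strictly better than retiring.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if value_with_retirement(lam) > lam / (1 - gamma) + 1e-12:
            lo = lam  # continuing still better: index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2
```

As a sanity check on the design, the index of a Beta(1, 1) arm should exceed its posterior mean of 0.5 (the index rewards exploration value on top of the mean), and arms with more observed successes should have higher indices.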