Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design

An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

Amaury Gouverneur · Borja Rodríguez Gálvez · Tobias Oechtering · Mikael Skoglund

Keywords: [ regret bounds ] [ Information Theory ] [ online optimization ] [ mutli–armed bandit ] [ logistic bandit ] [ thompson sampling ]


Abstract: We study the performance of the Thompson Sampling algorithm for logistic bandit problems, where the agent receives binary rewards with probabilities determined by a logistic function $\exp(\beta \langle a, \theta \rangle)/(1+\exp(\beta \langle a, \theta \rangle))$. We focus on the setting where the action $a$ and parameter $\theta$ lie within the $d$-dimensional unit ball with the action space encompassing the parameter space. Adopting the information-theoretic framework introduced by Russo and Van Roy (2015), we analyze the information ratio, which is defined as the ratio of the expected squared difference between the optimal and actual rewards to the mutual information between the optimal action and the reward. Improving upon previous results, we establish that the information ratio is bounded by $\tfrac{9}{2}d$. Notably, we obtain a regret bound in $O(d\sqrt{T \log(\beta T/d)})$ that depends only logarithmically on the parameter $\beta$.

Chat is not available.