

Poster in Workshop: Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design

Gaussian Randomized Exploration for Semi-bandits with Sleeping Arms

Zhiming Huang · Bingshan Hu · Jianping Pan

Keywords: [ Sleeping Arms ] [ Combinatorial Bandits ] [ Gaussian Priors ] [ Multi-armed Bandits ] [ Online Learning ] [ Semi-bandit Feedback ] [ Thompson Sampling ]


Abstract: This paper provides problem-independent upper and lower regret bounds for Gaussian randomized algorithms in semi-bandits with sleeping arms, where arms may be unavailable in certain rounds and the available arms can be played simultaneously subject to combinatorial constraints. We first introduce CTS-G, an adaptation of Thompson sampling with Gaussian priors, which achieves a regret upper bound of $\tilde{O}(m\sqrt{NT})$ over $T$ rounds with $N$ arms and up to $m$ arms played per round, where $\tilde{O}$ hides logarithmic factors. We then present CL-SG, which improves upon CTS-G by drawing a single Gaussian sample per round and achieves a near-optimal regret upper bound of $\tilde{O}(\sqrt{mNT})$. We also establish regret lower bounds of $\Omega(\sqrt{mNT \ln \frac{N}{m}})$ and $\Omega(\sqrt{mNT})$ for CTS-G and CL-SG, respectively.
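To make the problem setting concrete, here is a minimal simulation sketch of Gaussian randomized exploration in a sleeping semi-bandit. It is not the paper's CTS-G or CL-SG; it is a generic Thompson-sampling-style baseline under simplifying assumptions: each arm sleeps independently per round, the combinatorial constraint is a top-$m$ selection over available arms, and rewards are Bernoulli. All constants (availability probability 0.8, the noise scale $1/\sqrt{\text{plays}}$) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, T = 10, 3, 2000          # arms, max arms per round, horizon
true_means = rng.uniform(0, 1, N)

counts = np.zeros(N)           # number of plays per arm
sums = np.zeros(N)             # cumulative observed reward per arm

for t in range(T):
    # Sleeping arms: each arm is independently available this round
    # (availability probability 0.8 is an illustrative assumption).
    available = rng.random(N) < 0.8
    if not available.any():
        continue

    # Gaussian randomized score: empirical mean plus Gaussian noise
    # whose scale shrinks as an arm accumulates plays.
    mean_hat = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    scale = 1.0 / np.sqrt(np.maximum(counts, 1))
    scores = mean_hat + rng.normal(0, 1, N) * scale
    scores[~available] = -np.inf   # sleeping arms cannot be played

    # Combinatorial constraint (top-m here): play up to m available arms.
    k = min(m, int(available.sum()))
    played = np.argsort(scores)[-k:]

    # Semi-bandit feedback: observe the reward of every played arm.
    rewards = rng.binomial(1, true_means[played])
    counts[played] += 1
    sums[played] += rewards
```

Note that this sketch draws one Gaussian sample per arm per round, as in the CTS-G-style approach; the abstract's CL-SG instead uses a single shared Gaussian sample per round, which is the source of its improved $\tilde{O}(\sqrt{mNT})$ bound.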
