An Optimal Elimination Algorithm for Learning a Best Arm
Avinatan Hassidim, Ron Kupfer, Yaron Singer
Spotlight presentation: Orals & Spotlights Track 11: Learning Theory
on 2020-12-08T07:30:00-08:00 - 2020-12-08T07:40:00-08:00
Poster Session 2
on 2020-12-08T09:00:00-08:00 - 2020-12-08T11:00:00-08:00
GatherTown: Bandit algorithms and reinforcement learning ( Town B3 - Spot B3 )
Abstract: We consider the classic problem of $(\epsilon,\delta)$-\texttt{PAC} learning a best arm, where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$-approximation to that of the highest-mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper we propose a new approach for $(\epsilon,\delta)$-\texttt{PAC} learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(\epsilon,\delta)$-learning the mean of $n$ arms separately, and we complement this result with a conditional matching lower bound. More specifically:

- The algorithm's sample complexity converges to \emph{exactly} $\frac{n}{2\epsilon^2}\log \frac{1}{\delta}$ as $n$ grows and $\delta \geq \frac{1}{n}$;
- We prove that no elimination algorithm obtains sample complexity arbitrarily lower than $\frac{n}{2\epsilon^2}\log \frac{1}{\delta}$. Elimination algorithms are a broad class of $(\epsilon,\delta)$-\texttt{PAC} best-arm learning algorithms that includes many algorithms in the literature.

When $n$ is independent of $\delta$, our approach yields an algorithm whose sample complexity converges to $\frac{2n}{\epsilon^2} \log \frac{1}{\delta}$ as $n$ grows. In comparison with the best known algorithm for this problem, our approach improves the sample complexity by a factor of over 1500, and by over 6000 when $\delta \geq \frac{1}{n}$.
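To see where the benchmark $\frac{n}{2\epsilon^2}\log\frac{1}{\delta}$ comes from, the following one-line Hoeffding calculation (our gloss, not text from the paper) recovers the constant for estimating each of the $n$ means separately: by Hoeffding's inequality, the empirical mean $\hat{\mu}$ of $m$ i.i.d. samples from a $[0,1]$-bounded arm satisfies $\Pr[\hat{\mu} \le \mu - \epsilon] \le e^{-2m\epsilon^2}$, so requiring $e^{-2m\epsilon^2} \le \delta$ gives $m \ge \frac{1}{2\epsilon^2}\log\frac{1}{\delta}$ samples per arm, hence $\frac{n}{2\epsilon^2}\log\frac{1}{\delta}$ in total over $n$ arms.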
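For concreteness, below is a minimal Python sketch of one well-known member of the elimination class that the lower bound refers to: median elimination (Even-Dar et al., 2002), which achieves sample complexity $O(\frac{n}{\epsilon^2}\log\frac{1}{\delta})$. This is emphatically not the paper's algorithm; the function name and the arm interface (zero-argument reward samplers) are assumptions made for illustration.

    import math
    import random

    def median_elimination(arms, epsilon, delta):
        """Return the index of an (epsilon, delta)-PAC best arm.

        `arms` is a list of zero-argument callables, each returning one
        [0, 1]-bounded reward sample from the corresponding arm.
        NOTE: a generic sketch of the elimination template, not the
        algorithm proposed in this paper.
        """
        survivors = list(range(len(arms)))
        eps_r, delta_r = epsilon / 4.0, delta / 2.0
        while len(survivors) > 1:
            # Sample every surviving arm enough times that each empirical
            # mean is within eps_r / 2 of its true mean w.p. 1 - delta_r.
            m = math.ceil((4.0 / eps_r ** 2) * math.log(3.0 / delta_r))
            means = {i: sum(arms[i]() for _ in range(m)) / m
                     for i in survivors}
            # Eliminate the empirically worse half of the arms.
            survivors.sort(key=lambda i: means[i], reverse=True)
            survivors = survivors[: (len(survivors) + 1) // 2]
            # Tighten the per-round accuracy/confidence budgets.
            eps_r *= 3.0 / 4.0
            delta_r /= 2.0
        return survivors[0]

    # Usage on hypothetical Bernoulli arms; arm 3 (mean 0.6) should win.
    probs = [0.3, 0.5, 0.45, 0.6, 0.2]
    arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in probs]
    print(median_elimination(arms, epsilon=0.1, delta=0.1))

The halving of surviving arms per round, with geometrically shrinking accuracy and confidence budgets, is the structural feature shared by the elimination class; the paper's contribution is an algorithm in this spirit whose leading constant converges to the optimal $\frac{1}{2\epsilon^2}\log\frac{1}{\delta}$ per arm.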