Skip to yearly menu bar Skip to main content


Poster

On Weak Regret Analysis for Dueling Bandits

El Mehdi Saad · Alexandra Carpentier · Tomáš Kocák · Nicolas Verzelen

West Ballroom A-D #7204
[ ]
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: We consider the problem of $K$-armed dueling bandits in the stochastic setting, under the sole assumption of the existence of a Condorcet winner. We study the objective of weak regret minimization, where the learner doesn't incur any loss if one of the selected arms is a Condorcet winner—unlike strong regret minimization, where the learner has to select the Condorcet winner twice to incur no loss. This study is particularly motivated by practical scenarios such as content recommendation and online advertising, where frequently only one optimal choice out of the two presented options is necessary to achieve user satisfaction or engagement. This necessitates the development of strategies with more exploration. While existing literature introduces strategies for weak regret with constant bounds (that do not depend on the time horizon), the optimality of these strategies remains an unresolved question. This problem turns out to be really challenging as the optimal regret should heavily depend on the full structure of the dueling problem at hand, and in particular on whether the Condorcet winner has a large minimal optimality gap with the other arms. Our contribution is threefold: first, when said optimality gap is not negligible compared to other properties of the gap matrix, we characterize the optimal budget as a function of $K$ and the optimality gap. Second, we propose a new strategy called \wrtinf that achieves this optimal regret and improves over the state-of-the-art both in $K$ and the optimality gap. When the optimality gap is negligible, we propose another algorithm that outperforms our first algorithm, highlighting the subtlety of this dueling bandit problem. Finally, we provide numerical simulations to assess our theoretical findings.

Chat is not available.