NeurIPS Poster Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

Poster

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

Ahmed Ben Yahmed · Clément Calauzènes · Vianney Perchet

West Ballroom A-D #5606

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: We examine multi-armed bandit problems featuring strategic arms under debt-free reporting. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by retaining a portion of the observed reward, potentially disclosing only a fraction of it to the player. This scenario unfolds as a game over $T$ rounds, leading to a competition of objectives between the player, aiming to minimize regret, and the arms, motivated by the desire to maximize their individual utilities. To address these dynamics, we propose an algorithm that establishes an equilibrium wherein each arm behaves truthfully and discloses as much of its rewards as possible. Utilizing this algorithm, the player can attain the second-highest average (true) reward among arms, with a cumulative regret bounded by $O(\log(T)/\Delta)$ (problem-dependent) or $O(\sqrt{T\log(T)})$ (worst-case).

Chat is not available.