Poster

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification

Haolin Liu · Artin Tajdini · Andrew Wagenmaker · Chen-Yu Wei

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

How can we best learn in the presence of reward corruptions in interactive decision making? While a significant amount of work has been done to answer this question, we lack a holistic understanding of different types of adversaries and measures of corruption, as well as a full characterization of the minimax regret bounds. In this work, we seek to develop a tight characterization of this problem in the linear bandit setting. We provide matching upper and lower bounds in the corrupted stochastic setting, and initiate the study of the corrupted adversarial setting, for which we obtain optimal scaling in the corruption level. We then argue that there is a deep connection between corruption-robust learning and learning with gap-dependent misspecification, a setting first studied by Liu et al. (2023) in which the misspecification level of an action or policy is assumed to be proportional to its suboptimality. We show a general reduction that allows any corruption-robust algorithm to obtain regret bounds in settings with gap-dependent misspecification. This allows us to recover the results of Liu et al. (2023) in a black-box manner and generalizes them significantly to settings such as linear MDPs, yielding the first results of their kind for gap-dependent misspecification in RL. Using our tight characterization of the corruption-robust setting, however, we show that, despite the generality of this reduction, it cannot obtain the optimal rate for gap-dependent misspecification. Motivated by this, we develop optimal bounds for gap-dependent misspecification in linear bandits, improving on the results of Liu et al. (2023).
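For readers unfamiliar with the two settings named in the abstract, here is a minimal sketch in assumed standard linear-bandit notation; the symbols $\theta^*$, $\phi$, $\eta_t$, $c_t$, $C$, $\rho$, and $\Delta(a)$ are illustrative choices, not taken from the paper. In the corrupted setting, the reward observed at round $t$ is

$$r_t = \langle \theta^*, \phi(a_t) \rangle + \eta_t + c_t, \qquad \sum_{t=1}^{T} |c_t| \le C,$$

where $\eta_t$ is zero-mean noise, $c_t$ is an adversarial perturbation, and $C$ is the total corruption level. Under gap-dependent misspecification, each action's deviation from the linear model is instead bounded by a fraction of its suboptimality gap:

$$\bigl| r(a) - \langle \theta^*, \phi(a) \rangle \bigr| \le \rho \, \Delta(a), \qquad \Delta(a) = \max_{a'} r(a') - r(a),$$

so near-optimal actions are nearly well-specified while badly suboptimal actions may be heavily misspecified.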
