

Spotlight in Workshop: Multi-Agent Security: Security as Key to AI Safety

Stackelberg Games with Side Information

Keegan Harris · Steven Wu · Maria-Florina Balcan

Keywords: [ side information ] [ Stackelberg security game ] [ contextual bandit ] [ Stackelberg game ]


Abstract: We study an online learning setting in which a leader interacts with a sequence of followers over the course of $T$ rounds. At each round, the leader commits to a mixed strategy over actions, after which the follower best-responds. Such settings are referred to in the literature as Stackelberg games. Stackelberg games have received much interest from the community, in part due to their applicability to real-world security settings such as wildlife preservation and airport security. However, despite this recent interest, current models of Stackelberg games fail to take into consideration the fact that the players' optimal strategies often depend on external factors such as weather patterns, airport traffic, etc. We address this gap by allowing player payoffs to depend on an external context, in addition to the actions taken by each player. We formalize this setting as a repeated Stackelberg game with side information and show that in this setting, it is impossible to achieve sublinear regret if both the sequence of contexts and the sequence of followers are chosen adversarially. Motivated by this impossibility result, we consider two natural relaxations: (1) stochastically chosen contexts with adversarially chosen followers and (2) stochastically chosen followers with adversarially chosen contexts. In each of these settings, we provide algorithms that obtain $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
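The round structure described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the paper's algorithm: all payoff matrices, the linear context-dependence, and the dimensions are hypothetical choices made here to show how a context observed before commitment changes both players' utilities, and how the follower best-responds to the leader's committed mixed strategy.

```python
import numpy as np

# Hypothetical setup: payoffs are a base matrix plus a context-weighted
# term. None of this parameterization comes from the paper; it only
# illustrates the repeated-Stackelberg-with-side-information protocol.
rng = np.random.default_rng(0)
n_leader, n_follower, d = 3, 4, 2  # action counts, context dimension

U_leader_base = rng.standard_normal((n_leader, n_follower))
U_follower_base = rng.standard_normal((n_leader, n_follower))
U_leader_ctx = rng.standard_normal((d, n_leader, n_follower))
U_follower_ctx = rng.standard_normal((d, n_leader, n_follower))

def play_round(context, leader_strategy):
    """One round: leader commits to a mixed strategy given the context,
    then the follower best-responds to that commitment."""
    u_l = U_leader_base + np.tensordot(context, U_leader_ctx, axes=1)
    u_f = U_follower_base + np.tensordot(context, U_follower_ctx, axes=1)
    # Follower sees the committed mixed strategy and picks the action
    # maximizing their expected utility under this context.
    follower_expected = leader_strategy @ u_f  # shape: (n_follower,)
    b = int(np.argmax(follower_expected))
    leader_payoff = float(leader_strategy @ u_l[:, b])
    return b, leader_payoff

context = rng.standard_normal(d)
x = np.ones(n_leader) / n_leader  # placeholder commitment (uniform)
best_response, payoff = play_round(context, x)
```

In the online setting of the paper, the leader would update its commitment `x` from round to round based on the observed contexts and best responses; the uniform strategy above is only a placeholder for one round of the interaction.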
