Invited talk
in
Competition: Auto-Bidding in Large-Scale Auctions: Learning Decision-Making in Uncertain and Competitive Games
Combinatorial Bandits with Full Bandit Feedback
Vaneet Aggarwal
In this presentation, we will delve into our recent breakthroughs in the realms of combinatorial optimization with bandit feedback. For sequential combinatorial decision-making, we introduce a versatile framework designed to transform discrete offline approximation algorithms into sublinear α-regret methods that exclusively rely on bandit feedback. Remarkably, this framework demands no more than resilience to errors in function evaluation from the offline algorithms. Even more intriguingly, the adaptation process does not necessitate explicit knowledge of the inner workings of the offline approximation algorithm; it can be seamlessly utilized as a black-box subroutine. We have applied the framework to submodular optimization and some problems beyond submodular optimization. We will extend the approach to multiple agents, and generalize the algorithm where the offline algorithm do not have α approximation but α-ε approximation. We also address the issue of feedback delays, and fairness among arm selection in this context.