

Poster

Gaussian Process Bandits for Top-k Recommendations

Mohit Yadav · Cameron Musco · Daniel Sheldon

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Algorithms that utilize bandit feedback to optimize top-k recommendations are vital for online marketplaces, search engines, and content platforms. However, the combinatorial nature of this problem poses a significant challenge, as the number of possible ordered top-k recommendations from n items grows exponentially with k. As a result, previous work often relies on restrictive assumptions about the reward or bandit feedback models, such as assuming that the feedback discloses rewards for all recommended items rather than offering a single scalar feedback for the entire set of top-k recommendations. We introduce a novel contextual bandit algorithm for top-k recommendations, leveraging a Gaussian process with a Kendall kernel to model the reward function. Our algorithm requires only scalar feedback on the top-k recommendations and does not impose restrictive assumptions on the reward structure. Theoretical analysis confirms that the proposed algorithm achieves sub-linear regret with respect to the number of rounds and arms. Empirical results using a bandit simulator further show that the proposed algorithm surpasses other baselines across several scenarios.
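To make the two ingredients concrete, here is a minimal sketch (not the paper's exact algorithm) of a Kendall kernel between rankings, used inside a GP-UCB-style rule that picks the next top-k list from a candidate set given scalar rewards. The function names, the `beta` exploration weight, and the `noise` jitter are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def kendall_kernel(pi, sigma):
    """Kendall tau correlation between two rankings of the same items:
    (concordant pairs - discordant pairs) / total pairs, in [-1, 1].
    This is a positive-definite kernel on permutations."""
    pos_pi = {item: i for i, item in enumerate(pi)}
    pos_sig = {item: i for i, item in enumerate(sigma)}
    n = len(pi)
    s = 0
    for a, b in combinations(pi, 2):
        # Same relative order in both rankings -> concordant (+1), else -1.
        s += 1 if (pos_pi[a] - pos_pi[b]) * (pos_sig[a] - pos_sig[b]) > 0 else -1
    return s / (n * (n - 1) / 2)

def gp_ucb_select(candidates, observed, rewards, beta=2.0, noise=0.1):
    """Choose the candidate ranking maximizing a GP upper confidence bound,
    fit to previously observed rankings and their scalar rewards.
    A sketch under stated assumptions, not the authors' implementation."""
    K = np.array([[kendall_kernel(a, b) for b in observed] for a in observed])
    K_inv = np.linalg.inv(K + noise * np.eye(len(observed)))
    y = np.array(rewards)
    best, best_score = None, -np.inf
    for c in candidates:
        k = np.array([kendall_kernel(c, o) for o in observed])
        mu = k @ K_inv @ y                      # GP posterior mean
        var = kendall_kernel(c, c) - k @ K_inv @ k  # posterior variance
        score = mu + beta * np.sqrt(max(var, 0.0))  # UCB acquisition
        if score > best_score:
            best, best_score = c, score
    return best
```

The key point the abstract makes is visible here: the bandit consumes only one scalar reward per recommended list, and the Kendall kernel supplies similarity structure over the exponentially large space of ordered top-k lists, so no per-item feedback is needed.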
