NeurIPS Poster Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandits

Poster

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandits

Tian Huang · Shengbo Wang · Ke Li

East Exhibit Hall A-C #4805

[ Abstract ]

[ Paper] [ OpenReview]

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: The ultimate goal of multi-objective optimization (MO) is to assist human decision-makers (DMs) in identifying solutions of interest (SOI) that optimally reconcile multiple objectives according to their preferences. Preference-based evolutionary MO (PBEMO) has emerged as a promising framework that progressively approximates SOI by involving human in the optimization-cum-decision-making process. Yet, current PBEMO approaches are prone to be inefficient and misaligned with the DM’s true aspirations, especially when inadvertently exploiting mis-calibrated reward models. This is further exacerbated when considering the stochastic nature of human feedback. This paper proposes a novel framework that navigates MO to SOI by directly leveraging human feedback without being restricted by a predefined reward model nor cumbersome model selection. Specifically, we developed a clustering-based stochastic dueling bandits algorithm that strategically scales well to high-dimensional dueling bandits, and achieves a regret of $\mathcal{O}(K^2\log T)$, where $K$ is the number of clusters and $T$ is the number of rounds. The learned preferences are then transformed into a unified probabilistic format that can be readily adapted to prevalent EMO algorithms. This also leads to a principled termination criterion that strategically manages human cognitive loads and computational budget. Experiments on $48$ benchmark test problems, including synthetic problems, RNA inverse design and protein structure prediction, fully demonstrate the effectiveness of our proposed approach.

Chat is not available.