Poster
Learning from Noisy Labels via Conditional Distributionally Robust Optimization
Hui GUO · Grace Yi · Boyu Wang
While crowdsourcing has emerged as a practical solution for labeling large datasets, it poses a significant challenge for learning accurate models because annotators of diverse expertise contribute noisy labels. Existing approaches typically estimate the true label posterior, conditional on the instance and the noisy annotations, and use it to infer true labels or to adjust loss functions. These estimates, however, ignore potential misspecification of the true label posterior, which can degrade model performance, particularly in scenarios with high noise ratios. To address this issue, we study learning from noisy annotations with an estimated true label posterior through the lens of conditional distributionally robust optimization (CDRO). In particular, we formulate the problem as minimizing the worst-case risk within a distance-based ambiguity set centered at a reference distribution. By examining the strong duality of this formulation, we derive upper bounds on the worst-case risk. We also derive the analytical solution of the dual robust risk for each data point, which motivates a novel robust pseudo-label collection algorithm based on the likelihood ratio test. This algorithm constructs a pseudo-empirical distribution that serves as a more robust reference distribution in CDRO. Moreover, to devise an efficient algorithm for CDRO, we derive a closed-form expression for the empirical robust risk and the optimal Lagrange multiplier of the dual problem, facilitating a principled balance between robustness and model fitting. Experimental results on both synthetic and real-world datasets demonstrate the superiority of our method.
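For concreteness, the per-example worst-case objective can be sketched as follows. The abstract refers to a general distance-based ambiguity set around the estimated true label posterior; the formula below assumes a KL-divergence ball of radius \rho as a standard special case, so the exact dual derived in the paper may differ. Writing \hat{p}(y \mid x, A) for the reference posterior given an instance x with noisy annotations A, and \ell(f_\theta(x), y) for the loss,

\[
\mathcal{R}_{\rho}(\theta; x, A)
\;=\; \sup_{q:\, \mathrm{KL}(q \,\|\, \hat{p}(\cdot \mid x, A)) \le \rho} \; \sum_{y} q(y)\, \ell(f_\theta(x), y)
\;=\; \inf_{\lambda > 0} \Big\{ \lambda \rho + \lambda \log \sum_{y} \hat{p}(y \mid x, A)\, \exp\!\big(\ell(f_\theta(x), y)/\lambda\big) \Big\},
\]

where the second equality is the standard DRO duality for KL balls. The inner maximizer reweights labels in proportion to \hat{p}(y \mid x, A)\exp(\ell(f_\theta(x), y)/\lambda), and the Lagrange multiplier \lambda governs how far the adversarial posterior may deviate from the reference, i.e., the robustness-versus-fit trade-off discussed above.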