


Oral Session 1D: Learning Theory

East Meeting Room 1-3
Wed 11 Dec 10 a.m. PST — 11 a.m. PST

Wed 11 Dec. 10:00 - 10:20 PST

Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure

Jin Zhang · Ze Liu · Defu Lian · Enhong Chen

Two-stage recommender systems play a crucial role in efficiently identifying relevant items and personalizing recommendations from a vast array of options. This paper, based on an error decomposition framework, analyzes the generalization error for two-stage recommender systems with a tree structure, which consist of an efficient tree-based retriever and a more precise yet time-consuming ranker. We use the Rademacher complexity to establish the generalization upper bound for various tree-based retrievers using beam search, as well as for different ranker models under a shifted training distribution. Both theoretical insights and practical experiments on real-world datasets indicate that increasing the number of branches in tree-based retrievers and harmonizing distributions across stages can enhance the generalization performance of two-stage recommender systems.
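
As a rough illustration of the two-stage architecture the abstract describes, the sketch below pairs a tree-based retriever, which beam-searches from the root down to a set of leaf items, with a separate scoring function standing in for the slower ranker. The names (TreeRetriever, ranker_score), the complete-tree layout, and the inner-product node scores are assumptions made for illustration, not the paper's model.

```python
import numpy as np

# Hypothetical sketch of a two-stage pipeline: a tree-based retriever that
# scores nodes level by level and keeps the top-`beam_width` candidates,
# followed by a slower ranker applied only to the retrieved leaves.
class TreeRetriever:
    def __init__(self, depth, branching, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        self.branching = branching
        # One scoring vector per node; node ids are assigned level by level.
        self.node_embeds = {}
        node_id = 0
        for level in range(depth + 1):
            for _ in range(branching ** level):
                self.node_embeds[node_id] = rng.normal(size=dim)
                node_id += 1

    def children(self, node, level):
        # Children of `node` at the next level (implicit complete-tree layout).
        cur_start = sum(self.branching ** l for l in range(level))
        next_start = sum(self.branching ** l for l in range(level + 1))
        offset = (node - cur_start) * self.branching
        return [next_start + offset + i for i in range(self.branching)]

    def retrieve(self, user_vec, beam_width):
        beam = [0]  # start at the root
        for level in range(self.depth):
            candidates = [c for n in beam for c in self.children(n, level)]
            scores = [user_vec @ self.node_embeds[c] for c in candidates]
            top = np.argsort(scores)[::-1][:beam_width]
            beam = [candidates[i] for i in top]
        return beam  # leaf ids = retrieved items


def ranker_score(user_vec, item_vec):
    # Stand-in for the "more precise yet time-consuming" second-stage model.
    return float(user_vec @ item_vec)
```

Increasing `branching` (the number of branches per node) lets the retriever cover more candidates per beam step, which is the knob the abstract's generalization analysis varies.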

Wed 11 Dec. 10:20 - 10:40 PST

Optimal Parallelization of Boosting

Arthur da Cunha · Mikael Møller Høgsgaard · Kasper Green Larsen

Recent works on the parallel complexity of Boosting have established strong lower bounds on the tradeoff between the number of training rounds $p$ and the total parallel work per round $t$. These works have also presented highly non-trivial parallel algorithms that shed light on different regions of this tradeoff. Despite these advancements, a significant gap persists between the theoretical lower bounds and the performance of these algorithms across much of the tradeoff space. In this work, we essentially close this gap by providing both improved lower bounds on the parallel complexity of weak-to-strong learners, and a parallel Boosting algorithm whose performance matches these bounds across the entire $p$ vs. $t$ compromise spectrum, up to logarithmic factors. Ultimately, this work settles the parallel complexity of Boosting algorithms that are nearly sample-optimal.
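
To make the $p$ vs. $t$ resource model concrete, here is a minimal sketch, under my own assumptions, of a boosting loop that runs $p$ sequential rounds and spends parallel work $t$ per round by fitting $t$ independent candidate weak learners (random decision stumps) and keeping the best one. This only illustrates the tradeoff being studied; it is not the algorithm from the paper.

```python
import numpy as np

# Labels y are in {-1, +1}. Each call to random_stump is one unit of the
# per-round parallel work t; the p rounds themselves are inherently sequential.
def random_stump(X, y, w, rng):
    # A random-threshold stump on a random feature, with its weighted error.
    j = int(rng.integers(X.shape[1]))
    thr = rng.choice(X[:, j])
    best = None
    for sign in (1.0, -1.0):
        pred = sign * np.where(X[:, j] > thr, 1.0, -1.0)
        err = float(w[pred != y].sum())
        if best is None or err < best[0]:
            best = (err, j, thr, sign)
    return best

def parallel_boost(X, y, p, t, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(p):  # p sequential boosting rounds
        # The t candidates below are independent and could be fit in parallel.
        candidates = [random_stump(X, y, w, rng) for _ in range(t)]
        err, j, thr, sign = min(candidates, key=lambda c: c[0])
        err = float(np.clip(err, 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 - err) / err)  # AdaBoost-style step size
        pred = sign * np.where(X[:, j] > thr, 1.0, -1.0)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] > thr, 1.0, -1.0)
                for a, j, thr, s in ensemble)
    return np.sign(score)
```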

Wed 11 Dec. 10:40 - 11:00 PST

Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures

Xin Chen · Anderson Ye Zhang

We study clustering under anisotropic Gaussian Mixture Models (GMMs), where covariance matrices from different clusters are unknown and are not necessarily identity matrices. We analyze two anisotropic scenarios: homogeneous, with identical covariance matrices, and heterogeneous, with distinct matrices per cluster. For these models, we derive minimax lower bounds that illustrate the critical influence of covariance structures on clustering accuracy. To solve the clustering problem, we consider a variant of Lloyd's algorithm, adapted to estimate and utilize covariance information iteratively. We prove that the adjusted algorithm not only achieves minimax optimality but also converges within a logarithmic number of iterations, thus bridging the gap between theoretical guarantees and practical efficiency.
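
A minimal sketch, in the spirit of the adjusted Lloyd-style algorithm the abstract describes: hard assignments use a Mahalanobis-type cost with per-cluster covariance estimates (plus a log-determinant term for the heterogeneous case), followed by refitting means and covariances. The initialization, regularization, and function names are my assumptions, not the paper's exact procedure.

```python
import numpy as np

# Lloyd-style iteration for anisotropic GMMs with heterogeneous covariances.
def anisotropic_lloyd(X, k, n_iter=20, reg=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initial labels (in practice, a spectral or k-means++ style init).
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        means, precs, logdets = [], [], []
        for c in range(k):
            pts = X[labels == c]
            if len(pts) == 0:  # guard against empty clusters
                pts = X[rng.integers(n, size=2)]
            mu = pts.mean(axis=0)
            cov = np.cov(pts.T) + reg * np.eye(d) if len(pts) > 1 else np.eye(d)
            means.append(mu)
            precs.append(np.linalg.inv(cov))
            logdets.append(np.linalg.slogdet(cov)[1])
        # Assignment step: squared Mahalanobis distance plus log-determinant
        # penalty (heterogeneous case; drop logdet for a shared covariance).
        costs = np.empty((n, k))
        for c in range(k):
            diff = X - means[c]
            costs[:, c] = np.einsum('ij,jk,ik->i', diff, precs[c], diff) + logdets[c]
        labels = costs.argmin(axis=1)
    return labels, means
```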