Abstract:
Most machine learning tasks are inherently multi-objective. This
means that the learner has to come up with a model that performs
well across a number of base objectives $\cL_{1}, \ldots, \cL_{p}$,
as opposed to a single one. Since optimizing with respect to
multiple objectives at the same time is often computationally
expensive, the base objectives are often combined in an ensemble
$\sum_{k=1}^{p}\lambda_{k}\cL_{k}$, thereby reducing the problem to
scalar optimization. The mixture weights $\lambda_{k}$ are set to
uniform or some other fixed distribution, based on the learner's
preferences. We argue that learning with a fixed distribution on the
mixture weights runs the risk of overfitting to some individual
objectives and significantly harming others, despite
performing well on an entire ensemble. Moreover, in reality, the true
preferences of a learner across multiple objectives are often
unknown or hard to express as a specific distribution. Instead, we
propose a new framework of \emph{Agnostic Learning with Multiple
Objectives} ($\almo$), where a model is optimized for \emph{any}
weights in the mixture of base objectives. We present data-dependent
Rademacher complexity guarantees for learning in the $\almo$
framework, which are used to guide a scalable optimization
algorithm and the corresponding regularization. We present
convergence guarantees for this algorithm, assuming convexity of the
loss functions and the underlying hypothesis space. We further
implement the algorithm in a popular symbolic gradient computation
framework and empirically demonstrate on a number of datasets the
benefits of $\almo$ framework versus learning with a fixed mixture
weights distribution.
Chat is not available.