Poster
in
Workshop: Mathematics of Modern Machine Learning (M3L)
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu
Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is classification of the sparse parity function on the unit hypercube. However, such an isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited by realistic datasets. In this work, we address this shortcoming by studying how feature learning interacts with structured (anisotropic) input data: we consider the classification of sparse parity on a high-dimensional orthotope where the feature coordinates have varying magnitudes. Specifically, we analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on a two-layer neural network, and show that the statistical complexity (i.e., sample size) and computational complexity (i.e., network width) of MFLD can both be improved when the prominent directions of the anisotropic input data align with the support of the target function. Moreover, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime.
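The anisotropic setup described above can be illustrated with a minimal sketch: inputs are sign vectors whose coordinates have varying magnitudes, and the label is the parity of the signs on a small support. All concrete choices below (dimension, support size, and scale values, including the alignment of the large-magnitude coordinates with the support) are assumptions for illustration, not the paper's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10                  # ambient dimension (assumed for illustration)
k = 3                   # size of the parity support (assumed)
support = np.arange(k)  # target depends only on the first k coordinates

# Anisotropic coordinate magnitudes: the "prominent" directions are the
# large-scale ones; here they are chosen to align with the target support.
scales = np.ones(d)
scales[support] = 3.0

# Each input lies on an orthotope: x_i in {-a_i, +a_i} per coordinate.
signs = rng.choice([-1.0, 1.0], size=(100, d))
X = signs * scales

# Sparse parity label: product of the signs over the support coordinates.
y = np.prod(np.sign(X[:, support]), axis=1)
```

Under this alignment, the informative coordinates have the largest magnitude, which is the favorable regime the abstract describes; shrinking `scales[support]` below the off-support scales would model the unfavorable, misaligned case.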