Abstract:
In this paper, we investigate the limiting behavior of a
continuous-time counterpart of the Stochastic Gradient Descent (SGD)
algorithm applied to two-layer overparameterized neural networks, as
the number of neurons (i.e., the size of the hidden layer)
$N \to +\infty$. Following a probabilistic approach, we show
`propagation of chaos' for the particle system defined by this
continuous-time dynamics under different scenarios, indicating that
the statistical interaction between the particles asymptotically
vanishes. In particular, we establish quantitative convergence with
respect to $N$ of any particle to a solution of a mean-field
McKean-Vlasov equation in the metric space endowed with the
Wasserstein distance. In comparison to previous works on the
subject, we consider settings in which the sequence of stepsizes in
SGD may depend on both the number of neurons and the iteration
index. We then identify two regimes under which different
mean-field limits are obtained, one of them corresponding to an
implicitly regularized version of the minimization problem at
hand. We perform various experiments on real datasets to validate
our theoretical results, assessing the existence of these two
regimes on classification problems and illustrating our convergence
results.
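For concreteness, a standard mean-field parameterization of a two-layer network of width $N$ reads as follows (an illustrative sketch of the usual setting only; the paper's precise scaling, activation $\varphi$, and assumptions may differ):
\[
f_N(x;\theta^1,\dots,\theta^N) \;=\; \frac{1}{N}\sum_{i=1}^{N} \varphi\big(x;\theta^i\big),
\qquad
\mu^N_t \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta^i_t},
\]
where $\theta^i_t$ denotes the parameters of neuron $i$ at time $t$ of the continuous-time dynamics. In this notation, propagation of chaos means that, as $N \to +\infty$, the empirical measure $\mu^N_t$ of the neuron parameters converges (here, quantitatively in the Wasserstein distance) to the law $\mu_t$ of a solution of the limiting mean-field McKean-Vlasov equation.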