

Poster

PCA of high dimensional random walks with comparison to neural network training

Joseph Antognini · Jascha Sohl-Dickstein

Room 210 #27

Keywords: [ Optimization ] [ Components Analysis (e.g., CCA, ICA, LDA, PCA) ] [ Statistical Physics of Learning ] [ Visualization or Exposition Techniques for Deep Networks ]


Abstract:

One technique to visualize the training of neural networks is to perform PCA on the parameters over the course of training and to project onto the subspace spanned by the first few PCA components. In this paper we compare this technique to the PCA of a high-dimensional random walk. We compute the eigenvalues and eigenvectors of the covariance of the trajectory and prove that in the long-trajectory, high-dimensional limit most of the variance lies in the first few PCA components, and that the projection of the trajectory onto any subspace spanned by PCA components is a Lissajous curve. We generalize these results to a random walk with momentum and to an Ornstein-Uhlenbeck process (i.e., a random walk in a quadratic potential) and show that in high dimensions the walk is not mean-reverting, but will instead be trapped at a fixed distance from the minimum. Finally, we analyze PCA-projected training trajectories for: a linear model trained on CIFAR-10; a fully connected model trained on MNIST; and ResNet-50-v2 trained on ImageNet. In all cases, both the distribution of PCA eigenvalues and the projected trajectories resemble those of a random walk with drift.
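The central claim about random walks can be checked numerically. The sketch below (not the authors' code; function name and parameters are illustrative) simulates a high-dimensional Gaussian random walk, performs PCA on the trajectory via an SVD, and reports how much variance the leading components capture. In the long-trajectory, high-dimensional limit the leading PCA eigenvalues fall off roughly as $1/(k - 1/2)^2$, so the first few components should dominate:

```python
import numpy as np

def random_walk_pca(n_steps=2000, dim=1000, seed=0):
    """Simulate a high-dimensional random walk and PCA its trajectory.

    Returns the fraction of variance explained by each PCA component
    and the projection of the trajectory onto the first two components.
    """
    rng = np.random.default_rng(seed)
    # Trajectory = cumulative sum of i.i.d. Gaussian steps, shape (n_steps, dim).
    traj = np.cumsum(rng.standard_normal((n_steps, dim)), axis=0)
    centered = traj - traj.mean(axis=0)
    # PCA via the SVD of the centered trajectory matrix.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Projection onto the first two PCA components; per the paper this
    # should trace out a Lissajous-like curve for a pure random walk.
    proj = centered @ vt[:2].T
    return explained, proj

explained, proj = random_walk_pca()
print(f"Variance in first 5 components: {explained[:5].sum():.2f}")
```

Plotting `proj[:, 0]` against `proj[:, 1]` gives a quick visual check of the Lissajous-curve prediction; comparing the same plot for an actual training trajectory is the paper's empirical exercise.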
