Poster in Workshop: Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization
Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
We introduce Fishy, a layerwise local approximation of the Fisher information matrix for natural gradient descent training of deep neural networks. Estimating the true Fisher for a deep network requires sampling labels from the model's predictive distribution at the output layer and performing an additional full backward pass; Fishy instead defines a Bregman exponential family distribution at each layer and performs the sampling locally. Local sampling removes the need for the extra backward pass and allows the preconditioner to be formed with model parallelism. We demonstrate our approach with the Shampoo optimizer, replacing its preconditioner gradients with our locally sampled gradients. Training results on a deep autoencoder and on VGG16 image classification indicate the efficacy of our construction.
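To make the construction concrete, below is a minimal NumPy sketch of the idea as described in the abstract: a layer's transfer function is taken to induce an exponential family (here a sigmoid layer inducing a Bernoulli family, an assumption of ours for illustration), a "label" is sampled locally from that per-layer distribution, and the resulting local gradient feeds Shampoo's Kronecker-factored statistics in place of the true backpropagated gradient. The function names, the Bernoulli choice, and the toy dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fishy_sampled_gradient(W, x, rng):
    # Pre-activations for the layer: shape (batch, out).
    a = x @ W.T
    # A sigmoid transfer function is the mean map of a Bernoulli
    # exponential family, so each unit defines a local predictive
    # distribution over {0, 1}.
    p = 1.0 / (1.0 + np.exp(-a))
    # Sample "labels" from the layer's own distribution -- no sampling at
    # the output layer and no extra backward pass through the network.
    t = rng.binomial(1, p)
    # Gradient of the local (matching) loss: (p - t) x^T, averaged over
    # the batch; shape (out, in).
    return (p - t).T @ x / x.shape[0]

def shampoo_statistics(L, R, g):
    # Accumulate Shampoo's left/right Kronecker factors from the locally
    # sampled gradient g instead of the true gradient.
    return L + g @ g.T, R + g.T @ g

# Toy usage: one layer with 4 inputs and 3 outputs, batch of 8.
W = rng.normal(size=(3, 4))
x = rng.normal(size=(8, 4))
g = fishy_sampled_gradient(W, x, rng)
L, R = shampoo_statistics(np.zeros((3, 3)), np.zeros((4, 4)), g)
```

Because each layer's sampled gradient depends only on that layer's inputs and outputs, the statistics for different layers can be formed independently and in parallel, which is the model-parallelism benefit the abstract points to.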