Poster in Workshop: Bayesian Deep Learning
On Symmetries in Variational Bayesian Neural Nets
Richard Kurle · Tim Januschowski · Jan Gasthaus · Bernie Wang
Probabilistic inference of neural network parameters is challenging due to highly multi-modal likelihood functions. Most importantly, the permutation invariance of hidden-layer neurons renders the likelihood function unidentifiable, with a factorial number of equivalent (symmetric) modes, independent of the data. We show that variational Bayesian methods that approximate the (multi-modal) posterior by a (uni-modal) Gaussian distribution are biased towards approximations with identical (e.g. zero-centred) weights, resulting in severe underfitting. This explains the common empirical observation that, in contrast to MCMC methods, variational approximations typically collapse most weights to the (zero-centred) prior. We propose a simple modification to the likelihood function that breaks the symmetry by using fixed semi-orthogonal matrices as skip connections in each layer. Initial empirical results show improved predictive performance.
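The abstract only sketches the symmetry-breaking construction, so the following is a minimal, hypothetical PyTorch sketch of what a fixed semi-orthogonal skip connection in a hidden layer might look like. The class name, the placement of the skip term relative to the non-linearity, and the use of an orthogonal initializer are assumptions for illustration, not the authors' exact method.

```python
import torch
import torch.nn as nn


class SymmetryBreakingLayer(nn.Module):
    """Hidden layer with a fixed (non-trainable) semi-orthogonal skip connection.

    Illustrative sketch: the learned weights are complemented by a frozen
    semi-orthogonal matrix, so permuting hidden units no longer leaves the
    layer's output invariant. The exact construction in the paper may differ.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Fixed semi-orthogonal matrix used as a skip connection; registered
        # as a buffer so it is never updated during training or inference.
        skip = torch.empty(out_features, in_features)
        nn.init.orthogonal_(skip)
        self.register_buffer("skip", skip)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Learned transformation plus the fixed skip term; the skip term
        # breaks the permutation symmetry of the hidden units.
        return torch.relu(self.linear(x)) + x @ self.skip.t()


if __name__ == "__main__":
    layer = SymmetryBreakingLayer(16, 32)
    out = layer(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 32])
```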