

Poster in Workshop: Bayesian Deep Learning

Can Network Flatness Explain the Training Speed-Generalisation Connection?

Albert Qiaochu Jiang · Clare Lyle · Lisa Schut · Yarin Gal


Abstract:

Recent work has shown that training speed, as estimated by the sum of the training losses over the course of optimisation, is predictive of generalisation performance. From a Bayesian perspective, this metric can be theoretically linked to the marginal likelihood in linear models. However, it is unclear why the relationship holds for deep neural networks (DNNs) and what the underlying mechanisms are. We hypothesise that the relationship holds in DNNs because of network flatness, which gives rise to both fast training and good generalisation. We investigated this hypothesis in varying settings and found that it may hold when the variance of the stochastic gradient estimates is moderate, with either logit averaging or no data transformation at all. This paper specifies the conditions future work should impose when investigating the connecting mechanism.
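The abstract's training-speed metric can be read as the running sum of training losses accumulated during optimisation, with a smaller sum indicating faster training. Below is a minimal sketch of that reading, assuming a generic PyTorch SGD loop on a synthetic regression task; the model, data, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: training speed estimated as the sum of training losses over optimisation
# steps (lower sum = faster training). All specifics below are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (assumption: stands in for the paper's datasets).
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

training_speed_estimate = 0.0  # accumulates the training loss at every step
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    training_speed_estimate += loss.item()

print(f"Sum of training losses (training-speed estimate): {training_speed_estimate:.3f}")
```

Under this reading, the estimate would be compared against held-out performance across models or training runs to test whether faster-training networks also generalise better.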
