Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
Atli Kosson · Dongyang Fan · Martin Jaggi
Normalization layers have been shown to benefit the training stability and generalization of deep neural networks in various ways. For Batch Normalization (BN), the noisy statistics have been observed to have a regularization effect that depends on the batch size. Following this observation, Hoffer et al. proposed Ghost Batch Normalization (GBN), where BN is explicitly performed independently on smaller sub-batches, resulting in improved generalization in many settings. In this study we analyze and isolate the effect of the noisy statistics by comparing BN and GBN and by introducing a noise injection method. We then quantitatively assess the effects of the noise, juxtaposing it with other regularizers such as dropout and examining its potential role in the generalization disparities between batch normalization and its alternatives, including layer normalization and normalization-free methods.
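To make the GBN idea concrete, below is a minimal sketch of Ghost Batch Normalization as described in the abstract: during training, the batch is split into smaller "ghost" sub-batches and BN statistics are computed independently on each of them, making the statistics noisier. This is not the authors' implementation; the class and parameter names (GhostBatchNorm, ghost_batch_size) and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn as nn


class GhostBatchNorm(nn.Module):
    """Sketch of Ghost Batch Normalization: BN applied per sub-batch."""

    def __init__(self, num_features, ghost_batch_size):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        # Reuse a standard BatchNorm2d for the affine parameters
        # and the running statistics used at inference time.
        self.bn = nn.BatchNorm2d(num_features)

    def forward(self, x):
        if not self.training:
            # At inference time, behave like ordinary BN (running statistics).
            return self.bn(x)
        # Split the full batch into ghost sub-batches and normalize each
        # independently, so every chunk uses its own (noisier) mean/variance.
        chunks = x.split(self.ghost_batch_size, dim=0)
        return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)


# Example: a batch of 64 feature maps normalized with ghost batches of size 16.
gbn = GhostBatchNorm(num_features=8, ghost_batch_size=16)
out = gbn(torch.randn(64, 8, 32, 32))
```

Under this sketch, shrinking ghost_batch_size increases the variance of the per-sub-batch statistics, which is exactly the batch-size-dependent noise whose regularization effect the study isolates.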