

Poster in Workshop: Mathematics of Modern Machine Learning (M3L)

Parameter Symmetry and Emergence of Noise Equilibrium in Stochastic Training

Liu Ziyin · Mingze Wang · Hongchao Li · Lei Wu

Keywords: [ fixed point ] [ SGD ] [ symmetry ]


Abstract: Symmetries are abundant in the loss functions of neural networks, and understanding their impact on optimization algorithms is crucial for deep learning. We investigate the learning dynamics of Stochastic Gradient Descent (SGD) through the lens of exponential symmetries, a broad subclass of continuous symmetries in loss functions. Our analysis reveals that when gradient noise is imbalanced, SGD inherently drives model parameters toward a noise-balanced state, leading to the emergence of unique and attractive fixed points along degenerate directions. We prove that every parameter $\theta$ connects without barriers to a unique noise-balanced fixed point $\theta^*$. This finding offers a unified perspective on how symmetry and gradient noise influence SGD. The theory provides novel insights into deep learning phenomena such as progressive sharpening/flattening and warmup, demonstrating that noise balancing is a key mechanism underlying these effects.
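The noise-balancing mechanism can be illustrated on a minimal toy problem. The sketch below is an illustrative assumption, not the paper's setting: a two-parameter factorization $f(u,v)=uv$ fit to noisy targets, whose loss is invariant under the exponential rescaling symmetry $(u,v)\mapsto(e^t u, e^{-t} v)$. Under noise-free gradient descent the imbalance $u^2 - v^2$ is conserved along the degenerate direction, but gradient noise contracts it toward the noise-balanced fixed point $u^2 - v^2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (assumption for illustration): fit f(u, v) = u * v to
# noisy targets y ~ 1 + noise with the per-sample loss 0.5 * (u*v - y)^2.
# This loss has the exponential (rescaling) symmetry (u, v) -> (e^t u, e^{-t} v),
# so the direction changing u^2 - v^2 is a degenerate direction of the loss.
eta = 0.05        # learning rate
steps = 20_000
noise_std = 0.5   # label-noise std, the source of gradient noise

u, v = 3.0, 0.2   # deliberately imbalanced initialization (u^2 - v^2 != 0)
imbalance = []

for _ in range(steps):
    y = 1.0 + noise_std * rng.standard_normal()  # noisy target => stochastic gradient
    r = u * v - y                                # residual
    gu, gv = r * v, r * u                        # gradients of 0.5 * r^2 w.r.t. u, v
    u, v = u - eta * gu, v - eta * gv            # one SGD step
    imbalance.append(u**2 - v**2)

# Without noise, u^2 - v^2 would be (approximately) conserved; with gradient
# noise it decays toward the noise-balanced state |u| = |v|.
print("initial imbalance:", imbalance[0])
print("final   imbalance:", imbalance[-1])
```

In this toy case the contraction can be seen directly: one SGD step maps the imbalance $D = u^2 - v^2$ to $D(1 - \eta^2 r^2)$, so gradient noise (which keeps $\mathbb{E}[r^2]$ bounded away from zero) drives $D$ toward the unique noise-balanced value $D = 0$ along the otherwise flat symmetry direction.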
