Poster in Workshop: Optimization for ML Workshop
Lion's sign noise can make training more stable
Simon Elistratov · Andrey Podivilov · Timofei Iuzhakov · Dmitry Vetrov
Lion is a novel optimization method that has outperformed traditional optimizers like Adam across a variety of tasks. Despite its empirical success, the reasons behind Lion's superiority remain unclear. In this paper, we investigate the mechanisms contributing to Lion's enhanced performance, focusing on the structured noise introduced by the use of the sign function in gradient updates. We characterize this noise by the angle of rotation between the true gradient and its signum. By injecting this noise as a random rotation of a fixed angle into normalized updates, we analyze how the performance of this method compares to that of Lion. We demonstrate that this method outperforms Lion in our setting. This approach reveals a relationship between the rotation angle and the learning rate in Lion, providing insight into its improved performance. Additionally, we identify an effect called "momentum tracing" in neural networks with normalization layers and ReLU activations, which can significantly destabilize the training process. Our analysis demonstrates that the rotation noise inherent in Lion mitigates the negative impact of "momentum tracing", leading to more stable learning. These findings offer theoretical justification for Lion's effectiveness and suggest avenues for developing more robust optimization algorithms.
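As a rough illustration of the mechanism described in the abstract, the sketch below (a minimal NumPy example with hypothetical function names such as `signum_rotation_angle` and `noisy_normalized_step`, not the authors' code) computes the angle between a gradient and its elementwise sign, and then perturbs a normalized update by a random rotation of that fixed angle.

```python
import numpy as np

def signum_rotation_angle(g, eps=1e-12):
    """Angle (radians) between a gradient g and its elementwise sign."""
    s = np.sign(g)
    cos_theta = np.dot(g, s) / (np.linalg.norm(g) * np.linalg.norm(s) + eps)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def rotate_by_fixed_angle(u, theta, rng):
    """Rotate the unit vector u by angle theta in a uniformly random direction.

    A random direction orthogonal to u is sampled; the result
    cos(theta) * u + sin(theta) * v stays on the unit sphere.
    """
    v = rng.standard_normal(u.shape)
    v -= np.dot(v, u) * u              # remove the component along u
    v /= np.linalg.norm(v)
    return np.cos(theta) * u + np.sin(theta) * v

def noisy_normalized_step(params, grad, lr, theta, rng):
    """One hypothetical update: normalized direction plus fixed-angle rotation noise."""
    u = grad / (np.linalg.norm(grad) + 1e-12)
    u_noisy = rotate_by_fixed_angle(u, theta, rng)
    return params - lr * u_noisy

# Usage sketch: use the angle Lion's sign would induce on this gradient
# as the fixed rotation angle for the injected noise.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
w = rng.standard_normal(1000)
theta = signum_rotation_angle(g)
w = noisy_normalized_step(w, g, lr=1e-3, theta=theta, rng=rng)
```

This sketch only mimics the structured noise of the sign operation as a fixed-angle random rotation of a normalized update; how the angle is chosen and scheduled relative to the learning rate follows the paper's analysis, which the abstract does not fully specify.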