Poster in Workshop: Symmetry and Geometry in Neural Representations
On the Ricci Curvature of Attention Maps and Transformers Training and Robustness
Amirhossein Farzam · Oded Schlesinger · Joshua Susskind · Juan Matias Di Martino · Guillermo Sapiro
Keywords: [ Attention ] [ Transformers ] [ Geometry ] [ Robustness ]
Transformer models have revolutionized machine learning, yet the underpinnings of their success are only beginning to be understood. In this work, we analyze transformers through the geometry of their attention maps, treating them as weighted graphs and focusing on Ricci curvature, a geometric quantity linked to spectral properties and system robustness. We prove that lower Ricci curvature, which indicates lower system robustness, leads to faster convergence of gradient descent during training. We also show that a higher frequency of positive curvature values enhances robustness, revealing a trade-off between performance and robustness. Building on this, we propose a regularization method to adjust the curvature distribution, and we provide experimental results that support our theoretical predictions while offering insights into ways to improve transformer training and robustness. The geometric perspective developed in this paper offers a versatile framework for both understanding and improving the behavior of transformers.
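The abstract does not specify which discrete notion of Ricci curvature is used on the attention graphs. As an illustrative stand-in, the minimal NumPy sketch below symmetrizes an attention map into an undirected weighted graph and computes the Forman-Ricci edge curvature (Sreejith et al., 2016, with unit node weights); the function name, the symmetrization step, and the choice of Forman curvature are our assumptions, not details taken from the paper.

```python
import numpy as np

def forman_ricci_curvature(attn: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Forman-Ricci curvature of each edge of a weighted graph built from
    an attention map (hypothetical sketch; unit node weights assumed).

    attn: (n, n) attention matrix; symmetrized into an undirected graph.
    Returns an (n, n) matrix of edge curvatures (0 where no edge exists).
    """
    W = 0.5 * (attn + attn.T)        # undirected edge weights (assumption)
    np.fill_diagonal(W, 0.0)         # ignore self-attention loops
    n = W.shape[0]
    curv = np.zeros_like(W)
    for u in range(n):
        for v in range(u + 1, n):
            w_e = W[u, v]
            if w_e <= eps:
                continue             # no edge between u and v
            # Sums over edges incident to u and to v, excluding edge (u, v)
            s_u = sum(1.0 / np.sqrt(w_e * W[u, k]) for k in range(n)
                      if k != v and W[u, k] > eps)
            s_v = sum(1.0 / np.sqrt(w_e * W[v, k]) for k in range(n)
                      if k != u and W[v, k] > eps)
            # Weighted Forman curvature with node weights set to 1:
            # F(e) = w_e * (1/w_e + 1/w_e - s_u - s_v)
            f = w_e * (2.0 / w_e - s_u - s_v)
            curv[u, v] = curv[v, u] = f
    return curv

# Usage on a toy softmax attention map
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
curv = forman_ricci_curvature(attn)
print("fraction of positive-curvature edges:", (curv[curv != 0] > 0).mean())
```

The final line estimates the frequency of positive curvature values, the statistic the abstract ties to robustness; a curvature-based regularizer would need a differentiable reformulation of this computation, which the sketch does not attempt.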