

Poster in Workshop: Symmetry and Geometry in Neural Representations

On the Ricci Curvature of Attention Maps and Transformers Training and Robustness

Amirhossein Farzam · Oded Schlesinger · Joshua Susskind · Juan Matias Di Martino · Guillermo Sapiro

Keywords: [ Attention ] [ Transformers ] [ Geometry ] [ Robustness ]


Abstract:

Transformer models have revolutionized machine learning, yet the underpinnings of their success are only beginning to be understood. In this work, we analyze transformers through the geometry of attention maps, treating them as weighted graphs and focusing on Ricci curvature, a metric linked to spectral properties and system robustness. We prove that lower Ricci curvature, indicating lower system robustness, leads to faster convergence of gradient descent during training. We also show that a higher frequency of positive curvature values enhances robustness, revealing a trade-off between performance and robustness. Building on this, we propose a regularization method to adjust the curvature distribution and provide experimental results that support our theoretical predictions while offering insights into ways to improve transformer training and robustness. The geometric perspective provided in our paper offers a versatile framework for both understanding and improving the behavior of transformers.
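To make the graph-geometric setup concrete, the sketch below treats an attention map as a weighted graph and computes per-edge Ricci curvatures. This is a minimal illustration, not the authors' code: the abstract does not specify which curvature notion, symmetrization, or sparsification is used, so the choice of Forman-Ricci curvature (with unit node weights), the symmetrization of the attention matrix, the `eps` edge cutoff, and the dropped self-loops are all assumptions made here for illustration.

```python
# Minimal sketch (assumptions noted above, not the authors' method):
# build a weighted graph from a softmax attention map and compute
# Forman-Ricci edge curvatures with networkx.
import networkx as nx
import numpy as np

def attention_to_graph(attn: np.ndarray, eps: float = 1e-6) -> nx.Graph:
    """Treat an (n x n) attention map as a weighted undirected graph.

    Symmetrizes the map, drops the diagonal (self-loops), and keeps
    edges whose weight exceeds `eps`. Both choices are assumptions.
    """
    sym = 0.5 * (attn + attn.T)
    n = sym.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sym[i, j] > eps:
                G.add_edge(i, j, weight=float(sym[i, j]))
    return G

def forman_curvature(G: nx.Graph) -> dict:
    """Forman-Ricci curvature of each edge, with unit node weights:
    F(e=uv) = 2 - sum over edges e' sharing an endpoint with e of
    sqrt(w_e / w_e')."""
    curv = {}
    for u, v, d in G.edges(data=True):
        w_e = d["weight"]
        s = 0.0
        for x, other in ((u, v), (v, u)):
            for k in G.neighbors(x):
                if k != other:  # skip the edge e itself
                    s += np.sqrt(w_e / G[x][k]["weight"])
        curv[(u, v)] = 2.0 - s
    return curv

# Toy usage: a random row-softmax attention map over 8 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
curvs = np.array(list(forman_curvature(attention_to_graph(attn)).values()))
print(f"mean curvature: {curvs.mean():.3f}, "
      f"fraction positive: {(curvs > 0).mean():.3f}")
```

Summary statistics of this curvature distribution, such as the fraction of positive values printed above, are the kind of quantity the abstract's regularization method would target, for instance by adding a penalty on the curvature distribution to the training loss.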
