Poster in Workshop: UniReps: Unifying Representations in Neural Models
Conic Activation Functions
Changqing Fu · Laurent Cohen
Keywords: [ Activation Functions ] [ Equivariance in Neural Networks ] [ Neural Network Architectures ]
Most activation functions operate component-wise, which restricts the symmetry of neural networks to permutations. We introduce conic activation functions (CoLU), which generalize this symmetry to continuous orthogonal groups. Viewing ReLU as the projection onto its invariant set, the positive orthant, we derive a conic activation function by projecting onto a Lorentz cone instead. CoLU with low-dimensional cones outperforms component-wise ReLU across a wide range of models, including MLPs, ResNets, Transformers, and UNets, on image and text classification and generation, achieving lower loss and faster training. Its performance can be further improved with a multi-head structure, soft scaling, and axis sharing. CoLU arises from a unified view of several lines of models and fundamentally changes the algebraic structure of neural networks' linear mode connectivity.
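The abstract describes CoLU only by analogy (ReLU as the projection onto the positive orthant, replaced by a projection onto a Lorentz cone) and does not give the exact formula, so the following is a minimal sketch of that core idea rather than the authors' implementation. The function name colu_lorentz, the PyTorch framing, and the convention that the first channel serves as the cone axis are illustrative assumptions.

    import torch

    def colu_lorentz(x: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
        """Euclidean projection of each feature vector onto the second-order
        (Lorentz) cone {(t, v) : ||v|| <= t}, taking the first channel as the
        cone axis t. ReLU is the analogous projection onto the positive orthant."""
        t, v = x[..., :1], x[..., 1:]          # split axis / body coordinates
        n = v.norm(dim=-1, keepdim=True)       # ||v||
        inside = n <= t                        # already in the cone: identity
        polar = n <= -t                        # in the polar cone: map to the origin
        scale = (t + n) / 2                    # otherwise: project onto the boundary
        zero_t, zero_v = torch.zeros_like(t), torch.zeros_like(v)
        t_out = torch.where(inside, t, torch.where(polar, zero_t, scale))
        v_out = torch.where(inside, v, torch.where(polar, zero_v, scale * v / (n + eps)))
        return torch.cat([t_out, v_out], dim=-1)

    # Example: a batch of 8 feature vectors treated as a single 64-dimensional cone.
    y = colu_lorentz(torch.randn(8, 64))

Since the abstract reports that low-dimensional cones work best, a multi-head variant would reshape the features into many small groups and apply the same projection to each group independently; the soft-scaling and axis-sharing refinements mentioned above are not covered by this sketch.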