Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
Leveraging Intermediate Neural Collapse: Fixing Layers Beyond Effective Depth to Simplex ETFs for Efficient Deep Neural Networks
Emily Liu
Keywords: [ transformers ] [ generalization ] [ neural collapse ] [ efficient neural networks ]
Neural collapse is a phenomenon observed during the terminal phase of training (TPT), after a neural network reaches zero training error, in which network activations, class means, and linear classifier weights converge to a simplex equiangular tight frame (ETF): a configuration of vectors whose pairwise distances within a subspace are maximized. Neural collapse has been theorized to contribute to neural network interpretability, robustness, and generalization, but the use of neural collapse conditions in network training and regularization remains underexplored. Previous work has demonstrated that fixing the final layer of a neural network to a simplex ETF reduces the number of trainable weights without compromising the network's accuracy. However, deep fully-connected networks can exhibit neural collapse not only in the last layer, but in all layers beyond a given effective depth. We build on this observation to introduce two new training schemes: Adaptive-ETF, a generalized training framework in which all layers beyond the effective depth are fixed to simplex ETFs, and ETF-Transformer, in which the feed-forward layers in each transformer block are fixed to simplex ETFs. We show that these modifications achieve training and testing performance comparable to their baseline variants while using fewer learnable parameters.
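
As a point of reference (this definition is standard in the neural collapse literature rather than taken from the poster text), a simplex ETF for K classes in a d-dimensional feature space (d ≥ K) can be written as the columns of the matrix below, where U is any partial orthogonal matrix:

```latex
% Standard simplex ETF: K maximally separated, equiangular unit vectors.
\[
  \mathbf{M} \;=\; \sqrt{\tfrac{K}{K-1}}\;\mathbf{U}\!\left(\mathbf{I}_K - \tfrac{1}{K}\,\mathbf{1}_K\mathbf{1}_K^{\top}\right),
  \qquad \mathbf{U}\in\mathbb{R}^{d\times K},\;\; \mathbf{U}^{\top}\mathbf{U}=\mathbf{I}_K,
\]
\[
  \|\mathbf{m}_i\| = 1, \qquad
  \langle \mathbf{m}_i, \mathbf{m}_j \rangle = -\tfrac{1}{K-1} \quad (i \neq j).
\]
```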
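The sketch below illustrates, in PyTorch, how a linear layer could be frozen to a simplex ETF so that it contributes no trainable parameters. It is a minimal illustration of the general idea, not the authors' implementation; the names (`simplex_etf`, `ETFLinear`) and dimensions are assumptions.

```python
# Minimal sketch (not the authors' code): a linear layer whose weight is
# frozen to a simplex ETF, so it adds zero trainable parameters.
import torch
import torch.nn as nn


def simplex_etf(num_classes: int, dim: int) -> torch.Tensor:
    """Return a (num_classes x dim) matrix whose rows form a simplex ETF."""
    assert dim >= num_classes, "need dim >= num_classes for a partial orthogonal basis"
    K = num_classes
    # Partial orthogonal matrix U (dim x K) with U^T U = I_K, via reduced QR.
    U = torch.linalg.qr(torch.randn(dim, K)).Q
    # M = sqrt(K/(K-1)) * U (I_K - (1/K) 1 1^T); columns are the K ETF directions.
    M = (K / (K - 1)) ** 0.5 * U @ (torch.eye(K) - torch.ones(K, K) / K)
    return M.T  # one row per class


class ETFLinear(nn.Module):
    """Linear layer with its weight registered as a frozen buffer (no parameters)."""

    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.register_buffer("weight", simplex_etf(num_classes, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T


if __name__ == "__main__":
    layer = ETFLinear(in_features=64, num_classes=10)
    logits = layer(torch.randn(8, 64))
    print(logits.shape)                                  # torch.Size([8, 10])
    print(sum(p.numel() for p in layer.parameters()))    # 0 trainable parameters
```

A scheme in the spirit of Adaptive-ETF would substitute frozen layers like this for the trainable layers beyond the measured effective depth; the exact procedure is specified in the paper rather than in this abstract.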