Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Mallinar · Daniel Beaglehole · Libin Zhu · Adityanarayanan Radhakrishnan · Parthe Pandit · Misha Belkin
Keywords: [ feature learning ] [ emergence ] [ average gradient outer product (AGOP) ] [ kernel methods ] [ theory of deep learning ] [ modular arithmetic ] [ grokking ]
Neural networks trained to solve modular arithmetic tasks exhibit grokking, the phenomenon where test accuracy improves only long after the model achieves 100% training accuracy. Grokking is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that grokking is specific neither to neural networks nor to gradient descent-based optimization. Specifically, we show that grokking occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with kernel machines. We further show that both RFM and neural networks that solve modular arithmetic learn block-circulant feature transformations, which implement the previously proposed Fourier multiplication algorithm.
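To illustrate the kind of procedure the abstract describes, here is a minimal sketch of an RFM-style loop on modular addition: a kernel ridge regressor with a Mahalanobis (Laplace-type) kernel is fit, the AGOP of the resulting predictor is computed, and that matrix becomes the feature matrix for the next fit. All specifics below (modulus p = 11, the Laplace kernel form, bandwidth, ridge, normalization, and number of iterations) are illustrative assumptions, not the authors' experimental settings.

```python
# Hypothetical sketch of an RFM/AGOP iteration on (a + b) mod p; not the paper's exact setup.
import numpy as np

p = 11                                   # modulus (illustrative choice)
rng = np.random.default_rng(0)

# All (a, b) pairs, one-hot encoded inputs and one-hot labels for (a + b) mod p.
pairs = np.array([(a, b) for a in range(p) for b in range(p)])
X = np.zeros((len(pairs), 2 * p))
X[np.arange(len(pairs)), pairs[:, 0]] = 1.0
X[np.arange(len(pairs)), p + pairs[:, 1]] = 1.0
Y = np.eye(p)[(pairs[:, 0] + pairs[:, 1]) % p]

# Random 50/50 train/test split (fraction is an assumption).
idx = rng.permutation(len(pairs))
train, test = idx[: len(pairs) // 2], idx[len(pairs) // 2:]

def mahalanobis_dists(A, B, M):
    """Pairwise distances sqrt((a - b)^T M (a - b)) for symmetric PSD M."""
    sq = ((A @ M) * A).sum(1)[:, None] + ((B @ M) * B).sum(1)[None, :] - 2 * A @ M @ B.T
    return np.sqrt(np.clip(sq, 0.0, None))

def fit_and_agop(Xtr, Ytr, M, gamma=1.0, ridge=1e-3):
    """Fit kernel ridge regression with Laplace kernel k_M(x, z) = exp(-gamma * d_M(x, z)),
    then return the coefficients and the AGOP of the fitted predictor on the training set."""
    D = mahalanobis_dists(Xtr, Xtr, M)
    K = np.exp(-gamma * D)
    alpha = np.linalg.solve(K + ridge * np.eye(len(Xtr)), Ytr)      # (n, classes)
    # Jacobian of f_c(x) = sum_i alpha[i, c] * exp(-gamma * d_M(x, x_i)) at each training point,
    # using d/dx exp(-gamma d) = -gamma * exp(-gamma d) * M (x - x_i) / d.
    G = np.zeros((len(Xtr), Ytr.shape[1], Xtr.shape[1]))            # (n, classes, dim)
    for j, x in enumerate(Xtr):
        diff = x - Xtr                                              # (n, dim)
        d = np.maximum(D[j], 1e-12)
        w = -gamma * np.exp(-gamma * d) / d                         # (n,)
        G[j] = alpha.T @ (w[:, None] * (diff @ M))                  # (classes, dim)
    agop = np.einsum('ncd,nce->de', G, G) / len(Xtr)                # average gradient outer product
    return alpha, agop

# RFM loop: alternate kernel fitting and AGOP feature-matrix updates.
M, gamma = np.eye(2 * p), 1.0
for t in range(5):
    alpha, agop = fit_and_agop(X[train], Y[train], M, gamma=gamma)
    K_test = np.exp(-gamma * mahalanobis_dists(X[test], X[train], M))
    acc = (np.argmax(K_test @ alpha, 1) == np.argmax(Y[test], 1)).mean()
    print(f"iteration {t}: test accuracy {acc:.2f}")
    M = agop / np.trace(agop) * (2 * p)                             # normalized AGOP as the next feature matrix
```

In this sketch, the learned matrix M plays the role of the feature transformation; for modular arithmetic one would inspect M for the block-circulant structure discussed in the abstract. Details such as using the AGOP directly versus a matrix power of it, or the normalization applied, vary across RFM implementations and are assumptions here.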