Spotlight Poster
Continual learning with the neural tangent ensemble
Ari Benjamin · Christian-Gernot Pehle · Kyle Daruwalla
A natural strategy for continual learning is to weigh a Bayesian ensemble of fixed functions. This suggests that if a (single) neural network could be interpreted as an ensemble, one could design effective algorithms that learn without forgetting. To realize this possibility, we observe that a neural network classifier in the Neural Tangent Kernel (NTK) limit can be interpreted as an ensemble of fixed classifiers. Each parameter contributes a single classifier to the ensemble, and the change in parameters acts as the ensemble weights. We term these classifiers the neural tangent experts and show they output valid probability distributions over the labels. We then derive the likelihood and posterior probability of each expert given past data. Surprisingly, we find that the posterior updates for these experts are equivalent to a scaled and projected form of stochastic gradient descent (SGD) over the network weights. Away from the NTK limit, networks can be seen as ensembles of adaptive experts that improve over time. These results offer a new interpretation of neural networks as Bayesian ensembles of experts, providing a principled framework for understanding and mitigating catastrophic forgetting in continual learning settings.
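For orientation, a minimal sketch of the NTK linearization the abstract builds on (the notation $\theta_0$ for the initialization and $\Delta\theta_p$ for the per-parameter change is ours, and the normalization the paper uses to make each expert a valid distribution over labels is not reproduced here):

$$f(x; \theta) \;\approx\; f(x; \theta_0) + \sum_p \Delta\theta_p \,\frac{\partial f(x; \theta_0)}{\partial \theta_p}, \qquad \Delta\theta_p = \theta_p - \theta_{0,p},$$

so each fixed function $x \mapsto \partial f(x; \theta_0)/\partial \theta_p$ plays the role of one neural tangent expert, and the parameter changes $\Delta\theta_p$ act as the ensemble weights.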