Invited Talk at the Workshop on Mathematics of Modern Machine Learning (M3L)
How do two-layer neural networks learn complex functions from data over time?
Florent Krzakala
How do two-layer neural networks learn complex functions from data over time? In this talk, I will delve into the interplay between batch size, number of iterations, and task complexity, shedding light on how neural networks adapt to the features of the data. I will highlight three key findings:
i) The significant impact of a single gradient step on feature learning, emphasizing the relationship between the batch size and the target's information exponent (or complexity) (see the sketch after this list).
ii) The enhancement of the network's approximation ability over multiple gradient steps, allowing it to learn more intricate functions over time.
iii) The improvement in generalization compared to the basic random feature/kernel regime.
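To make the single-gradient-step setting in (i) concrete, here is a minimal sketch (not the authors' code) of a two-layer network trained on a single-index target: the first layer takes one large full-batch gradient step, the second layer is then refit by ridge regression, and the result is compared with the pure random-features baseline of (iii). The target choice (a second Hermite polynomial, information exponent 2), the dimensions, the learning-rate scaling, and the ridge penalty are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming squared loss and ReLU activations: one large gradient
# step on the first layer, then ridge regression on the second layer, compared
# with the random-features baseline (first layer left at initialization).
# All sizes and scalings below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test = 100, 200, 20_000, 5_000
eta = np.sqrt(p)                 # "large step" learning rate (assumed scaling)

w_star = rng.standard_normal(d) / np.sqrt(d)

def target(X):
    """Single-index target with information exponent 2: y = He_2(<w*, x>)."""
    z = X @ w_star
    return z ** 2 - 1.0

def features(X, W):
    """First-layer ReLU features with 1/sqrt(p) output scaling."""
    return np.maximum(X @ W.T / np.sqrt(d), 0.0) / np.sqrt(p)

def ridge_fit(Phi, y, lam=1e-3):
    """Fit the second layer by ridge regression on fixed features."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Training data and random initialization of both layers
X = rng.standard_normal((n_train, d))
y = target(X)
W0 = rng.standard_normal((p, d))
a0 = rng.standard_normal(p)

# One full-batch gradient step on the first layer (squared loss)
pre = X @ W0.T / np.sqrt(d)                     # (n, p) preactivations
resid = np.maximum(pre, 0.0) @ a0 / np.sqrt(p) - y
G = (resid[:, None] * (pre > 0)).T @ X          # (p, d) sum over samples
grad_W = a0[:, None] * G / (n_train * np.sqrt(d * p))
W1 = W0 - eta * grad_W

# Refit the second layer on the updated features and compare with the baseline
X_te = rng.standard_normal((n_test, d))
y_te = target(X_te)
for name, W in [("random features", W0), ("after one step ", W1)]:
    a = ridge_fit(features(X, W), y)
    mse = np.mean((features(X_te, W) @ a - y_te) ** 2)
    print(f"{name}: test MSE = {mse:.3f}")
```

Refitting the second layer after the single step isolates the effect of the updated first-layer features, which is precisely what separates this regime from the fixed random-features/kernel baseline mentioned in (iii).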
Our theoretical approach combines techniques from statistical physics, concentration of measure, projection-based conditioning, and Gaussian equivalence, which we believe are of independent interest.
Based on joint work with Yatin Dandi, Bruno Loureiro, Luca Pesce, and Ludovic Stephan (https://arxiv.org/pdf/2305.18270.pdf)