Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
How rare events shape the learning curves of hierarchical data
Hyunmo Kang · Francesco Cagnetta · Matthieu Wyart
Abstract:
The learning curves of deep learning methods often behave as a power of the dataset size. A theoretical understanding of the corresponding exponent yields fundamental insights into the learning problem. However, this understanding is still limited to extremely simple datasets and idealised learning scenarios, such as the lazy regime, where the network acts as a kernel method. Recent works study how deep networks learn synthetic classification tasks generated by probabilistic context-free grammars: generative processes that model the hierarchical and compositional structure of language and images. Previous studies assumed all composition rules to be equally likely, which leads to non-power-law learning curves for classification. In realistic datasets, instead, some rules may be much rarer than others. By assuming that the probabilities of these rules follow a Zipf law with exponent $a$, we show that the classification error of deep neural networks decays as a power $\alpha\,{=}\,a/(1+a)$ of the number of training examples, with a large multiplicative constant that depends on the hierarchical structure of the data.
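To make the stated scaling concrete, here is a minimal sketch that evaluates the exponent $\alpha = a/(1+a)$ and a power-law learning curve $\varepsilon(P) = C\,P^{-\alpha}$, where $P$ is the number of training examples. The prefactor $C$ (the `constant` argument) is a hypothetical placeholder: the abstract only says it is large and depends on the hierarchical structure of the data, so its value here is arbitrary. The function names are illustrative, not taken from the authors' code.

```python
def learning_curve_exponent(a: float) -> float:
    """Return alpha = a / (1 + a) for a Zipf exponent a > 0."""
    if a <= 0:
        raise ValueError("Zipf exponent a must be positive")
    return a / (1.0 + a)


def predicted_test_error(num_examples: float, a: float, constant: float = 1.0) -> float:
    """Hypothetical power-law learning curve eps(P) = C * P**(-alpha).

    `constant` stands in for the data-dependent prefactor C, which is not
    specified in the abstract; the default of 1.0 is an arbitrary choice.
    """
    alpha = learning_curve_exponent(a)
    return constant * num_examples ** (-alpha)


if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        alpha = learning_curve_exponent(a)
        # Factor by which the error shrinks when the training set doubles.
        ratio = predicted_test_error(2000, a) / predicted_test_error(1000, a)
        print(f"a={a}: alpha={alpha:.3f}, error ratio on doubling P={ratio:.3f}")
```

Since $\alpha = a/(1+a)$ lies strictly between 0 and 1, the sketch shows that heavier-tailed rule frequencies (small $a$) give slower learning, while $\alpha$ approaches 1 as $a$ grows. For example, $a = 1$ gives $\alpha = 1/2$, so doubling the training set reduces the error by a factor of about $2^{-1/2} \approx 0.71$.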