Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Exploring model depth and data complexity through the lens of cellular automata
Tianyu He · Darshil Doshi · Aritra Das · Andrey Gromov
Abstract:
Large language models excel at solving complex tasks, owing to their hierarchical architecture, which enables sophisticated algorithms to be implemented through layered computations. In this work, we study the interplay between model depth and data complexity using elementary cellular automata (ECA) datasets. We demonstrate empirically that, at a fixed parameter count, deeper networks consistently outperform shallower variants. Our findings reveal that more complex ECA rules require deeper models to emulate. Finally, analysis of attention score patterns elucidates why shallower networks struggle to emulate complex rules effectively.
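For readers unfamiliar with the data source, the following is a minimal sketch (not taken from the paper) of how an elementary cellular automaton evolves: each cell's next value is the bit of an 8-bit rule number indexed by the cell's 3-cell neighborhood, read as a binary number. The function name and the periodic-boundary choice are illustrative assumptions.

```python
def eca_step(state, rule):
    """Apply one step of ECA `rule` (0-255) to `state` (a list of 0/1),
    with periodic boundary conditions (an illustrative choice)."""
    n = len(state)
    nxt = []
    for i in range(n):
        left, center, right = state[i - 1], state[i], state[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        nxt.append((rule >> idx) & 1)              # look up that bit of the rule
    return nxt

# Example: Rule 110 (a well-known complex rule) acting on a single live cell
state = [0, 0, 0, 1, 0, 0, 0]
print(eca_step(state, 110))  # -> [0, 0, 1, 1, 0, 0, 0]
```

Training sequences can then be built by iterating `eca_step` from random initial states; rules differ sharply in the complexity of the trajectories they generate, which is what makes ECA a controllable testbed for depth.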