Poster in Workshop: Compositional Learning: Perspectives, Methods, and Paths Forward

Sometimes I am a Tree: Data Drives Fragile Hierarchical Generalization

Tian Qin · Naomi Saphra · David Alvarez-Melis

Keywords: [ Hierarchical Generalization ] [ Training inconsistency ] [ Training data ] [ OOD Generalization ]


Abstract:

When training deep neural networks, models can adopt various heuristics, leading to different out-of-distribution (OOD) behaviors. Prior work has attributed these preferences to choices of model architecture or training objective, but the role of training data is less explored. Here, we examine how data composition impacts a model's generalization behavior and accounts for inconsistent training outcomes across random seeds. Using the English question formation task as a case study, we show that the hierarchical rule (the correct rule in English grammar) is induced by grammatically complex sequences with center-embedding structures, whereas the linear rule (a surface-level heuristic) is learned from simpler right-branching sequences. We further show that models stabilize their OOD behavior during training only once they commit to a rule. When the data mixes simple and complex examples, candidate rules compete, producing unstable dynamics in training runs that fail to commit. Our findings highlight how training data shapes generalization patterns and how competition between data subsets can lead to inconsistent training results.
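To make the two competing rules concrete, here is a minimal sketch (not the authors' implementation; the token-list representation, explicit auxiliary set, and supplied main-auxiliary index are simplifying assumptions) of how the linear and hierarchical rules each turn a declarative with a center-embedded relative clause into a question:

```python
# Illustrative sketch of the two question-formation rules from the abstract.
# Sentences are token lists; auxiliaries are marked by a fixed vocabulary.

AUXILIARIES = {"does", "do", "doesn't", "don't"}

def linear_rule(tokens):
    """Surface-level heuristic: front the FIRST auxiliary in the string."""
    i = next(k for k, t in enumerate(tokens) if t in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def hierarchical_rule(tokens, main_aux_index):
    """Structure-sensitive rule: front the MAIN-CLAUSE auxiliary.
    The index is supplied here for simplicity; in the task it is
    determined by the sentence's parse tree."""
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

# Declarative with a center-embedded relative clause on the subject:
sent = "my walrus that does sing doesn't move".split()

print(" ".join(linear_rule(sent)))           # does my walrus that sing doesn't move  (wrong)
print(" ".join(hierarchical_rule(sent, 5)))  # doesn't my walrus that does sing move  (correct)
```

On simpler right-branching sentences the main-clause auxiliary is also the first auxiliary, so the two rules produce identical outputs; only center-embedded examples like the one above disambiguate them, which is why their presence in the training data matters.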
