Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao · Naomi Saphra · Sham Kakade
In this paper, we explore the nature of sudden breakthroughs in language model performance at scale, which stand in contrast to the smooth improvements governed by scaling laws. While advocates of "emergence" argue that abrupt performance gains arise from acquiring new capabilities at specific scales, recent work has suggested that these gains are illusions caused by thresholding effects. We propose an alternative explanation: breakthroughs are driven by random variation, in particular multimodal performance distributions across random seeds. Using a length generalization task as a case study, we show that different random seeds lead to either highly linear or emergent scaling behavior. We further demonstrate that the probability of a model acquiring a breakthrough capability increases continuously with scale, despite apparent discontinuities in performance. Additionally, we find that scaling models in width versus depth has distinct effects: depth affects the likelihood of sampling from a successful distribution, while width improves the average performance of successful models. These insights suggest a need to account for random variation when studying scaling and emergent capabilities in LMs.
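To make the distributional argument concrete, the following toy simulation (not the authors' code; all functional forms, such as the logistic success probability and the Gaussian performance modes, are illustrative assumptions) shows how a bimodal per-seed performance distribution, combined with a success probability that rises smoothly with scale, can make individual runs look emergent while the population-level trend remains continuous.

```python
# Toy illustration of a distributional view of emergence.
# Assumptions (hypothetical, for illustration only):
#  - each training run either "acquires" the capability or not,
#  - P(acquire) increases smoothly (logistic in log-parameters) with scale,
#  - successful and unsuccessful runs have distinct accuracy modes.
import numpy as np

rng = np.random.default_rng(0)

scales = np.logspace(6, 9, 13)   # hypothetical parameter counts
n_seeds = 200                    # independent runs per scale

def p_success(n_params):
    """Assumed smooth probability that a run samples the 'capable' mode."""
    return 1.0 / (1.0 + np.exp(-2.0 * (np.log10(n_params) - 7.5)))

for n_params in scales:
    success = rng.random(n_seeds) < p_success(n_params)
    # Bimodal per-seed accuracy: near-chance failures vs. high-accuracy successes.
    acc = np.where(success,
                   rng.normal(0.9, 0.05, n_seeds),
                   rng.normal(0.1, 0.05, n_seeds)).clip(0, 1)
    print(f"N={n_params:>12.0f}  P(success)={success.mean():.2f}  "
          f"mean acc={acc.mean():.2f}  median acc={np.median(acc):.2f}")
```

In this sketch the empirical success probability and the mean accuracy rise gradually with scale, whereas any single seed (and the median across seeds) jumps abruptly once the success probability crosses one half, mirroring the distinction the abstract draws between continuous distributional trends and apparently discontinuous individual runs.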