Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao · Naomi Saphra · Sham Kakade
In this paper, we explore the nature of sudden breakthroughs in language model performance at scale, which stand in contrast to the smooth improvements governed by scaling laws. While advocates of "emergence" argue that abrupt performance gains arise from acquiring new capabilities at specific scales, recent work has suggested that these gains are illusions caused by thresholding effects. We propose an alternative explanation: breakthroughs are driven by random variation, in particular multimodal performance distributions across random seeds. Using a length generalization task as a case study, we show that different random seeds lead to either highly linear or emergent scaling behavior. We further demonstrate that the probability of a model acquiring a breakthrough capability increases continuously with scale, despite apparent discontinuities in performance. Additionally, we find that scaling models in width versus depth has distinct effects: depth affects the likelihood of sampling from a successful distribution, while width improves the average performance of successful models. These insights suggest a need to account for random variation when studying scaling and emergent capabilities in LMs.
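To make the distributional argument concrete, the following toy simulation (not the authors' code; all functional forms, such as the logistic success probability and the Gaussian performance modes, are illustrative assumptions) shows how a bimodal per-seed performance distribution, combined with a success probability that rises smoothly with scale, can make individual runs look emergent while the population-level trend remains continuous.

```python
# Toy illustration of a distributional view of emergence.
# Assumptions (hypothetical, for illustration only):
#  - each training run either "acquires" the capability or not,
#  - P(acquire) increases smoothly (logistic in log-parameters) with scale,
#  - successful and unsuccessful runs have distinct accuracy modes.
import numpy as np

rng = np.random.default_rng(0)

scales = np.logspace(6, 9, 13)   # hypothetical parameter counts
n_seeds = 200                    # independent runs per scale

def p_success(n_params):
    """Assumed smooth probability that a run samples the 'capable' mode."""
    return 1.0 / (1.0 + np.exp(-2.0 * (np.log10(n_params) - 7.5)))

for n_params in scales:
    success = rng.random(n_seeds) < p_success(n_params)
    # Bimodal per-seed accuracy: near-chance failures vs. high-accuracy successes.
    acc = np.where(success,
                   rng.normal(0.9, 0.05, n_seeds),
                   rng.normal(0.1, 0.05, n_seeds)).clip(0, 1)
    print(f"N={n_params:>12.0f}  P(success)={success.mean():.2f}  "
          f"mean acc={acc.mean():.2f}  median acc={np.median(acc):.2f}")
```

In this sketch the empirical success probability and the mean accuracy rise gradually with scale, whereas any single seed (and the median across seeds) jumps abruptly once the success probability crosses one half, mirroring the distinction the abstract draws between continuous distributional trends and apparently discontinuous individual runs.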