NeurIPS Poster Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes

Zhenfeng Tu · Santiago Tomas Aranguri Diaz · Arthur Jacot

East Exhibit Hall A-C #2001

Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

The training dynamics of linear networks are well studied in two distinct setups: the lazy regime and the balanced/active regime, depending on the initialization and width of the network. We provide a surprisingly simple unifying formula for the evolution of the learned matrix that contains as special cases not only the lazy and balanced regimes but also a mixed regime in between the two. In the mixed regime, a part of the network is lazy while the other is balanced. More precisely, the network is lazy along singular values that are below a certain threshold and balanced along those that are above the same threshold. At initialization, all singular values are lazy, allowing the network to align itself with the task, so that later in time, when some of the singular values cross the threshold and become active, they will converge rapidly (convergence in the balanced regime is notoriously difficult in the absence of alignment). The mixed regime is the "best of both worlds": it converges from any random initialization (in contrast to balanced dynamics, which require special initialization), and has a low-rank bias (absent in the lazy dynamics). This allows us to prove an almost complete phase diagram of training behavior as a function of the variance at initialization and the width, for an MSE training task.
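As a rough illustration of the two endpoint regimes (not of the authors' unifying formula, and not of the mixed regime), the sketch below trains a two-layer linear network with learned matrix A = W2 W1 by full-batch gradient descent on an MSE loss, once from a small and once from a large initialization. It reports two standard diagnostics: how far W1 moves relative to its initial norm (little movement is the usual signature of lazy training), and the relative imbalance ||W1 W1ᵀ - W2ᵀ W2||_F / ||W1 W1ᵀ||_F, whose numerator is conserved by gradient flow and is negligible for a balanced network. Everything here is an assumption made for illustration: the helper names `train_two_layer_linear`, `relative_change` and `relative_imbalance`, the identity inputs, the rank-2 6 x 6 target, the width, the initialization scales and the step sizes. Because the inputs are the identity, the target is the unique zero-loss solution, so this toy cannot exhibit the low-rank bias discussed above.

```python
import numpy as np


def train_two_layer_linear(target, width, init_std, lr, steps, seed=0):
    """Full-batch gradient descent on L = 0.5 * ||W2 @ W1 - target||_F^2.

    Hypothetical helper: with identity inputs, the MSE over the inputs reduces
    to this Frobenius loss on the learned matrix A = W2 @ W1.
    Returns (initial W1, final W1, final W2, final loss).
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = target.shape
    w1 = init_std * rng.standard_normal((width, d_in))   # layer 1: R^d_in -> R^width
    w2 = init_std * rng.standard_normal((d_out, width))  # layer 2: R^width -> R^d_out
    w1_init = w1.copy()
    for _ in range(steps):
        residual = w2 @ w1 - target  # dL/dA
        grad_w1 = w2.T @ residual    # dL/dW1 (both gradients use the old weights)
        grad_w2 = residual @ w1.T    # dL/dW2
        w1 = w1 - lr * grad_w1
        w2 = w2 - lr * grad_w2
    final_loss = 0.5 * np.sum((w2 @ w1 - target) ** 2)
    return w1_init, w1, w2, final_loss


def relative_change(w_init, w_final):
    """Laziness diagnostic: distance travelled by the weights, relative to their initial norm."""
    return np.linalg.norm(w_final - w_init) / np.linalg.norm(w_init)


def relative_imbalance(w1, w2):
    """Balancedness diagnostic: ||W1 W1^T - W2^T W2||_F / ||W1 W1^T||_F.

    W1 W1^T - W2^T W2 is conserved by gradient flow (up to step-size effects
    under gradient descent), so it stays near its initial size; it becomes
    negligible relative to the weights only when the initialization is small.
    """
    return np.linalg.norm(w1 @ w1.T - w2.T @ w2) / np.linalg.norm(w1 @ w1.T)


# Hypothetical rank-2 target in a 6 x 6 space, with singular values 3 and 1.
rng = np.random.default_rng(42)
u, _ = np.linalg.qr(rng.standard_normal((6, 6)))
v, _ = np.linalg.qr(rng.standard_normal((6, 6)))
target = u[:, :2] @ np.diag([3.0, 1.0]) @ v[:, :2].T

results = {}
# (name, init_std, lr): the large initialization needs a smaller step size to stay stable.
for name, init_std, lr in [("small init", 1e-3, 1e-2), ("large init", 1.0, 5e-4)]:
    w1_0, w1, w2, loss = train_two_layer_linear(
        target, width=100, init_std=init_std, lr=lr, steps=5000)
    results[name] = {
        "loss": loss,
        "rel_change_w1": relative_change(w1_0, w1),
        "imbalance": relative_imbalance(w1, w2),
        "singular_values": np.linalg.svd(w2 @ w1, compute_uv=False),
    }
    r = results[name]
    print(f"{name}: loss={r['loss']:.1e}, rel. change of W1={r['rel_change_w1']:.3f}, "
          f"imbalance={r['imbalance']:.3f}, singular values={np.round(r['singular_values'], 3)}")
```

With these settings, both runs should fit the target, but the small-initialization run should end far from its initial weights while remaining nearly balanced, whereas the large-initialization run should barely move its weights and keep the large imbalance it started with.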
