Poster in Workshop: Mathematics of Modern Machine Learning (M3L)
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
Felix Sarnthein · Sidak Pal Singh · Antonio Orvieto · Thomas Hofmann
Recent work shows that, by learning to mimic a random teacher network, student networks land in regions of the loss landscape that lead to better representations, as measured by linear-probing performance. In this paper, we investigate how this phenomenon translates into concrete properties of the learned representations. To do so, we first construct a minimal setup that preserves the essence of the phenomenon. We then track key signal propagation and representation-separability properties during random distillation. Our analysis reveals a two-stage process: the network first undergoes a form of representational collapse, and is then steered toward a landscape region that not only allows input signals to propagate better but also gives rise to well-conditioned representations.
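To make the training setup concrete, the sketch below shows the core of random-teacher distillation: a frozen, randomly initialized teacher and a student trained to match its outputs. The MLP architecture, MSE distillation loss, Gaussian stand-in inputs, and independent student initialization are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of random-teacher distillation. Architecture, loss,
# input distribution, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def make_mlp(d_in=32, d_hidden=128, d_out=32, depth=3):
    layers, d = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(d, d_hidden), nn.ReLU()]
        d = d_hidden
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

torch.manual_seed(0)
teacher = make_mlp()
for p in teacher.parameters():
    p.requires_grad_(False)      # the teacher stays frozen at its random init

student = make_mlp()             # student with its own random init
opt = torch.optim.SGD(student.parameters(), lr=1e-2)

for step in range(1000):
    x = torch.randn(256, 32)     # stand-in for the input distribution
    loss = nn.functional.mse_loss(student(x), teacher(x))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After distillation, the student's intermediate activations (rather than its outputs) would be the object of study, e.g. via a linear probe trained on frozen features.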