Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov · Kushal Tirumala · Hassan Shapourian · Paolo Glorioso · Dan Roberts
Understanding where and how knowledge is stored in LLMs is an active and important area of research. In this work, we take a model pruning approach: if removing certain parameters does not affect model output on question-answering knowledge benchmarks, then those parameters are likely not useful for storing knowledge. To find these parameters, we design simple layer-pruning strategies for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. Concretely, we identify the optimal block of layers to prune by considering similarity across layers; then, to “heal” the damage, we perform a small amount of finetuning. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
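The layer-selection step (choosing a block of consecutive layers whose removal changes the representations the least) can be illustrated with a short sketch. The snippet below is a minimal, self-contained illustration, not the authors' implementation: it assumes per-layer representation vectors have already been extracted, and it uses angular distance between a block's input and the input to the layer just after the block as one natural choice of similarity measure; the function names and the toy data are hypothetical.

```python
import numpy as np

def angular_distance(x, y):
    """Angular distance between two representation vectors, normalized to [0, 1]."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def best_block_to_prune(hidden_states, n):
    """Pick the start index of the block of n consecutive layers to prune.

    hidden_states: list where hidden_states[l] is the (e.g. token-averaged)
    input to layer l, with the final entry being the model's last output.
    Returns the start index l* minimizing the distance between the block's
    input and the input to the layer immediately after the block, plus the
    full list of distances for inspection.
    """
    num_layers = len(hidden_states) - 1
    distances = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(num_layers - n + 1)
    ]
    return int(np.argmin(distances)), distances

# Toy usage: random stand-ins for a 12-layer model's representations,
# choosing a block of 4 layers to remove.
rng = np.random.default_rng(0)
states = [rng.normal(size=256) for _ in range(13)]
start, dists = best_block_to_prune(states, n=4)
print(f"candidate block to prune: layers {start}..{start + 3}")
```

In practice the chosen block would then be dropped from the network and the pruned model given a small amount of finetuning to "heal" the damage, as described in the abstract.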