Poster Session in Workshop: Scientific Methods for Understanding Neural Networks
Model Recycling: Model component reuse to promote in-context learning
Lindsay Smith · Chase Goddard · Vudtiwat Ngampruetikorn · David Schwab
In-context learning (ICL) is a behavior of transformer-based models in which, at inference time, the model leverages examples of a novel task provided in its context to perform that task accurately. Here we study the role of different model components in ICL via model component recycling. Previous work has found a plateau in the training loss before models begin to learn a general-purpose ICL solution. Additionally, the training data must be sufficiently diverse to support the emergence of ICL. We explore two separate model recycling experiments related to ICL: reducing the plateau in the training loss, and eliciting, in lower task diversity settings, performance equivalent to that of higher task diversity training. We find that transferring the embeddings, but not the transformer, of a trained model to an untrained model eliminates the plateau seen in standard model training. Additionally, high test accuracy can be achieved in a lower task diversity regime by transferring either frozen embeddings or a trainable transformer from a model trained with higher task diversity.
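The recycling setups described above (copying a trained donor's embeddings, frozen, or its transformer, trainable, into an untrained recipient) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the parameter names (`embed.tokens`, `transformer.w`), the toy shapes, the all-ones "gradients", and the helpers `init_params`, `recycle`, and `sgd_step` are all hypothetical. A real experiment would use a deep-learning framework and gradients from an actual ICL training loss.

```python
import copy
import random


def init_params(seed, vocab=8, d_model=4):
    """Hypothetical toy layout: one embedding table plus one transformer weight matrix."""
    rng = random.Random(seed)
    return {
        "embed.tokens": [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(vocab)],
        "transformer.w": [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(d_model)],
    }


def recycle(donor, recipient, prefix, freeze):
    """Copy every donor parameter whose name starts with `prefix` into a copy of the recipient.

    Returns the new parameter dict and the set of parameter names that remain trainable.
    If `freeze` is True, the transferred parameters are excluded from training.
    """
    params = copy.deepcopy(recipient)
    for name, value in donor.items():
        if name.startswith(prefix):
            params[name] = copy.deepcopy(value)
    trainable = {n for n in params if not (freeze and n.startswith(prefix))}
    return params, trainable


def sgd_step(params, grads, trainable, lr=0.1):
    """Plain SGD update applied only to trainable parameters; frozen ones pass through unchanged."""
    updated = {}
    for name, value in params.items():
        if name in trainable:
            updated[name] = [
                [v - lr * g for v, g in zip(row, grow)]
                for row, grow in zip(value, grads[name])
            ]
        else:
            updated[name] = value
    return updated


# Usage: the donor stands in for a model trained at higher task diversity (hypothetical).
donor = init_params(seed=0)
fresh = init_params(seed=1)  # untrained recipient

# Setup 1: transfer frozen embeddings; the transformer trains from scratch.
params, trainable = recycle(donor, fresh, prefix="embed.", freeze=True)
grads = {n: [[1.0] * len(row) for row in v] for n, v in params.items()}  # placeholder gradients
params = sgd_step(params, grads, trainable)

# Setup 2: transfer a trainable transformer; the embeddings keep their fresh initialization.
params2, trainable2 = recycle(donor, fresh, prefix="transformer.", freeze=False)
```

Keeping the frozen/trainable decision in a separate `trainable` set, rather than mutating the weights themselves, lets the same update loop serve both setups.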