Talk in Workshop: Transfer Learning for Natural Language Processing
Modular and Composable Transfer Learning
Jonas Pfeiffer
With pre-trained transformer-based models continuously increasing in size, there is a dire need for parameter-efficient and modular transfer learning strategies. In this talk, we will cover adapter-based fine-tuning, where instead of fine-tuning all weights of a model, small neural network components are introduced at every layer. While the pre-trained parameters are kept frozen, only the newly introduced adapter weights are fine-tuned, encapsulating the downstream-task information in designated parts of the model. We will demonstrate that adapters are modular components that can be composed to improve performance on a target task, and show how they can be used for out-of-distribution generalization, using zero-shot cross-lingual transfer as an example. Finally, we will discuss how adding modularity during pre-training can mitigate catastrophic interference and consequently lift the curse of multilinguality.
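For readers unfamiliar with the setup described above, the following is a minimal PyTorch-style sketch of a bottleneck adapter and of freezing the pre-trained weights so that only the adapter parameters are trained. The `model.layers` attribute, the bottleneck size, and the insertion point are illustrative assumptions, not the exact architecture discussed in the talk.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a residual."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen layer's output largely
        # intact; the adapter only learns a small task-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def add_adapters_and_freeze(model: nn.Module, hidden_size: int) -> list[nn.Parameter]:
    """Freeze all pre-trained weights and attach one adapter per layer.

    `model.layers` is a hypothetical container for the transformer layers,
    used here only for illustration. Returns the adapter parameters, which
    are the only ones passed to the optimizer.
    """
    for p in model.parameters():
        p.requires_grad = False

    trainable: list[nn.Parameter] = []
    for layer in model.layers:  # assumed attribute holding the layer stack
        layer.adapter = Adapter(hidden_size)
        trainable += list(layer.adapter.parameters())
    return trainable
```

In use, only the returned adapter parameters would be handed to the optimizer (e.g. `torch.optim.AdamW(trainable, lr=1e-4)`), so the downstream task is encapsulated in the adapters while the pre-trained model stays untouched and can be shared across tasks.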