

Oral in Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Fabian Paischer · Lukas Hauzenberger · Thomas Schmied · Benedikt Alkin · Marc Deisenroth · Sepp Hochreiter

Keywords: [ Efficient Training ]

Sat 14 Dec 1:54 p.m. PST — 2 p.m. PST

Abstract:

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to modulate the pre-trained weights via a low-rank adaptation (LoRA) of newly introduced weights. These weight matrices are usually initialized at random with the same rank for each layer across the FM, which results in suboptimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner: we compute the singular value decomposition of activation vectors and initialize the new LoRA matrices with the obtained right-singular vectors. Finally, we re-distribute the ranks among layers to explain the maximal amount of variance across all layers. This assignment results in an adaptive allocation of ranks per weight matrix and inherits all benefits of LoRA. We apply our new method, Explained Variance Adaptation (EVA), to a variety of fine-tuning tasks comprising language understanding and generation, image classification, and reinforcement learning. EVA consistently attains the highest average score across a multitude of tasks per domain.
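The sketch below illustrates the initialization scheme described in the abstract: compute an SVD of each layer's collected activations, initialize the LoRA down-projection with the top right-singular vectors, and allocate ranks across layers by explained variance. This is a minimal illustration based only on the abstract; the function name `eva_init`, the greedy budget allocation, and the data layout are assumptions, not the authors' released implementation.

```python
import torch

def eva_init(activations: dict[str, torch.Tensor], total_rank_budget: int):
    """Hypothetical EVA-style LoRA initialization (sketch, not the official code).

    activations: per-layer activation matrices of shape (num_tokens, in_features),
    gathered from forward passes of the pre-trained model on downstream data.
    Returns, per layer, a rank and an initialization for the LoRA A matrix.
    """
    svd_results = {}
    for name, X in activations.items():
        # Right-singular vectors of the activation matrix give the directions
        # of maximal variance in that layer's inputs.
        _, S, Vh = torch.linalg.svd(X, full_matrices=False)
        explained = S**2 / (S**2).sum()
        svd_results[name] = (explained, Vh)

    # Greedily assign the fixed rank budget to the components that explain
    # the most variance across all layers (adaptive per-layer ranks).
    candidates = [
        (explained[i].item(), name, i)
        for name, (explained, _) in svd_results.items()
        for i in range(explained.numel())
    ]
    candidates.sort(reverse=True)

    ranks = {name: 0 for name in svd_results}
    for _, name, _ in candidates[:total_rank_budget]:
        ranks[name] += 1

    inits = {}
    for name, (_, Vh) in svd_results.items():
        r = ranks[name]
        if r > 0:
            # A is initialized with the top-r right-singular vectors
            # (shape: r x in_features); B would typically start at zero so
            # fine-tuning begins from the unmodified pre-trained weights.
            inits[name] = (r, Vh[:r].clone())
    return inits
```

As a usage note, one would collect activations for each adapted weight matrix over a small amount of downstream data, call this routine once before fine-tuning, and then plug the returned ranks and A matrices into a standard LoRA setup.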
