

Poster in Workshop: Fine-Tuning in Modern Machine Learning: Principles and Scalability

On the Transferability of Parameter-Efficient Continual Learning for Vision Transformers

Leon Ackermann · Van-Linh Nguyen


Abstract:

Continual Learning (CL) is the process of continually adapting a model to a stream of new data. Within CL, pre-trained transformer-based vision models such as the original Vision Transformer (ViT) have recently received increased attention. Various CL methods adapt the base version of the Vision Transformer (ViT-Base) efficiently and achieve outstanding results. However, ViT-Base underperforms several more advanced transformer-based vision models on traditional image classification benchmarks. While in Natural Language Processing state-of-the-art fine-tuning techniques are evaluated on the most up-to-date models, such a comparison is missing in CL: despite the availability of advanced transformer-based vision models in various sizes, state-of-the-art parameter-efficient CL methods still fall back on ViT-Base for benchmarking. In this study, we address this gap by evaluating various sizes of ViT and multiple variants of DeiT3 and DinoV2, two of the best-performing vision transformers, with six state-of-the-art CL methods based on prompt tuning and adapter tuning. The experimental results show that the prompt-based techniques DualPrompt and L2P transfer more reliably to new model types and sizes than the adapter-based approaches. Furthermore, we show that model size matters more for prompt-based than for adapter-based techniques. Finally, we identify ViT-Large as the most performant model and hence our model of choice. With these findings, we aim to further advance the understanding of the connection between model architecture and the continual learning approach.
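To make the distinction concrete, the sketch below illustrates the prompt-tuning idea underlying methods such as L2P and DualPrompt: the pre-trained backbone is frozen and only a small set of prompt tokens plus a classification head are trained for the incoming tasks. This is a minimal, hypothetical illustration, not the authors' implementation; the class name `PromptTunedViT`, the stand-in backbone, and all dimensions are assumptions, and the per-task prompt-pool selection used by L2P/DualPrompt is omitted for brevity.

```python
# Minimal sketch of prompt tuning on a frozen transformer backbone.
# Assumptions: a generic nn.TransformerEncoder stands in for a pre-trained ViT;
# real L2P/DualPrompt additionally select prompts from a learned pool per task.
import torch
import torch.nn as nn

class PromptTunedViT(nn.Module):
    """Frozen backbone + a small set of learnable prompt tokens (hypothetical)."""

    def __init__(self, embed_dim=768, depth=4, num_heads=12,
                 num_prompts=10, num_classes=100):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():        # freeze "pre-trained" weights
            p.requires_grad = False
        # Only the prompts and the classification head receive gradients.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):                # (B, N, D) patch embeddings
        B = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(B, -1, -1), patch_tokens], dim=1)
        x = self.backbone(x)
        return self.head(x[:, 0])                   # read out from the first prompt token

model = PromptTunedViT()
tokens = torch.randn(2, 196, 768)                   # dummy 14x14 patch grid
logits = model(tokens)
print(logits.shape)                                 # torch.Size([2, 100])
```

Adapter-based methods differ in that they insert small trainable bottleneck layers inside each frozen transformer block rather than prepending tokens to its input, which is one reason their behavior can depend more strongly on the specific backbone architecture.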
