

Poster in Workshop: Multimodal Algorithmic Reasoning Workshop

ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision and Language Model

Shuhei Tarashima · XINQI SHU · Norio Tagawa

Poster: Sun 15 Dec, 2:15 p.m. – 4:15 p.m. PST

Abstract:

Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target dataset from a different domain without accessing the original source dataset. Among the various approaches, Attracting and Dispersing (AaD) stands out as a simple yet effective SFDA algorithm. AaD is based on the intuition that features close together in the feature space should yield more similar predictions than those farther apart. In this paper, we propose to enhance AaD-based SFDA by incorporating a pre-trained Vision and Language (ViL) model. Assuming that the ViL model provides a strong initialization for the target dataset, we introduce a novel SFDA objective grounded in the idea that the predictions of the target model and of the ViL model should be more similar for features that are close neighbors in the feature space than for features that are farther apart. We demonstrate that this approach, dubbed ViL-enhanced AaD (ViLAaD), outperforms AaD on the Office-31 dataset. To further boost SFDA performance, we (1) integrate ViLAaD into an alternating optimization framework that combines ViL prompt tuning with target model adaptation and (2) introduce additional loss terms to further refine the adapted model. Our experimental results on the Office-31 and Office-Home datasets show that this enhanced method, termed ViLAaD++, achieves state-of-the-art results, surpassing existing approaches.
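To make the attract-and-disperse intuition concrete, the following is a minimal, hypothetical PyTorch sketch of an objective in the spirit described above: the target model's predictions are pulled toward the ViL model's predictions on nearest neighbors and pushed apart from predictions on other samples in the batch. The function name, neighbor definition, and weighting are illustrative assumptions, not the exact ViLAaD formulation from the paper.

```python
import torch
import torch.nn.functional as F

def vilaad_style_loss(target_probs, vil_probs, features, k=3, disperse_weight=1.0):
    """Illustrative attract-and-disperse style objective (not the paper's exact loss).

    target_probs: (N, C) softmax predictions of the adapting target model
    vil_probs:    (N, C) softmax predictions of a frozen Vision-and-Language model
    features:     (N, D) target-model features used to define neighborhoods
    """
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                      # cosine similarity between features
    sim.fill_diagonal_(float('-inf'))            # exclude self from neighbor search
    nn_idx = sim.topk(k, dim=1).indices          # k nearest neighbors per sample

    # Attraction: target predictions should agree with the ViL model's
    # predictions on close neighbors (dot product of probability vectors).
    neighbor_vil = vil_probs[nn_idx]             # (N, k, C)
    attract = (target_probs.unsqueeze(1) * neighbor_vil).sum(-1).mean()

    # Dispersion: predictions for arbitrary (mostly more distant) pairs in the
    # batch are pushed apart.
    disperse = (target_probs @ vil_probs.t()).mean()

    return -attract + disperse_weight * disperse


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real model outputs.
    N, C, D = 16, 31, 256
    target_probs = torch.softmax(torch.randn(N, C), dim=1)
    vil_probs = torch.softmax(torch.randn(N, C), dim=1)
    features = torch.randn(N, D)
    print(vilaad_style_loss(target_probs, vil_probs, features).item())
```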
