Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)
Data Attribution for Multitask Learning
Yiwen Tu · Ziqi Liu · Jiaqi Ma · Weijing Tang
Data attribution quantifies the influence of individual training data points on machine learning models, which aids in interpreting and improving those models. While prior work has primarily focused on single-task learning (STL), this work extends data attribution to multitask learning (MTL). Data attribution in MTL presents new opportunities for interpreting and improving MTL models, and it also introduces unique technical challenges. On the opportunity side, data attribution in MTL offers a natural way to efficiently measure task relatedness, a key factor in the effectiveness of MTL. On the challenge side, MTL models combine shared and task-specific parameters, which calls for specialized data attribution methods. In this paper, we propose the MultiTask Influence Function (MTIF), a novel data attribution method tailored for MTL. MTIF leverages the structure of MTL models to efficiently estimate how removing data points or excluding entire tasks affects the predictions of specific target tasks, providing both data-level and task-level influence analysis. Extensive experiments on both linear and neural network models show that MTIF effectively approximates leave-one-out and leave-one-task-out effects. Moreover, MTIF facilitates fine-grained data selection, consistently improving model performance in MTL, and provides interpretable insights into task relatedness. Our work establishes a connection between data attribution and MTL, offering an efficient and scalable solution for measuring task relatedness and enhancing MTL models.
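To make the leave-one-out idea concrete, the following is a minimal sketch of the classic single-task influence-function approximation on ridge regression. It is not MTIF and does not model shared or task-specific parameters; it only illustrates the kind of leave-one-out effect that the abstract says MTIF approximates. The function names, the synthetic data, and the regularization strength are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize 0.5*||X @ theta - y||^2 + 0.5*lam*||theta||^2.

    Returns the minimizer and the Hessian of the objective.
    """
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(H, X.T @ y)
    return theta, H

def influence_on_prediction(X, y, x_test, lam):
    """First-order estimate of the change in the prediction at x_test
    when each training point is removed: x_test^T H^{-1} g_i,
    where g_i is the loss gradient of point i at the fitted theta."""
    theta, H = ridge_fit(X, y, lam)
    residuals = X @ theta - y              # shape (n,)
    grads = X * residuals[:, None]         # shape (n, d), one gradient per point
    h_inv_x = np.linalg.solve(H, x_test)   # H is symmetric, so this is H^{-1} x_test
    return grads @ h_inv_x                 # shape (n,)

def exact_loo_change(X, y, x_test, lam):
    """Ground truth: refit without each point and measure the prediction change."""
    theta, _ = ridge_fit(X, y, lam)
    base = x_test @ theta
    n = X.shape[0]
    changes = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        theta_i, _ = ridge_fit(X[mask], y[mask], lam)
        changes[i] = x_test @ theta_i - base
    return changes

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
x_test = rng.normal(size=5)

approx = influence_on_prediction(X, y, x_test, lam=1.0)
exact = exact_loo_change(X, y, x_test, lam=1.0)
corr = np.corrcoef(approx, exact)[0, 1]
print(f"correlation between influence estimate and exact LOO: {corr:.4f}")
```

In this linear setting the estimate costs one linear solve for all training points, while the exact computation refits the model once per point. For ridge regression the two differ only by a per-point leverage factor, so they track each other closely when no single point dominates the fit.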