Poster
Is Multiple Object Tracking a Matter of Specialization?
Gianluca Mancusi · Mattia Bernardi · Aniello Panariello · Angelo Porrello · Simone Calderara · Rita Cucchiara
End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training them poses challenges when dealing with heterogeneous scenarios due to: i) negative interference, i.e., the model tends to learn conflicting scene-specific parameters, and ii) poor domain generalization, which requires costly fine-tuning to adapt the models to new domains. In response to these challenges, we introduce PASTA, a novel framework that leverages Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g., camera viewpoint, lighting condition) and train a specialized PEFT module for each attribute. These expert modules are combined with task arithmetic, enabling systematic generalization to new domains. Extensive experiments on MOTSynth and zero-shot evaluations on MOT17 and PersonPath22 show that a tracker composed of strategically selected modules outperforms a monolithic one, ultimately leading us to the question: "Is MOT a matter of specialization or generalization?".
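The composition step can be illustrated with a minimal sketch of task arithmetic, assuming each attribute-specific PEFT module can be expressed as a delta vector relative to shared base weights. The function name `compose_modules`, the attribute keys, and the per-module coefficients are hypothetical choices for illustration and are not taken from the PASTA implementation.

```python
from typing import Dict, List

Vector = List[float]


def compose_modules(
    base: Vector,
    modules: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Return base + sum_i weights[i] * modules[i] (task arithmetic).

    `modules` maps a hypothetical attribute key to its delta vector;
    `weights` selects which modules to use and how strongly.
    """
    composed = list(base)  # copy so the shared base weights stay untouched
    for name, coeff in weights.items():
        delta = modules[name]
        if len(delta) != len(base):
            raise ValueError(f"module {name!r} does not match the base shape")
        for i, d in enumerate(delta):
            composed[i] += coeff * d
    return composed


# Toy example: two attribute modules selected for a new scene.
base = [1.0, 2.0, 3.0]
modules = {
    "viewpoint:high": [0.5, 0.0, -0.5],
    "lighting:night": [0.0, 1.0, 0.5],
    "lighting:day": [1.0, 1.0, 1.0],  # available but not selected below
}
weights = {"viewpoint:high": 0.5, "lighting:night": 0.5}
result = compose_modules(base, modules, weights)
```

Only the modules named in `weights` contribute, so selecting a different attribute combination for a new domain needs no retraining in this sketch; how the real framework chooses and weights modules is not specified here.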