

Poster

Is Multiple Object Tracking a Matter of Specialization?

Gianluca Mancusi · Mattia Bernardi · Aniello Panariello · Angelo Porrello · Simone Calderara · Rita Cucchiara

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training them in heterogeneous scenarios poses two challenges: i) negative interference, i.e., the model tends to learn conflicting scene-specific parameters, and ii) poor domain generalization, which requires costly fine-tuning to adapt the models to new domains. In response, we introduce PASTA, a novel framework that leverages Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g., camera viewpoint, lighting condition) and train a specialized PEFT module for each attribute. These expert modules are then combined via task arithmetic, enabling systematic generalization to new domains. Extensive experiments on MOTSynth, together with zero-shot evaluations on MOT17 and PersonPath22, show that a tracker composed of strategically selected modules outperforms a monolithic one, ultimately leading us to the question: "Is MOT a matter of specialization or generalization?"
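To make the composition step concrete, here is a minimal, stdlib-only sketch of task arithmetic on a single weight matrix. Each selected attribute expert contributes a parameter delta, and the composed weights are the frozen base weights plus a weighted sum of those deltas. This rests on several assumptions. It uses LoRA-style low-rank deltas (ΔW = B·A) as the PEFT form. The attribute keys such as `"lighting=night"` and the per-expert scaling weights are hypothetical. It is an illustration of the general technique, not PASTA's actual implementation.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: entry (i, j) is row i of a dotted with column j of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_scaled(base: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return base + scale * delta, element-wise, without mutating the inputs."""
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


def lora_delta(b: Matrix, a: Matrix) -> Matrix:
    """Assumed PEFT form: a LoRA-style low-rank update delta_W = B @ A."""
    return matmul(b, a)


def compose(
    base: Matrix,
    experts: Dict[str, Tuple[Matrix, Matrix]],
    selected: List[str],
    weights: Dict[str, float],
) -> Matrix:
    """Task arithmetic: W = W0 + sum_i lambda_i * delta_i over the selected experts.

    `selected` lists hypothetical attribute keys; any expert missing from
    `weights` gets a scaling factor of 1.0.
    """
    w = base
    for name in selected:
        b, a = experts[name]
        w = add_scaled(w, lora_delta(b, a), weights.get(name, 1.0))
    return w


# Usage with hypothetical attribute experts on a 2x2 base weight (rank-1 deltas).
base_w: Matrix = [[1.0, 0.0], [0.0, 1.0]]
attribute_experts: Dict[str, Tuple[Matrix, Matrix]] = {
    "viewpoint=high": ([[1.0], [0.0]], [[0.5, 0.5]]),  # (B: 2x1, A: 1x2)
    "lighting=night": ([[0.0], [2.0]], [[1.0, 0.0]]),
}

composed = compose(base_w, attribute_experts, ["viewpoint=high", "lighting=night"], {})
print(composed)  # [[1.5, 0.5], [2.0, 1.0]]
```

Because the deltas simply add, an expert can be dropped, swapped, or down-weighted at inference time without retraining the others. The abstract does not say how modules are selected for a given scene, so here `selected` is left to the caller.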
