Poster
Revisiting motion information for RGB-Event tracking with MOT philosophy
Tianlu Zhang · Kurt Debattista · Qiang Zhang · Guiguang Ding · Jungong Han
RGB-Event (RGB-E) tracking aims to leverage the complementary merits of RGB and event data to achieve higher performance. However, existing frameworks focus on exploiting complementary appearance information within the multi-modal data and struggle to address the temporal association problem between targets and distractors using motion information from the event stream. In this paper, we propose a unified framework that keeps track of both targets and distractors, using RGB and event data in conjunction with a Multi-Object Tracking (MOT) philosophy, thereby improving the robustness of the tracker. Specifically, an appearance model first predicts initial candidates. The initial tracking results, combined with the RGB-E features, are then encoded into appearance and motion embeddings, respectively. A Spatial-Temporal Transformer Encoder is proposed to model spatial-temporal relationships and learn discriminative features for each candidate under the guidance of the appearance-motion embeddings. In parallel, a Dual-Branch Transformer Decoder exploits this motion and appearance information for candidate matching, thus distinguishing targets from distractors. The proposed method is evaluated on multiple benchmark datasets and achieves state-of-the-art performance on all of them.
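To make the described pipeline concrete, below is a minimal PyTorch sketch of the candidate-encoding and matching stages. It is an illustrative assumption throughout: the abstract does not specify the actual architecture, so the module names (SpatialTemporalEncoder, DualBranchDecoder), dimensions, guidance-token scheme, and dot-product affinity are placeholders standing in for the authors' Spatial-Temporal Transformer Encoder and Dual-Branch Transformer Decoder.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only; not the authors' implementation.

class SpatialTemporalEncoder(nn.Module):
    """Models spatial-temporal relations among candidates. A plain
    TransformerEncoder stands in; appearance-motion embeddings are fed
    as guidance tokens so attention can propagate their cues."""
    def __init__(self, dim=256, heads=8, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, candidate_tokens, guide_tokens):
        # Prepend guidance tokens, run self-attention, keep candidates only.
        x = torch.cat([guide_tokens, candidate_tokens], dim=1)
        out = self.encoder(x)
        return out[:, guide_tokens.shape[1]:]

class DualBranchDecoder(nn.Module):
    """Two decoder branches: one attends to candidates carrying appearance
    embeddings, the other to candidates carrying motion embeddings; the
    refined track queries are fused for candidate matching."""
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.app_branch = nn.TransformerDecoder(layer, num_layers=layers)
        self.mot_branch = nn.TransformerDecoder(layer, num_layers=layers)

    def forward(self, track_queries, cand_feats, app_emb, mot_emb):
        app = self.app_branch(track_queries, cand_feats + app_emb)
        mot = self.mot_branch(track_queries, cand_feats + mot_emb)
        fused = app + mot
        # Track-to-candidate affinity matrix (batch, tracks, candidates),
        # used to separate the target from distractors MOT-style.
        return torch.einsum("btd,bnd->btn", fused, cand_feats)

# Toy usage: batch 1, 5 candidates, 2 tracks (target + one distractor).
enc, dec = SpatialTemporalEncoder(), DualBranchDecoder()
cands, guides = torch.randn(1, 5, 256), torch.randn(1, 2, 256)
feats = enc(cands, guides)
affinity = dec(torch.randn(1, 2, 256), feats,
               torch.randn(1, 5, 256), torch.randn(1, 5, 256))
print(affinity.shape)  # torch.Size([1, 2, 5])
```

One design point this sketch reflects: keeping explicit queries for distractors, not just the target, lets the matching step resolve ambiguities temporally rather than relying on appearance alone.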