Poster
OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking
Haiji Liang · Ruize Han
Open-vocabulary object perception has become an important topic in artificial intelligence, which aims to identify objects with novel classes that have not been seen during training. Under this setting, open-vocabulary object detection (OVD) in a single image has been studied in many literature. However, the open-vocabulary object tracking (OVT) from a video is less studied, and a reason is the shortage of benchmarks. In this work, we have built a new large-scale benchmark for open-vocabulary multi-object tracking namely OVT-B. OVT-B contains 1,048 categories of objects and 1,973 videos with 63,7608 bounding box annotations, which is much larger than the sole open-vocabulary tracking dataset OV-TAO-val dataset (200+ categories, 900+ videos). The proposed OVT-B can be used as a new benchmark to pave the way for the research of OVT. We also develop a simple yet effective baseline method for OVT. It integrates the motion features for object tracking, which is an important feature for MOT but is ignored in previous OVT methods. Experimental results have verified the usefulness of the proposed benchmark and the effectiveness of our method.
Live content is unavailable. Log in and register to view live content