
Talk in Affinity Workshop: Black in AI

Learning scene and video understanding with limited labels

Mennatullah Siam


Abstract:

The goal of image and video understanding is to make inferences about the surrounding world from image or video data, e.g., identifying and localizing objects. It can also be extended to identifying actions and recognizing relations between objects. It has attracted great attention in the research community, both because of its widespread applications (e.g., in automated driving and robotics) and because of the fascinating scientific and engineering challenges it raises (e.g., designing a system that can learn about the 3D, time-varying world from mere video). Deep learning has driven great advances in scene and video understanding in recent years. A major limitation of most such approaches, however, is that they require large-scale labelled data for learning, and annotation can be expensive, especially for pixel-wise segmentation masks in videos. In this talk, I will focus on how to learn scene and video understanding from only a few labelled examples, and on how the interpretability of deep spatiotemporal models can give us insights into improving their generalization capabilities. This research direction has the potential to decolonize Computer Vision by enabling developing countries with limited resources and labelled data to contribute to the field and to work on applications that serve their own communities.