Poster

Understanding Bias in Large-Scale Visual Datasets

Boya Zeng · Yida Yin · Zhuang Liu

East Exhibit Hall A-C #1800
Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

A recent study has shown that large-scale pretraining datasets are very biased: they can be easily classified by modern neural networks. However, the concrete forms of bias among these datasets remain unclear. In this study, we propose a framework to identify the unique visual attributes distinguishing these datasets. Our approach applies various transformations to extract semantic, structural, boundary, color, and frequency information from datasets and assesses how much each type of information contributes to their bias. We further decompose their semantic bias with object-level queries, and leverage natural language methods to generate detailed, open-ended descriptions of each dataset's characteristics. Our work aims to help researchers understand the bias in existing large-scale datasets and build more diverse and representative ones in the future. Our project page and code are available at boyazeng.github.io/understand_bias.
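
To make the idea of information-isolating transformations concrete, here is a minimal sketch of two of the kinds the abstract mentions: one that keeps only color statistics and one that separates frequency bands. The abstract does not specify the authors' actual transformations, so the function names (`mean_color`, `frequency_split`), the ideal circular FFT mask, and the `radius` parameter are all illustrative assumptions rather than the paper's method. In a setup like the one described, a dataset classifier could be trained on the transformed images, and its accuracy would then suggest how much that type of information contributes to the bias.

```python
import numpy as np


def mean_color(image_rgb):
    """Replace every pixel with the image's mean RGB value.

    Hypothetical stand-in for a color-only transformation: all spatial
    structure is discarded and only the global color statistic remains.
    """
    mean = image_rgb.reshape(-1, 3).mean(axis=0)
    return np.broadcast_to(mean, image_rgb.shape).copy()


def frequency_split(image, radius):
    """Split a grayscale image of shape (H, W) into low- and high-frequency parts.

    Assumption: an ideal circular mask in the centered 2D Fourier spectrum.
    Frequencies within `radius` of the center go to the low-pass image,
    the rest to the high-pass image.
    """
    h, w = image.shape
    # Move the zero-frequency component to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gray = rng.random((32, 32))
    low, high = frequency_split(gray, radius=4.0)
    # The two masks are complementary, so the bands sum back to the input.
    print(np.allclose(low + high, gray))
```

Because the two masks partition the spectrum, the low- and high-frequency images add back up to the original, which is a quick sanity check that neither band loses information.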
