

Poster in Workshop: 5th Workshop on Self-Supervised Learning: Theory and Practice

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Harit Vishwakarma · Yi Chen · Sui Jiet Tay · Satya Sai Srinath Namburi · Frederic Sala · Ramya Korlakai Vinayak


Abstract: Auto-labeling techniques produce labeled data with minimal manual annotation by using representations from self-supervised models together with confidence scores. A popular technique, threshold-based auto-labeling (TBAL), trains a model using these representations and manual annotations, and assigns the model's prediction as the label to points where the model's confidence score exceeds a certain threshold. However, the model's scores can be overconfident, leading to poor performance. We show that calibration, a common remedy for overconfidence, falls short in tackling this problem for TBAL. Thus, instead of using existing calibration methods, we introduce a framework for optimal confidence functions for TBAL and develop \texttt{Colander}, a method designed to maximize auto-labeling performance. We perform an extensive empirical evaluation of \texttt{Colander} and other confidence functions, using representations from CLIP for image data and text embedding models for text data. We find that \texttt{Colander} achieves up to 60\% improvement in coverage (the proportion of points labeled by the model) over the baselines while keeping the error level below $5\%$ and using the same amount of labeled data.
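The TBAL labeling rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `tbal_assign`, the threshold value, and the use of the max class probability as the confidence score are all assumptions for the sketch.

```python
import numpy as np

def tbal_assign(probs, threshold=0.8):
    """Threshold-based auto-labeling (TBAL) sketch.

    probs: (n_points, n_classes) array of model class probabilities.
    Returns auto-assigned labels (-1 where the model abstains) and
    coverage, the proportion of points the model labels.
    """
    confidence = probs.max(axis=1)       # confidence score per point (assumed: max prob)
    predictions = probs.argmax(axis=1)   # model's predicted class per point
    mask = confidence > threshold        # points confident enough to auto-label
    labels = np.where(mask, predictions, -1)  # -1 marks points left for manual labeling
    coverage = mask.mean()               # fraction of points auto-labeled
    return labels, coverage

# Toy example: three points, two classes; only the first and third
# points clear the 0.8 confidence threshold.
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.10, 0.90]])
labels, coverage = tbal_assign(probs, threshold=0.8)
```

The paper's contribution, \texttt{Colander}, replaces the plain confidence score used here with a learned confidence function chosen to maximize coverage at a fixed error level.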
