Poster in Workshop: 5th Workshop on Self-Supervised Learning: Theory and Practice
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Harit Vishwakarma · Yi Chen · Sui Jiet Tay · Satya Sai Srinath Namburi · Frederic Sala · Ramya Korlakai Vinayak
Abstract:
Auto-labeling techniques produce labeled data with minimal manual annotation by using representations from self-supervised models together with confidence scores. A popular technique, threshold-based auto-labeling (TBAL), trains a model on these representations and the manual annotations, then assigns the model's prediction as the label for every point where the model's confidence score is greater than a certain threshold. However, the model's scores can be overconfident, which leads to poor auto-labeling performance. We show that calibration, a common remedy for overconfidence, falls short in tackling this problem for TBAL. Thus, instead of using existing calibration methods, we introduce a framework for optimal confidence functions for TBAL and develop Colander, a method designed to maximize auto-labeling performance. We perform an extensive empirical evaluation of Colander and other confidence functions, using representations from CLIP and text embedding models for image and text data, respectively. We find that Colander achieves up to 60% improvement in coverage (the proportion of points labeled by the model) over the baselines while maintaining an error level below 5% and using the same amount of labeled data.
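To make the thresholding step and the two reported metrics concrete, here is a minimal sketch of the TBAL labeling rule as the abstract describes it. It is not the authors' implementation and does not include Colander. The function names (`tbal_auto_label`, `coverage_and_error`), the use of the maximum softmax probability as the confidence score, the threshold of 0.8, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def tbal_auto_label(probs, threshold):
    """Hypothetical helper: pick points whose confidence exceeds the threshold.

    Assumes the confidence score is the maximum predicted class probability.
    """
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)   # the model's predicted label per point
    conf = probs.max(axis=1)       # confidence score per point (assumption)
    mask = conf > threshold        # True where the point is auto-labeled
    return preds, mask

def coverage_and_error(preds, mask, true_labels):
    """Coverage: fraction of points auto-labeled. Error: mistakes among those."""
    true_labels = np.asarray(true_labels)
    coverage = float(mask.mean())
    if mask.sum() == 0:
        return coverage, 0.0       # nothing was auto-labeled, so no errors
    error = float((preds[mask] != true_labels[mask]).mean())
    return coverage, error

# Toy data (made up): the last point is confidently wrong, i.e. overconfident.
probs = np.array([
    [0.95, 0.05],
    [0.60, 0.40],
    [0.10, 0.90],
    [0.97, 0.03],
])
true_labels = np.array([0, 1, 1, 1])

preds, mask = tbal_auto_label(probs, threshold=0.8)
coverage, error = coverage_and_error(preds, mask, true_labels)
print(f"coverage={coverage:.2f}, error={error:.2f}")
```

In this toy run, three of the four points pass the threshold, and one of those three is mislabeled because the model is confidently wrong about it. This is the failure mode the abstract attributes to overconfident scores: raising the threshold lowers the error only by giving up coverage, which is why the choice of confidence function matters.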