Spotlight in Workshop: Workshop on robustness of zero/few-shot learning in foundation models (R0-FoMo)
Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data
Shiladitya Dutta · Hongbo Wei · Lars van der Laan · Ahmed Alaa
Foundation models are trained on vast amounts of data using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., ``an image of a