

Poster in Workshop: Workshop on Distribution Shifts: New Frontiers with Foundation Models

LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

Jia Shi · Gautam Rajendrakumar Gare · Jinjin Tian · Siqi Chai · Zhiqiu Lin · Arun Balajee Vasudevan · Di Feng · Francesco Ferroni · Shu Kong · Deva Ramanan

Keywords: [ hierarchy ] [ Zero-Shot ] [ out-of-distribution generalization ] [ Class Taxonomy ] [ representation evaluation ] [ Vision Language Model ]


Abstract:

In this paper, we address the challenge of assessing model generalization under Out-of-Distribution (OOD) conditions. We reintroduce the Least Common Ancestor (LCA) distance, a metric that has been largely overshadowed since ImageNet. Leveraging the WordNet hierarchy, we use the LCA to measure the taxonomic distance between labels and predictions, and present it as a benchmark for model generalization. The LCA metric proves especially robust compared to previous state-of-the-art metrics when evaluating diverse models, including both vision-only and vision-language models, on natural distribution shift datasets. To validate our benchmark's efficacy, we perform an extensive empirical study on 75 models spanning five distinct ImageNet-OOD datasets. Our findings reveal a strong linear correlation between in-domain ImageNet LCA scores and OOD Top-1 performance across ImageNet-S/R/A/ObjectNet. This discovery gives rise to a novel evaluation framework termed "LCA-on-the-Line", which facilitates unified and consistent assessments across a broad spectrum of models and datasets.

Besides introducing an evaluative tool, we also examine the ties between the LCA metric and model generalization. By aligning model predictions more closely with the WordNet hierarchy and refining prompt engineering in zero-shot vision-language models, we offer tangible strategies to improve model generalization. We challenge the prevailing notion that LCA offers no added evaluative value over Top-1 accuracy, and we provide insights and actionable techniques to enhance model robustness and generalization across various tasks and scenarios.
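To make the idea of a taxonomic distance between a label and a prediction concrete, the following is a minimal sketch on a toy hierarchy. It is not the authors' implementation: the `PARENT` table is a hypothetical stand-in for WordNet, the function names are illustrative, and the distance convention used here (number of edges from the ground-truth label up to its lowest common ancestor with the prediction) is an assumption; other conventions, such as depth-based or information-content-based scores, are also possible.

```python
# Hypothetical toy taxonomy standing in for WordNet: child -> parent.
# The root maps to None.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "beagle": "dog",
    "car": "vehicle",
}


def ancestors(node):
    """Return the path from `node` up to the root, including `node` itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(a, b):
    """Return the deepest node that is an ancestor of both `a` and `b`."""
    seen = set(ancestors(a))
    # Walking upward from `b`, the first node shared with `a` is the deepest one.
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def lca_distance(label, prediction):
    """Edges from `label` up to its LCA with `prediction` (assumed convention)."""
    return ancestors(label).index(lca(label, prediction))


def mean_lca_distance(labels, predictions):
    """Average LCA distance over paired labels and predictions."""
    distances = [lca_distance(y, p) for y, p in zip(labels, predictions)]
    return sum(distances) / len(distances)
```

Under this convention, confusing a husky with a beagle costs less than confusing it with a cat, which in turn costs less than confusing it with a car, whereas Top-1 accuracy treats all three mistakes identically.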
