Poster
in
Workshop: I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models
Interactive Model Correction with Natural Language
Yoonho Lee · Michelle Lam · Helena Vasconcelos · Michael Bernstein · Chelsea Finn
In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on spurious correlations that fail to generalize to new data distributions, such as a bird classifier that relies on the background of an image. Preventing models from latching on to spurious correlations necessarily requires additional information beyond labeled data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that far less supervision suffices if we provide targeted feedback about the misconceptions of models trained on a given dataset. We introduce Clarify, a novel natural language interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns, such as ``water background'' for a bird classifier. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our empirical results show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 7.3% in two datasets with spurious correlations. Finally, we use Clarify to find and rectify 31 novel spurious correlations in ImageNet, improving minority-split accuracy from 21.1% to 28.7%.