Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)
Why do landscape diagnostics matter? Pinpointing the failure mode of generalization
Yefan Zhou · Jianlong Chen · Qinxue Cao · Konstantin Schürholt · Yaoqing Yang
Conventional validation-based and learning-curve-based methods are widely used for model selection and hyperparameter tuning. In this paper, we consider a novel framework of "model diagnostics" that extends these approaches: a practitioner wants to determine how best to spend a given budget, whether on collecting more data, purchasing a larger model, or conducting more careful hyperparameter tuning. We apply our framework to multiple transfer learning scenarios, including tuning models trained on small data while transferring the tuning decisions to large data, and tuning on clean data while transferring the decisions to noisy data. We experimentally demonstrate that generalization measures, especially those motivated by studying the loss landscape of neural networks, play a crucial role in improving model diagnostic performance compared to classical validation-based and learning-curve-based methods.
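To give a concrete flavor of what a landscape-motivated generalization measure can look like, here is a minimal, illustrative sketch of a perturbation-based sharpness proxy: the average loss increase when model weights are perturbed by Gaussian noise. This is not necessarily the exact measure used in the paper; the function names, the toy quadratic losses, and all parameter choices below are hypothetical and purely for illustration.

```python
import numpy as np

def sharpness(loss_fn, w, sigma=0.05, n_samples=200, seed=0):
    """Estimate the sharpness of loss_fn around weights w as the mean
    loss increase under Gaussian perturbations of scale sigma.
    (Illustrative proxy; not the paper's exact diagnostic measure.)"""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    perturbed = [
        loss_fn(w + sigma * rng.standard_normal(w.shape))
        for _ in range(n_samples)
    ]
    return float(np.mean(perturbed) - base)

# Toy example: a flat quadratic basin vs a sharp one.
flat_loss = lambda w: 0.5 * np.sum(w ** 2)
sharp_loss = lambda w: 50.0 * np.sum(w ** 2)
w0 = np.zeros(10)

# The sharper basin yields a larger expected loss increase under the
# same perturbation scale, so its sharpness score is higher.
```

Intuitively, a model sitting in a flatter basin of the loss landscape is often expected to generalize better, which is why such measures can inform diagnostics beyond plain validation error.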