Poster in Workshop: XAI in Action: Past, Present, and Future Applications
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Ričards Marcinkevičs · Sonia Laguna · Moritz Vandenhirtz · Julia Vogt
Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which first predict high-level concepts from the raw features and then predict the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as, and more performant than, CBMs.
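To make the notion of a concept-based intervention concrete, the following is a minimal sketch of a toy concept bottleneck model in NumPy. It only illustrates the CBM setting described in the abstract (features to concepts, concepts to target, then a user overriding a predicted concept); it is not the authors' method for intervening on black-box networks. All function names, weights, and inputs are hypothetical and chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_concepts(x, W_c):
    """Concept predictor: raw features -> concept probabilities."""
    return sigmoid(x @ W_c)

def predict_target(c, w_y):
    """Target predictor: concepts -> probability of the positive class."""
    return sigmoid(c @ w_y)

def intervene(c_hat, c_user, mask):
    """Replace the predicted concepts selected by `mask` with user-given values."""
    return np.where(mask, c_user, c_hat)

# Hypothetical toy weights: 3 raw features, 2 concepts (illustration only).
W_c = np.array([[2.0, 0.0],
                [0.0, -2.0],
                [0.0, 0.0]])
w_y = np.array([3.0, -3.0])

x = np.array([1.0, 1.0, 0.5])         # one input example
c_hat = predict_concepts(x, W_c)      # predicted concept values
y_before = predict_target(c_hat, w_y)

# A user sets the second concept to 1 and leaves the first untouched.
c_user = np.array([0.0, 1.0])
mask = np.array([False, True])
c_new = intervene(c_hat, c_user, mask)
y_after = predict_target(c_new, w_y)  # downstream output after the intervention
```

In this toy setup the second concept has a negative weight in the target predictor, so setting it to 1 lowers the predicted probability. This is the sense in which an intervention on concept values affects the downstream output.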