Poster in Workshop: eXplainable AI approaches for debugging and diagnosis
Revisiting Sanity Checks for Saliency Maps
Gal Yona
Saliency methods are a popular approach for model debugging and explainability. However, in the absence of ground-truth data for what the correct maps should be, evaluating and comparing different approaches remains a long-standing challenge. The sanity checks methodology of Adebayo et al. [NeurIPS 2018] sought to address this challenge: the authors argue that some popular saliency methods should not be used for explainability purposes, since the maps they produce are not sensitive to the underlying model that is to be explained. In this work, we revisit the logic behind their proposed methodology. We cast the goal of ruling out a saliency method on the grounds that it is insensitive to the model as a causal inference question, and use this framing to argue that their empirical results do not sufficiently establish their conclusions, due to a form of confounding that may be inherent to the tasks they evaluate on. On a technical level, our findings call into question the current consensus that some methods are unsuitable for explainability purposes. On a broader level, our work further highlights the challenges involved in saliency map evaluation.
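To make the sensitivity question concrete, the sketch below illustrates one variant of the model-parameter randomization sanity check: compute a saliency map on a trained model, recompute it on the same architecture with randomized weights, and compare the two maps. This is an illustrative sketch, not the paper's code; the choice of ResNet-18, vanilla gradient saliency, full-network randomization, and Spearman rank correlation as the similarity measure are all assumptions made here for brevity.

```python
# Illustrative sketch (not from the paper) of a model-parameter randomization
# sanity check: a saliency method that is sensitive to the model should produce
# a very different map once the model's weights are randomized.
import copy
import torch
import torchvision.models as models
from scipy.stats import spearmanr


def gradient_saliency(model, x, target_class):
    """Vanilla gradient saliency: |d score / d input|, summed over channels."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().sum(dim=1).squeeze(0)  # (H, W) saliency map


# Trained model and a stand-in input (a real evaluation would use dataset images).
model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)
target = model(x).argmax(dim=1).item()

saliency_trained = gradient_saliency(model, x, target)

# Randomize all parameters of a copy of the model.
random_model = copy.deepcopy(model)
for p in random_model.parameters():
    torch.nn.init.normal_(p, std=0.01)
saliency_random = gradient_saliency(random_model.eval(), x, target)

# High rank correlation between the two maps would indicate the saliency method
# is insensitive to the model parameters, i.e. it fails this sanity check.
rho, _ = spearmanr(saliency_trained.detach().flatten().numpy(),
                   saliency_random.detach().flatten().numpy())
print(f"Rank correlation between maps: {rho:.3f}")
```

The paper's argument concerns how results of this kind should be interpreted: whether low or high similarity under randomization, as measured on a particular task, licenses the conclusion that a method is insensitive to the model, or whether confounding in the evaluation setup can produce the same pattern.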