Poster in Workshop: AI for Science: Mind the Gaps
Towards trustworthy explanations with gradient-based attribution methods
Ethan Labelson · Rohit Tripathy · Peter Koo
The low interpretability of deep neural networks (DNNs) remains a key barrier to their widespread adoption in the sciences. Attribution methods offer a promising solution, providing feature importance scores that serve as first-order model explanations for a given input. In practice, gradient-based attribution methods, such as saliency maps, can yield noisy importance scores depending on model architecture and training procedure. Here we explore how various regularization techniques affect model explanations with saliency maps using synthetic regulatory genomic data, which allows us to quantitatively assess the efficacy of attribution maps. Strikingly, we find that better generalization performance does not imply better saliency explanations, though, unlike previously reported results, we do not observe a clear tradeoff between the two. Interestingly, we find that conventional regularization strategies, when tuned appropriately, can yield high generalization and interpretability performance, similar to what can be achieved with more sophisticated techniques such as manifold mixup. Our work challenges the conventional wisdom that model selection should be based on test performance alone; an additional criterion is needed to select models best suited for downstream post hoc interpretability in scientific discovery.
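For readers unfamiliar with the method, a saliency map is simply the gradient of a model's prediction with respect to its input. The following is a minimal illustrative sketch (not the authors' code), assuming a trained PyTorch model that maps one-hot DNA sequences of shape (batch, 4, seq_len) to a scalar prediction per sequence; the "gradient x input" correction shown is one common convention for one-hot inputs.

```python
import torch


def saliency_map(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Gradient-based attribution for one-hot DNA inputs.

    x: one-hot encoded sequences, shape (batch, 4, seq_len).
    Returns per-position importance scores of shape (batch, seq_len),
    using the common "gradient x input" convention for one-hot inputs.
    """
    model.eval()
    x = x.detach().clone().requires_grad_(True)  # track gradients w.r.t. the input
    model(x).sum().backward()                    # one backward pass for the whole batch
    grad = x.grad                                # same shape as x
    return (grad * x).sum(dim=1).detach()        # keep the observed nucleotide's score
```

Noisy maps from such gradients are the motivation for comparing regularization strategies in this work.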