

Invited talk in Workshop: Algorithmic Fairness through the Lens of Causality and Robustness

Invited Talk: Towards Reliable and Robust Model Explanations

Himabindu Lakkaraju


Abstract:

As machine learning black boxes are increasingly deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this talk, I will present some of our recent research that sheds light on the vulnerabilities of popular post hoc explanation techniques such as LIME and SHAP, and also introduce novel methods to address some of these vulnerabilities. More specifically, I will first demonstrate that these methods are brittle, unstable, and vulnerable to a variety of adversarial attacks. Then, I will discuss two solutions to address some of the aforementioned vulnerabilities: (i) a Bayesian framework that captures the uncertainty associated with post hoc explanations and in turn allows us to generate explanations with user-specified levels of confidence, and (ii) a framework based on adversarial training that is designed to make post hoc explanations more stable and robust to shifts in the underlying data. I will conclude the talk by discussing our recent theoretical results which shed light on the equivalence and robustness of state-of-the-art explanation methods.
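To make the brittleness point concrete, the sketch below builds a LIME-style local linear surrogate from scratch with scikit-learn and shows (a) that the top-ranked features can change between two runs that differ only in the perturbation seed, and (b) how repeated fits can attach an uncertainty interval to each reported importance. This is a minimal, assumption-laden illustration, not the speaker's method: the synthetic data, perturbation scale, kernel weighting, and resampling-based intervals are all stand-ins (the latter a simple frequentist proxy for the Bayesian framework mentioned in the abstract).

```python
# Illustrative sketch only: a hand-rolled LIME-style local surrogate,
# used to show instability across perturbation seeds and a simple
# uncertainty interval per feature importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# A stand-in "black box" trained on synthetic data.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def local_surrogate(x, predict_proba, n_samples=500, scale=0.3, seed=0):
    """Fit a kernel-weighted linear surrogate around x (LIME-style)."""
    rng = np.random.default_rng(seed)
    Z = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(
        Z, predict_proba(Z)[:, 1], sample_weight=weights
    )
    return surrogate.coef_

x0 = X[0]

# (a) Instability: same point, two perturbation seeds, possibly different rankings.
coef_a = local_surrogate(x0, black_box.predict_proba, seed=1)
coef_b = local_surrogate(x0, black_box.predict_proba, seed=2)
print("top features, seed 1:", np.argsort(-np.abs(coef_a))[:3])
print("top features, seed 2:", np.argsort(-np.abs(coef_b))[:3])

# (b) Uncertainty: refit across seeds and report a spread per feature importance.
coefs = np.stack(
    [local_surrogate(x0, black_box.predict_proba, seed=s) for s in range(30)]
)
lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
for j in range(X.shape[1]):
    print(f"feature {j}: importance in [{lo[j]:+.3f}, {hi[j]:+.3f}]")
```

Features whose intervals straddle zero or vary widely across seeds are exactly the cases where a fixed point estimate from a post hoc explainer can mislead, which is the motivation for explanations that come with explicit confidence levels.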