

Poster in Workshop: New Frontiers of AI for Drug Discovery and Development

Identifying regularization schemes that make the feature attributions faithful

Julius Adebayo · Samuel Stanton · Simon Kelow · Michael Maser · Richard Bonneau · Vladimir Gligorijevic · Kyunghyun Cho · Stephen Ra · Nathan Frey

Keywords: [ Regularization ] [ Feature Attributions ] [ Faithfulness ]


Abstract:

Feature attribution methods assign a score to each input dimension as a measure of that dimension's relevance to a model's output. Despite their wide use, the feature importance rankings induced by gradient-based attributions are unfaithful, that is, they do not correlate with the model's sensitivity to input perturbations, unless the model is trained to be adversarially robust. Here we demonstrate that these concerns carry over to models trained for protein function prediction tasks. Although adversarial training makes a model's gradient-based attributions faithful, it degrades performance on real data. We find that corrupting inputs with independent Gaussian noise is an effective alternative to adversarial training: it confers faithfulness onto a model's gradient-based attributions without degrading performance. In contrast, we observe no meaningful faithfulness benefit from regularization schemes such as dropout and weight decay. We translate these insights to a real-world protein function prediction task, where the gradient-based feature attributions of noise-regularized models correctly indicate low sensitivity to irrelevant gap tokens in a protein's sequence alignment.
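The abstract contrasts two routes to faithful gradient-based attributions: adversarial training and independent Gaussian input corruption. As a minimal sketch of the latter, the PyTorch snippet below corrupts each training input with i.i.d. Gaussian noise and computes plain input-gradient (saliency) attributions. The noise scale `sigma`, the toy linear model, and the choice to sum the outputs into a scalar before differentiating are illustrative assumptions, not details taken from the paper.

```python
import torch

def gaussian_noise_training_step(model, loss_fn, optimizer, x, y, sigma=0.1):
    """One training step with independent Gaussian input corruption.

    Corrupting each input with i.i.d. Gaussian noise is the regularization
    scheme the abstract identifies as conferring faithfulness without the
    real-data performance cost of adversarial training.
    """
    optimizer.zero_grad()
    x_noisy = x + sigma * torch.randn_like(x)  # independent Gaussian corruption
    loss = loss_fn(model(x_noisy), y)
    loss.backward()
    optimizer.step()
    return loss.item()

def gradient_attribution(model, x):
    """Plain input-gradient (saliency) feature attribution.

    Assigns each input dimension the gradient of the model output with
    respect to that dimension. Summing the outputs into a scalar is a
    simplification; per-class saliency would differentiate a single logit.
    """
    x = x.clone().detach().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(20, 2)  # toy stand-in for a protein function predictor
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(8, 20)
    y = torch.randint(0, 2, (8,))
    for _ in range(10):
        gaussian_noise_training_step(model, loss_fn, optimizer, x, y, sigma=0.1)
    print(gradient_attribution(model, x).shape)  # per-dimension relevance scores
```

Faithfulness would then be assessed by checking that these attribution scores correlate with how much the model's output actually changes when the corresponding input dimensions are perturbed.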
