Poster
in
Workshop: Interpretable AI: Past, Present and Future

From Flexibility to Manipulation: The Slippery Slope of Parameterizing Interpretability Evaluation

Kristoffer Wickstrøm · Marina Höhne · Anna Hedström


Abstract:

The absence of ground truth explanation labels poses a key challenge for quantitative evaluation in interpretable AI (IAI), particularly when evaluation methods involve numerous user-specified hyperparameters. Without a ground truth, optimising hyperparameter selection is difficult, often leading researchers to make choices based on similar studies, which offers considerable flexibility. We show how this flexibility can be exploited to manipulate evaluation outcomes by framing it as an adversarial attack where minor hyperparameter adjustments lead to significant changes in results. Our experiments demonstrate substantial variations in evaluation outcomes across multiple datasets, explanation methods, and models. To counteract this, we propose a ranking-based mitigation strategy that enhances robustness against such manipulations. This work underscores the challenges of reliable evaluation in IAI. Code is available at \url{anonymised_link}.
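The ranking-based mitigation described in the abstract can be illustrated with a small sketch. All numbers and names below are hypothetical (not taken from the paper): the idea is that raw metric scores for explanation methods can flip under different hyperparameter settings, whereas aggregating the per-setting *ranks* of methods is harder to manipulate with a single adversarially chosen configuration.

```python
import numpy as np

# Hypothetical faithfulness scores for three explanation methods,
# each evaluated under four hyperparameter settings of the same metric.
# Rows: hyperparameter settings; columns: explanation methods.
# (Illustrative numbers only -- not results from the paper.)
scores = np.array([
    [0.62, 0.58, 0.41],   # setting 0: method 0 looks best
    [0.30, 0.55, 0.25],   # setting 1: method 1 looks best
    [0.51, 0.64, 0.50],   # setting 2: method 1 looks best
    [0.45, 0.52, 0.33],   # setting 3: method 1 looks best
])

# Picking the winner from a single setting is manipulable: an adversary
# who selects setting 0 can claim method 0 is best.
winners_by_score = scores.argmax(axis=1)

# Ranking-based aggregation: convert each setting's scores to ranks
# (1 = best), then average ranks across all settings. A single skewed
# hyperparameter choice now has limited influence on the outcome.
ranks = (-scores).argsort(axis=1).argsort(axis=1) + 1
mean_rank = ranks.mean(axis=0)
best_method = int(mean_rank.argmin())
```

Here `winners_by_score` disagrees across settings (setting 0 favours method 0, the rest favour method 1), while `mean_rank` identifies method 1 as best overall, showing why rank aggregation is more robust to hyperparameter manipulation.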
