

Poster in Workshop: Interpretable AI: Past, Present and Future

The effect of whitening on explanation performance

Benedict Clark · Stoyan Karastoyanov · Rick Wilming · Stefan Haufe


Abstract:

Explainable artificial intelligence (XAI) promises to provide users of machine learning systems with information about models, their training data, and given test inputs. As many XAI methods are algorithmically defined, the ability of these methods to provide correct answers to relevant questions needs to be theoretically verified and/or empirically validated. Prior work (Haufe et al., 2014; Wilming et al., 2023) has pointed out that popular feature attribution methods tend to assign significant importance to input features lacking a statistical association with the prediction target, leading to misinterpretations. This phenomenon is caused by the presence of dependent noise and is absent when all features are mutually independent. This motivates the question of whether whitening, a common preprocessing step that decorrelates the data before training, can avoid such misinterpretations. Using an established benchmark (Clark et al., 2024b), comprising a ground truth-based definition of explanation correctness and quantitative metrics of explanation performance, we evaluate 16 popular feature attribution methods in combination with 5 different whitening transforms and compare their performance to baselines. The results show that whitening's impact on XAI performance is multifaceted: some whitening techniques yield marked improvements, though the degree of improvement varies by XAI method and model architecture. The variability revealed in the experiments reflects the complex relationship between the quality of preprocessing and the subsequent effectiveness of XAI methods, underlining the significance of preprocessing techniques for model interpretability.
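As an illustration of the kind of whitening transform discussed above, the following is a minimal sketch of ZCA whitening in NumPy. The abstract does not specify which 5 whitening transforms were evaluated; the function name zca_whiten, the epsilon regularizer, and the toy data here are assumptions introduced purely for illustration, not the authors' pipeline.

import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten a data matrix X of shape (n_samples, n_features)."""
    X_centered = X - X.mean(axis=0)
    # Covariance of the centered data
    cov = np.cov(X_centered, rowvar=False)
    # Eigendecomposition of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA whitening matrix: rotate, rescale by 1/sqrt(eigenvalue), rotate back
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X_centered @ W

# Toy example: two features sharing a common (dependent) noise source
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(1000, 1)),
               z + 0.1 * rng.normal(size=(1000, 1))])
X_white = zca_whiten(X)
print(np.cov(X_white, rowvar=False).round(2))  # approximately the identity matrix

After whitening, the feature covariance is (approximately) the identity, so the dependent-noise mechanism that drives spurious attributions in correlated data is removed before the model is trained.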
