Poster
in
Workshop: Interpretable AI: Past, Present and Future

What do we even know about interpretability?

Julian Skirzynski · Berk Ustun · Elena Glassman


Abstract:

This study challenges the assumption that all seemingly interpretable AI models lead to uniform user behavior. In a controlled experiment, we compared user interactions with Disjunctive Normal Form (DNF) and Score Function representations of an identical model. Despite their logical equivalence, we found significant differences in user reliance and decision quality across representations. The Score Function elicited higher reliance but led to costlier errors than the DNF representation. Increasing the Score Function's perceived complexity through non-integer coefficients yielded behavior similar to the DNF condition. Our findings reveal that the form of interpretability, not just its presence, significantly impacts human-AI interaction.
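To make the abstract's central premise concrete, the sketch below shows a hypothetical toy classifier (not the study's actual model) written in both representations: a DNF rule set and an integer-weighted score function that are logically equivalent on every input, yet read very differently.

```python
from itertools import product

def dnf_predict(x1: int, x2: int, x3: int) -> int:
    # DNF representation: predict 1 if x1 OR (x2 AND x3)
    return int(bool(x1 or (x2 and x3)))

def score_predict(x1: int, x2: int, x3: int) -> int:
    # Score-function representation: integer-weighted sum vs. a threshold
    return int(2 * x1 + 1 * x2 + 1 * x3 >= 2)

# The two forms agree on all 2^3 binary inputs, so any behavioral
# difference between them comes from presentation, not logic.
assert all(
    dnf_predict(*x) == score_predict(*x)
    for x in product([0, 1], repeat=3)
)
```

Replacing the integer weights with non-integer coefficients (e.g. 1.97, 1.03, 0.98 against a threshold of 2.01) leaves a nearly identical decision boundary while raising perceived complexity, which is the kind of manipulation the abstract describes.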