Sun 8:50 a.m. - 9:00 a.m. | Opening Remarks (Intro) | SlidesLive Video
Sun 9:00 a.m. - 9:30 a.m. | Cynthia Rudin: The Marriage of Noise and Simplicity (Invited Talk) | Cynthia Rudin | SlidesLive Video
Sun 9:30 a.m. - 10:00 a.m. | Rich Caruana: The Unexpected Success of GlassBox Learning with Tabular Data (Invited Talk) | Rich Caruana | SlidesLive Video
Sun 10:00 a.m. - 11:15 a.m. | Poster Session
Sun 11:15 a.m. - 12:00 p.m. | Panel Discussion: Moderator - Kamalika Chaudhuri (Panel) | SlidesLive Video
Sun 12:00 p.m. - 1:00 p.m. | Lunch
Sun 1:00 p.m. - 1:30 p.m. | Contributed Talks 1 (Contributed Talks) | SlidesLive Video
Sun 1:30 p.m. - 2:00 p.m. | Jiaxin Zhang: Building AI-Native Customer Experiences with Confidence at Intuit (Invited Talk) | Jiaxin Zhang | SlidesLive Video
Sun 2:00 p.m. - 2:30 p.m. | Tong Wang: Using Advanced LLMs to Enhance Smaller LLMs - An Interpretable Knowledge Distillation Approach (Invited Talk) | Tong Wang | SlidesLive Video
Sun 2:30 p.m. - 3:00 p.m. | Coffee Break
Sun 3:00 p.m. - 3:30 p.m. | Neel Nanda: Sparse Autoencoders - Assessing the evidence (Invited Talk) | Neel Nanda | SlidesLive Video
Sun 3:30 p.m. - 4:00 p.m. | Contributed Talks 2 (Contributed Talks) | SlidesLive Video
Sun 4:00 p.m. - 4:45 p.m. | Poster Session 2
Sun 4:45 p.m. - 5:00 p.m. | Concluding Remarks | SlidesLive Video
- | Clustering and Alignment: Understanding the Training Dynamics in Modular Addition (Poster) | Tiberiu Mușat
- | How Do Training Methods Influence the Utilization of Vision Models? (Poster) | Paul Gavrikov · Shashank Agnihotri · Margret Keuper · Janis Keuper
- | [published paper track (COLT 2024)] A Theory of Interpretable Approximations (Poster) | Marco Bressan · Nicolò Cesa-Bianchi · Emmanuel Esposito · Yishay Mansour · Shay Moran · Maximilian Thiessen
- | Enhancing patient stratification and interpretability through class-contrastive and feature attribution techniques (Poster) | Sharday Olowu · Neil Lawrence · Soumya Banerjee
- | ProtoS-ViT: Visual foundation models for sparse self-explainable classifications (Poster) | Hugues Turbe · Mina Bjelogrlic · Gianmarco Mengaldo · Christian Lovis
- | Competence-Based Analysis of Language Models (Poster) | Adam Davies · Jize Jiang · Cheng Xiang Zhai
- | Residual Stream Analysis with Multi-Layer SAEs (Poster) | Tim Lawson · Lucy Farnik · Conor Houghton · Laurence Aitchison
- | PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans (Poster) | Giang Nguyen · Valerie Chen · Mohammad Reza Taesiri · Anh Nguyen
- | Latent Concept-based Explanation of NLP Models (Poster) | Xuemin Yu · Fahim Dalvi · Nadir Durrani · Marzia Nouri · Hassan Sajjad
- | Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions (Poster) | Marc Canby · Adam Davies · Chirag Rastogi · Julia C Hockenmaier
- | Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks (Poster) | Alba Carballo Castro · Sonia Laguna · Moritz Vandenhirtz · Julia Vogt
- | Subgroup Discovery with the Cox Model (Poster) | Zachary Izzo · Iain Melvin
- | Explainable AI-based analysis of human pancreas sections detects traits of type 2 diabetes (Poster) | Lukas Klein · Sebastian Ziegler · Felicia Gerst · Yanni Morgenroth · Karol Gotkowski · Eyke Schöniger · Nicole Kipke · Annika Seiler · Ellen Geibelt · Martin Heni · Silvia Wagner · Silvio Nadalin · Falko Fend · Daniela Aust · Andre Mihaljevic · Daniel Hartmann · Jurgen Weitz · Reiner Schwartzenberg · Marius Distler · Andreas Birkefeld · Susanne Ullrich · Paul Jaeger · Fabian Isensee · Michele Solimena · Robert Wagner
- | Explainable Concept Generation through Vision-Language Preference Learning (Poster) | Aditya Taparia · Som Sagar · Ransalu Senanayake
- | Disentangling Mean Embeddings for Better Diagnostics of Image Generators (Poster) | Sebastian Gruber · Pascal Tobias Ziegler · Florian Buettner
- | You can remove GPT2's LayerNorm by fine-tuning (Poster) | Stefan Heimersheim
- | Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models (Poster) | Konstantin Donhauser · Gemma Moran · Aditya Ravuri · Kian Kenyon-Dean · Kristina Ulicna · Cian Eastwood · Jason Hartford
- | Words in Motion: Interpreting Motion Forecasting Transformers by Controlling Representations (Poster) | Omer Sahin Tas · Royden Wagner
- | Position: XAI needs formal notions of explanation correctness (Poster) | Stefan Haufe · Rick Wilming · Benedict Clark · Rustam Zhumagambetov · Danny Panknin · Ahcene Boubekki
- | Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability (Poster) | Bharat Chandra Yalavarthi · Nalini Ratha
- | Error-controlled interaction discovery in deep neural networks (Poster) | Winston Chen · Yifan Jiang · William Stafford Noble · Yang Lu
- | Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits (Poster) | Zhuokai Zhao · Takumi Matsuzawa · William Irvine · Michael Maire · Gordon Kindlmann
- | This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations (Poster) | Chiyu Ma · Brandon Zhao · Chaofan Chen · Cynthia Rudin
- | Bivariate Decision Trees: Smaller, Interpretable, More Accurate (Poster) | Rasul Kairgeldin · Miguel A. Carreira-Perpinan
- | ConceptDrift: Uncovering Biases through the Lens of Foundational Models (Poster) | Cristian D Paduraru · Elena Burceanu · Antonio Barbalau · Andrei Nicolicioiu · Radu Filipescu
- | A is for Absorption: Studying Sparse Autoencoder Feature Splitting and Absorption in Spelling Tasks (Poster) | James Wilken-Smith · Tomáš Dulka · David Chanin · Hardik Bhatnagar · Joseph Bloom
- | Deep quantum graph dreaming: deciphering neural network insights into quantum experiments (Poster) | Tareq Jaouni · Sören Arlt · Carlos Ruiz-Gonzalez · Ebrahim Karimi · Xuemei Gu · Mario Krenn
- | CoS: Enhancing Personalization and Mitigating Bias with Context Steering (Poster) | Sashrika Pandey · Jerry He · Mariah Schrum · Anca Dragan
- | A Concept-Based Explainability Framework for Large Multimodal Models (Poster) | Jayneel Parekh · Pegah KHAYATAN · Mustafa Shukor · Alasdair Newson · Matthieu Cord
- | Riemann Sum Optimization for Accurate Integrated Gradients Computation (Poster) | Swadesh Swain · Shree Singhi
- | Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations (Poster) | Kola Ayonrinde · Michael Pearce
- | Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory (Poster) | Pasan Dissanayake · Sanghamitra Dutta
- | Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines (Poster) | Pooria Assadi · NIMA SAFAEI
- | SignAttention: On the Interpretability of Transformer Models for Sign Language Translation (Poster) | Pedro Alejandro Dal Bianco · Oscar Stanchi · Facundo Manuel Quiroga · Franco Ronchetti · Enzo Ferrante
- | Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability (Poster) | Lukas Klein · Kenza Amara · Carsten Lüth · Antonio Foncubierta-Rodriguez · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger
- | Right on Time: Revising Time Series Models by Constraining their Explanations (Poster) | Maurice Kraus · David Steinmann · Antonia Wüst · Andre Kokozinski · Kristian Kersting
- | Position: In Defense of Post-hoc Explainability (Poster) | Nick Oh
- | Isometry pursuit (Poster) | Samson Koelle · Marina Meila
- | From Flexibility to Manipulation: The Slippery Slope of Parameterizing Interpretability Evaluation (Poster) | Kristoffer Wickstrøm · Marina Höhne · Anna Hedström
- | Your Theory Is Wrong: Using Linguistic Frameworks for LLM Probing (Poster) | Victoria Firsanova
- | Can sparse autoencoders be used to decompose and interpret steering vectors? (Poster) | Harry Mayne · Yushi Yang · Adam Mahdi
- | Policy-shaped prediction: improving world modeling through interpretability (Poster) | Miles Hutson · Isaac Kauvar · Nick Haber
- | A Mechanism for Storing Positional Information Without Positional Embeddings (Poster) | Chunsheng Zuo · Pavel Guerzhoy · Michael Guerzhoy
- | What do we even know about interpretability? (Poster) | Julian Skirzynski · Berk Ustun · Elena Glassman
- | GAMformer: Exploring In-Context Learning for Generalized Additive Models (Poster) | Andreas Mueller · Julien Siems · Harsha Nori · Rich Caruana · Frank Hutter
- | The effect of whitening on explanation performance (Poster) | Benedict Clark · Stoyan Karastoyanov · Rick Wilming · Stefan Haufe