Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger
Vision Language Models (VLMs), such as GPT-4o and LLaVA, exhibit exceptional versatility across a wide array of tasks with minimal adaptation due to their ability to seamlessly integrate visual and textual data. However, model failure remains a crucial problem in VLMs, particularly when they produce incorrect outputs such as hallucinations or confabulations. These failures can be detected and analyzed by leveraging model interpretability methods and controlling input semantics, providing valuable insights into how different modalities influence model behavior and guiding improvements in model architecture for greater accuracy and robustness. To address this challenge, we introduce Interactive Semantic Interventions (ISI), a tool designed to enable researchers and VLM users to investigate how these models respond to semantic changes and interventions across image and text modalities, with a focus on identifying potential model failures in the context of Visual Question Answering (VQA). Specifically, it offers an interface and pipeline for semantically meaningful interventions on both image and text, while quantitatively evaluating the generated output in terms of modality importance and model uncertainty. Alongside the tool, we publish a specifically tailored VQA dataset that includes predefined presets for semantically meaningful interventions on image and text modalities. ISI empowers researchers and users to gain deeper insights into VLM behavior, facilitating more effective troubleshooting to prevent and understand model failures. It also establishes a well-evaluated foundation before conducting large-scale VLM experiments. The tool and dataset are hosted at: https://gitlab.com/dekfsx1/isi-vlm.
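
The core loop the abstract describes, intervening on one modality and observing how the answer and the model's uncertainty shift, can be illustrated with a short sketch. The snippet below is not ISI's actual API or pipeline; it is a minimal, illustrative example that assumes the Hugging Face LLaVA-1.5 checkpoint (llava-hf/llava-1.5-7b-hf), a placeholder image and question, a simple image-side intervention (masking the queried object), and mean per-token entropy as one possible uncertainty proxy.

```python
# Illustrative sketch only (not the ISI API): apply a semantic intervention to a
# VQA input and compare the model's answer and token-level uncertainty before
# and after. The checkpoint, prompt template, file names, and box coordinates
# are assumptions made for this example.
import torch
from PIL import Image, ImageDraw
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint for illustration
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def answer_with_uncertainty(image: Image.Image, question: str):
    """Generate an answer and return mean per-token entropy as a rough
    uncertainty proxy (one of several possible uncertainty measures)."""
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs, max_new_tokens=32, do_sample=False,
        output_scores=True, return_dict_in_generate=True,
    )
    # Decode only the newly generated tokens (drop the prompt portion).
    answer = processor.batch_decode(
        out.sequences[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0].strip()
    # Entropy of the next-token distribution at each generation step.
    entropies = [
        torch.distributions.Categorical(logits=s.float()).entropy().item()
        for s in out.scores
    ]
    return answer, sum(entropies) / len(entropies)

def mask_region(image: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    """Image-side intervention: gray out a region so the queried object
    is no longer visible."""
    edited = image.copy()
    ImageDraw.Draw(edited).rectangle(box, fill=(128, 128, 128))
    return edited

image = Image.open("kitchen.jpg")                 # placeholder example image
question = "What color is the mug on the table?"  # placeholder example question

baseline = answer_with_uncertainty(image, question)
intervened = answer_with_uncertainty(mask_region(image, (200, 150, 320, 260)), question)
print("original image:", baseline)
print("masked object :", intervened)  # a faithful model should become less certain
```

A complementary text-side intervention (for instance, swapping the queried object or attribute in the question while leaving the image untouched) follows the same pattern; contrasting the two directions gives a rough, qualitative sense of modality importance in the spirit of the tool.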