Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)
Shapley Interactions for Complex Feature Attribution
Divyansh Singhvi · Andrej Erkelens · Raghav Jain · Diganta Misra · Naomi Saphra
Feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley-Taylor interaction indices (STII) to analyze how linguistic structure influences language model output in masked and autoregressive language models (MLMs and ALMs). We find that ALMs, and to a lesser degree MLMs, tend to combine pairs of tokens more nonlinearly when the tokens co-occur in the same idiomatic multiword expression. We also find that while interactions in ALMs tend to become more linear at greater positional distances, in MLMs this linearity scales with syntactic distance, implying that the structure learned by MLMs relies more on syntax than does the recency-based structure favored natively by ALMs.
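For readers unfamiliar with STII, the order-2 index for a token pair can be sketched on a toy example. The code below is a minimal illustration, not the authors' implementation. It uses the standard pairwise Shapley-Taylor formula, I_ij = (2/n) * sum over T ⊆ N\{i,j} of δ_ij F(T) / C(n-1, |T|), where δ_ij F(T) = F(T ∪ {i,j}) - F(T ∪ {i}) - F(T ∪ {j}) + F(T). The names `pairwise_stii`, `additive_value`, and `product_value` are hypothetical. The value function, which maps a set of retained token positions to a scalar, is an assumption standing in for a language model's output, since the abstract does not specify how model output is measured.

```python
from itertools import combinations
from math import comb


def pairwise_stii(value_fn, n, i, j):
    """Exact order-2 Shapley-Taylor interaction index for positions i and j.

    value_fn: callable taking a frozenset of retained positions and returning
              a scalar (hypothetical stand-in for a model output).
    n:        total number of positions (tokens).
    Enumerates every subset of the other positions, so it is only
    practical for small n.
    """
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = 1.0 / comb(n - 1, size)
        for subset in combinations(others, size):
            t = frozenset(subset)
            # Discrete second derivative of value_fn with respect to i and j at T.
            delta = (
                value_fn(t | {i, j})
                - value_fn(t | {i})
                - value_fn(t | {j})
                + value_fn(t)
            )
            total += weight * delta
    return (2.0 / n) * total


def additive_value(kept):
    """Toy linear model: each retained position contributes independently."""
    weights = [1, 2, 3]
    return sum(weights[p] for p in kept)


def product_value(kept):
    """Toy nonlinear model: output is 1 only when positions 0 and 1 co-occur."""
    return 1.0 if {0, 1} <= kept else 0.0


linear_score = pairwise_stii(additive_value, 3, 0, 1)
nonlinear_score = pairwise_stii(product_value, 3, 0, 1)
print(linear_score, nonlinear_score)
```

In this toy setting the purely additive function yields a pairwise index of zero, while the function that depends on the two positions jointly yields a nonzero index. This is the sense in which a larger magnitude indicates a more nonlinear interaction between two tokens.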