

Poster in Workshop: Causality and Large Models

Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics

Isabelle Lee · Joshua Lum · Ziyi Liu · Dani Yogatama

Keywords: [ semantic perturbations ] [ syntactic attention analysis ] [ causal circuits ]


Abstract:

One of the surprising emergent capabilities of large language models (LLMs) is their reasoning ability, which spans various domains. While interpretability research has made significant progress toward parsing the algorithms that transformer-based LLMs learn, and evaluative research has better characterized and benchmarked LLM behaviors, understanding the mechanisms behind such macroscopic capabilities still remains out of reach. In this work, we study cause-and-effect reasoning with curated, simple causal relational sentences. We analyze the internal responses of GPT-2 small to such data from syntactic and semantic perspectives. We find that the model's understanding of reasoning syntax is localized to particular sets of attention heads in the first 2-3 layers. We perform semantic analysis with activation patching and discover that specific heads in later layers are particularly responsive to nonsensical perturbations of originally causal sentences. Our findings on basic causal understanding suggest that a model like GPT-2 might infer reasoning by: 1) first recognizing syntactic causal cues, and 2) eventually identifying a few distinct attention heads in the final layers that attend specifically to semantic relationships.
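As a rough illustration of the activation-patching setup described in the abstract (not the authors' code), the sketch below uses the TransformerLens library to patch a single attention head's output from a clean causal sentence into a forward pass on a nonsensically perturbed variant of that sentence in GPT-2 small. The example sentences and the layer/head indices are hypothetical placeholders.

```python
# Minimal activation-patching sketch, assuming the transformer_lens library.
# Sentences and the (layer, head) pair are illustrative, not from the paper.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 small

clean = "The rain caused the ground to become wet."
corrupt = "The rain caused the ground to become loud."  # nonsensical perturbation

# Tokenize both prompts (they should have equal length for position-wise patching).
clean_tokens = model.to_tokens(clean)
corrupt_tokens = model.to_tokens(corrupt)

# Cache all activations from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

layer, head = 10, 7  # hypothetical head to test

def patch_head(z, hook):
    # z has shape [batch, seq, n_heads, d_head]; overwrite one head's output
    # with its activation from the clean run.
    z[:, :, head, :] = clean_cache[hook.name][:, :, head, :]
    return z

hook_name = f"blocks.{layer}.attn.hook_z"

# Run the corrupted prompt with the single-head patch applied.
patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(hook_name, patch_head)],
)

# Comparing patched_logits against the unpatched corrupted-run logits indicates
# how much this head contributes to restoring the clean (causal) prediction.
baseline_logits = model(corrupt_tokens)
print((patched_logits[0, -1] - baseline_logits[0, -1]).abs().max())
```

Sweeping this patch over all layers and heads, and measuring the recovery of a clean-run metric (e.g., the logit of the expected continuation), is one standard way to localize which heads respond to the semantic perturbation.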
