

Poster in Workshop: Attributing Model Behavior at Scale (ATTRIB)

Investigating Language Model Dynamics using Meta-Tokens

Alok Shah · Khush Gupta · Keshav Ramji · Vedant Gaur


Abstract:

Transformers have achieved remarkable success across various domains, but much remains unknown about their internal reasoning and training dynamics. This paper presents a novel approach using meta-tokens, special tokens injected into the input sequence, together with a dedicated meta-attention mechanism, to improve model performance and interpretability. We hypothesize that meta-tokens store and retrieve global contextual information by interacting through meta-attention. We test this by pretraining a modified GPT-2 architecture equipped with meta-attention in addition to causal multi-headed attention, and demonstrate its efficacy through empirical gains on the MMLU benchmark. Furthermore, we explore the distribution of attention scores and alterations to the residual stream by visualizing model internals. By applying the language model head at key points in the residual stream, we find that meta-tokens accelerate layer-wise logit convergence to the correct output token. These results suggest that meta-tokens effectively capture global dependencies, providing enhanced performance on long-context tasks while offering new insights into the flow of attention scores and, in turn, training behavior in transformers.
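The mechanism described in the abstract can be illustrated with a brief sketch. The PyTorch code below is an illustrative reconstruction, not the authors' implementation: the number of meta-tokens, their injection as learned embeddings, and the residual combination of meta-attention with the causal attention stream are all assumptions, and the layerwise_logits helper is a hypothetical logit-lens style probe for the layer-wise convergence analysis.

    # Illustrative sketch of meta-tokens plus a dedicated meta-attention block,
    # alongside standard causal self-attention. Details are assumed, not taken
    # from the paper's code.
    import torch
    import torch.nn as nn

    class MetaAttentionBlock(nn.Module):
        def __init__(self, d_model: int, n_heads: int, n_meta: int):
            super().__init__()
            # Learned meta-token embeddings injected into every sequence (assumption).
            self.meta_tokens = nn.Parameter(torch.randn(n_meta, d_model) * 0.02)
            # Standard causal self-attention over the ordinary tokens.
            self.causal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Dedicated meta-attention: ordinary tokens query the meta-tokens,
            # which act as a position-free global memory (assumption).
            self.meta_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, _ = x.shape
            causal_mask = torch.triu(
                torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
            )
            h, _ = self.causal_attn(x, x, x, attn_mask=causal_mask)
            meta = self.meta_tokens.unsqueeze(0).expand(B, -1, -1)
            g, _ = self.meta_attn(x, meta, meta)  # retrieve global context from meta-tokens
            return x + h + g                      # residual combination (assumed)

    # Logit-lens style probe: apply the language model head to intermediate
    # residual-stream states to track layer-wise convergence to the output token.
    def layerwise_logits(hidden_states, lm_head: nn.Linear):
        return [lm_head(h) for h in hidden_states]

In this sketch, applying lm_head to the hidden state after each block and measuring the rank or probability of the correct token is one way to reproduce the layer-wise convergence analysis the abstract describes.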
