NeurIPS Structure Development in List Sorting Transformers

Poster in Workshop: Symmetry and Geometry in Neural Representations

Structure Development in List Sorting Transformers

Einar Urdshals · Jasmina Urdshals

Keywords: [ Developmental Interpretability ] [ Copy Suppression ] [ Head Specialization ]

Abstract:

We analyze how the QK and OV circuits of an attention-only transformer trained to sort lists evolve during training. Using several measures, we identify distinct developmental stages in the training process. In particular, we find two forms of head specialization that emerge later in training: vocabulary-splitting and copy-suppression. We study the robustness of both by varying the training hyperparameters and the model architecture.
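For readers unfamiliar with the terminology, the following is a minimal sketch of how the QK and OV circuits of a single attention head are commonly formed from its weight matrices. It is not the authors' code: the dimensions, the random weights, the function names, and the choice to ignore positional embeddings and the attention scaling factor are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_vocab, d_model, d_head = 16, 8, 4

W_E = rng.normal(size=(d_vocab, d_model))  # token embedding
W_Q = rng.normal(size=(d_model, d_head))   # query projection
W_K = rng.normal(size=(d_model, d_head))   # key projection
W_V = rng.normal(size=(d_model, d_head))   # value projection
W_O = rng.normal(size=(d_head, d_model))   # output projection
W_U = rng.normal(size=(d_model, d_vocab))  # unembedding


def qk_circuit(W_E, W_Q, W_K):
    """Token-to-token attention scores (pre-softmax, unscaled).

    Entry [i, j] is the score contribution when query token i
    looks at key token j.
    """
    return W_E @ W_Q @ W_K.T @ W_E.T


def ov_circuit(W_E, W_V, W_O, W_U):
    """Token-to-logit effect of attending to a token.

    Entry [j, k] is the contribution to the logit of output token k
    when the head attends to source token j.
    """
    return W_E @ W_V @ W_O @ W_U


qk = qk_circuit(W_E, W_Q, W_K)
ov = ov_circuit(W_E, W_V, W_O, W_U)
print(qk.shape, ov.shape)
```

Under this convention, a head that copies attended tokens would tend to show a positive diagonal in `ov`, while a copy-suppressing head would tend to show a negative one. Both matrices have rank at most `d_head`, since each is a product passing through the head dimension.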
