Poster in Workshop: Interpretable AI: Past, Present and Future
A Mechanism for Storing Positional Information Without Positional Embeddings
Chunsheng Zuo · Pavel Guerzhoy · Michael Guerzhoy
Abstract:
Transformers with causal attention can solve tasks that require positional information without using positional embeddings. In this work, we propose and investigate a new mechanism through which positional information can be stored without explicit positional encodings. We observe that nearby embeddings are more similar to each other than faraway embeddings, allowing the Transformer to potentially reconstruct the positions of tokens. We show that this pattern can occur in both trained and randomly initialized Transformer models with causal attention and no positional embeddings, across a common range of hyperparameters.
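The similarity pattern described in the abstract can be probed directly. Below is a minimal sketch (not the authors' code) that runs a randomly initialized causal-attention Transformer with no positional embeddings on a random token sequence and measures how the cosine similarity between hidden states varies with the distance between positions; all hyperparameters and names are illustrative assumptions.

```python
# Sketch: measure hidden-state similarity vs. positional distance in a
# randomly initialized causal Transformer with NO positional embeddings.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, n_heads, n_layers, vocab, seq_len = 128, 4, 4, 1000, 64

# Token embeddings only -- no positional embeddings are added anywhere.
tok_emb = nn.Embedding(vocab, d_model)

# Standard encoder layers combined with a causal mask behave as a
# decoder-only (GPT-style) stack.
layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                   dim_feedforward=4 * d_model,
                                   batch_first=True)
model = nn.TransformerEncoder(layer, n_layers)

# Causal attention mask: position i may only attend to positions <= i.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                         diagonal=1)

tokens = torch.randint(0, vocab, (1, seq_len))
with torch.no_grad():
    h = model(tok_emb(tokens), mask=causal_mask)[0]   # (seq_len, d_model)

# Cosine similarity between every pair of positions.
h_norm = torch.nn.functional.normalize(h, dim=-1)
sim = h_norm @ h_norm.T                               # (seq_len, seq_len)

# Average similarity as a function of positional distance |i - j|.
for dist in (1, 4, 16, 48):
    vals = torch.diagonal(sim, offset=dist)
    print(f"distance {dist:2d}: mean cosine similarity = {vals.mean():.3f}")
```

If the pattern reported in the abstract holds for this configuration, the printed mean similarity should tend to decrease as the positional distance grows, which is the signal a downstream layer could in principle use to recover token positions.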