Poster in Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
Composite Attention: A Framework for Combining Sequence Mixing Primitives
Jake Cunningham · Marc Deisenroth
Keywords: [ Efficient Architectures ]
Hybrid attention architectures have shown promise both in equipping self-attention with inductive biases for long-sequence modelling and in reducing the computational burden of transformers without sacrificing quality. This paper introduces Composite Attention, a theoretical framework for analyzing the combination of sequence mixing primitives in modern deep learning architectures. Using the definition of sequence mixers as structured linear maps, we formalize the composition of sequence mixing primitives as either sequential or recurrent composition.
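As a rough illustration of the "sequence mixers as structured linear maps" view, the sketch below treats each mixer as an L x L matrix acting on the token dimension and shows that applying two mixers one after the other (sequential composition) is itself a single linear map given by the product of the two mixing matrices. The specific mixers (a causal attention-style matrix and a causal convolution/Toeplitz matrix) and the filter values are illustrative placeholders, not the paper's constructions, and the recurrent form of composition is not sketched here since the abstract does not spell it out.

```python
import numpy as np

L, d = 6, 4                      # sequence length, feature dimension
X = np.random.randn(L, d)        # input sequence

# A sequence mixer viewed as a structured L x L linear map acting on the
# sequence (token) dimension: output = M @ X.

# Illustrative mixer 1: a causal attention-style matrix, i.e. a
# row-stochastic lower-triangular mixing matrix (random scores here).
scores = np.random.randn(L, L)
mask = np.tril(np.ones((L, L), dtype=bool))
scores = np.where(mask, scores, -np.inf)
M_attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
M_attn /= M_attn.sum(axis=-1, keepdims=True)

# Illustrative mixer 2: a causal convolution / SSM-style mixer is a
# lower-triangular Toeplitz matrix built from a short filter.
h = np.array([0.5, 0.3, 0.2])    # hypothetical filter taps
M_conv = sum(np.eye(L, k=-k) * h_k for k, h_k in enumerate(h))

# Sequential composition: running one mixer after the other equals a single
# structured linear map, the product of the two mixing matrices.
Y_two_step = M_conv @ (M_attn @ X)
Y_composed = (M_conv @ M_attn) @ X
assert np.allclose(Y_two_step, Y_composed)
```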