

Spotlight in Workshop: UniReps: Unifying Representations in Neural Models

Inverted-Attention Transformers can Learn Object Representations: Insights from Slot Attention

Yi-Fu Wu · Klaus Greff · Gamaleldin Elsayed · Michael Mozer · Thomas Kipf · Sjoerd van Steenkiste


Abstract:

Visual reasoning is supported by a causal understanding of the physical world, and theories of human cognition posit that a necessary step toward causal understanding is the discovery and representation of high-level entities such as objects. Slot Attention is a popular method aimed at object-centric learning, and its popularity has resulted in dozens of variants and extensions. To help understand the core assumptions that lead to successful object-centric learning, we take a step back and identify the minimal set of changes to a standard Transformer architecture needed to match the performance of specialized Slot Attention models. We systematically evaluate the performance and scaling behaviour of several "intermediate" architectures on seven image and video datasets from prior work. Our analysis reveals that by simply inverting the attention mechanism of Transformers, we obtain performance competitive with state-of-the-art Slot Attention in several domains.
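To make the "inverted attention" idea concrete, here is a minimal sketch of a slot-style cross-attention step in PyTorch. It is not the authors' implementation: the function name, tensor shapes, and omission of learned query/key/value projections are illustrative assumptions. The key change relative to standard Transformer attention is that the softmax is taken over the query (slot) axis rather than the key axis, so slots compete for each input token, followed by a per-slot renormalization so the update is a weighted mean over tokens.

```python
import torch
import torch.nn.functional as F

def inverted_attention(slots, inputs, eps=1e-8):
    """Illustrative sketch of one inverted (slot-style) attention step.

    slots:  (batch, num_slots, dim)   -- act as queries
    inputs: (batch, num_tokens, dim)  -- act as keys and values
    Learned projections and iterative refinement are omitted for brevity.
    """
    q, k, v = slots, inputs, inputs
    dim = q.shape[-1]
    logits = torch.einsum('bqd,bkd->bqk', q, k) / dim ** 0.5

    # Standard attention would apply softmax over the key axis (dim=-1).
    # Inverted attention normalizes over the slot axis (dim=1) instead,
    # inducing competition among slots for each input token.
    attn = F.softmax(logits, dim=1)

    # Renormalize per slot so the update is a weighted mean over tokens.
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)
    return torch.einsum('bqk,bkd->bqd', attn, v)
```

A usage example: with `slots` of shape `(2, 7, 64)` and `inputs` of shape `(2, 196, 64)`, the call returns updated slots of shape `(2, 7, 64)`, one aggregated representation per slot.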
