Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Compositional Learning: Perspectives, Methods, and Paths Forward

Compositional Visual Reasoning with SlotSSMs

Jindong Jiang · Fei Deng · Gautam Singh · Minseung Lee · Sungjin Ahn

Keywords: [ Slot ] [ Video Models ] [ Mamba ] [ Object-Centric Learning ] [ Visual Reasoning ] [ State Space Models ]


Abstract:

In many real-world sequence modeling problems, the underlying process is inherently modular and it is important to design machine learning architectures that can leverage this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating independent mechanisms into State Space Models (SSMs), such as Mamba, to preserve or encourage separation of information, thereby improving visual reasoning. We evaluate SlotSSMs on long-sequence reasoning and real-world depth estimation tasks, demonstrating substantial performance improvements over existing sequence modeling methods. Our design efficiently exploits the modularity of inputs and scales effectively through the parallelizable architecture enabled by SSMs. We hope this approach will inspire future research on compositional reasoning architectures.

Chat is not available.