

Poster in Workshop: Workshop on Video-Language Models

HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation

Zirui Wang · Xinran Zhao · Simon Stepputtis · Woojun Kim · Tongshuang Wu · Katia Sycara · Yaqi Xie


Abstract:

Understanding and predicting human actions is a long-standing challenge and a crucial measure of perception in robotics AI. While significant progress has been made in anticipating the future actions of individual agents, prior work has largely overlooked a key aspect of real-world human activity: interactions. To address this gap in human-like forecasting within multi-agent environments, we present the Hierarchical Memory-Aware Transformer (HiMemFormer), a transformer-based model for online multi-agent action anticipation. HiMemFormer integrates and distributes a global memory that captures joint historical information across all agents through a transformer framework, and pairs it with a hierarchical local memory decoder that interprets agent-specific features based on these global representations using a coarse-to-fine strategy. In contrast to previous approaches, HiMemFormer applies the global context hierarchically, together with agent-specific preferences, to avoid noisy or redundant information in multi-agent action anticipation. Extensive experiments on various multi-agent scenarios demonstrate that HiMemFormer performs strongly compared with other state-of-the-art methods.
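
The coarse-to-fine idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify layer shapes, memory sizes, or training details, so the function names (`attend`, `anticipate`), the single-query attention, and the untrained linear classifier below are all assumptions made for illustration. The sketch only shows one plausible reading: each agent first queries a global memory pooled over all agents (coarse), then refines that result against its own local history (fine).

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors.
    The memory slots serve as both keys and values (a simplifying assumption)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, slot)) / math.sqrt(d)
              for slot in memory]
    weights = softmax(scores)
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(d)]

def anticipate(agent_histories, classifier):
    """agent_histories: one list of per-step feature vectors for each agent.
    classifier: one weight vector per action class (hypothetical, untrained)."""
    # Global memory: joint history pooled across all agents.
    global_memory = [feat for history in agent_histories for feat in history]
    predictions = []
    for history in agent_histories:
        query = history[-1]                     # agent's most recent feature
        coarse = attend(query, global_memory)   # coarse: shared global context
        fine = attend(coarse, history)          # fine: agent-specific refinement
        logits = [sum(w * x for w, x in zip(row, fine)) for row in classifier]
        predictions.append(softmax(logits))     # distribution over next actions
    return predictions

# Toy usage: 3 agents, 5 time steps, 8-dim features, 4 action classes.
random.seed(0)
histories = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
             for _ in range(3)]
classifier = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
probs = anticipate(histories, classifier)
```

Here `probs` holds one probability distribution over the hypothetical action classes per agent. A real model would use learned projections, multi-head attention, and trained weights in place of these random stand-ins.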
