

Poster
in
Workshop: Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)

Variational Inequality Perspective and Optimizers for Multi-Agent Reinforcement Learning

Baraah Adil Mohammed Sidahmed · Tatjana Chavdarova

Keywords: [ multi-agent reinforcement learning ] [ game optimization ] [ Variational Inequality ]


Abstract:

Multi-agent reinforcement learning (MARL) presents unique challenges as agents learn strategies through trial and error. Gradient-based methods are often sensitive to hyperparameter selection and initial random seed variations. Recently, progress has been made in solving problems modeled by Variational Inequalities (VIs)—which include equilibrium-finding problems—particularly in addressing the non-converging rotational dynamics that impede convergence of traditional gradient-based optimization methods. This paper explores the potential of leveraging VI-based techniques to improve MARL training. Specifically, we study the performance of VI methods—namely, Nested-Lookahead VI (nLA-VI) and Extragradient (EG)—in enhancing the multi-agent deep deterministic policy gradient (MADDPG) algorithm. We present a VI reformulation of the actor-critic algorithm for both single- and multi-agent settings. We introduce three algorithms that use nLA-VI, EG, and a combination of both, named LA-MADDPG, EG-MADDPG, and LA-EG-MADDPG, respectively. Our empirical results demonstrate that these VI-based approaches yield significant performance improvements in benchmark environments, such as the zero-sum games rock-paper-scissors and matching pennies, where equilibrium strategies can be quantitatively assessed, and the MPE predator-prey environment [Lowe et al., 2017], where VI-based methods also foster more balanced participation among agents on the same team.
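To illustrate the rotational dynamics the abstract refers to, the sketch below compares plain simultaneous gradient descent-ascent with an extragradient step on a bilinear min-max game. This is a minimal, self-contained illustration and not the authors' code: the game f(x, y) = x·y, the step size, and the iteration count are assumptions chosen to make the contrast visible.

```python
# Minimal sketch (not the paper's implementation): extragradient (EG) vs.
# plain gradient descent-ascent (GDA) on the bilinear game f(x, y) = x * y.
# x minimizes f, y maximizes f; the unique equilibrium is (0, 0).
# Step size and iteration count are illustrative assumptions.

def grad(x, y):
    # df/dx = y (used for descent on x), df/dy = x (used for ascent on y)
    return y, x

def gda_step(x, y, lr=0.1):
    gx, gy = grad(x, y)
    return x - lr * gx, y + lr * gy            # simultaneous descent-ascent

def extragradient_step(x, y, lr=0.1):
    gx, gy = grad(x, y)
    x_half, y_half = x - lr * gx, y + lr * gy  # extrapolation ("look ahead")
    gx2, gy2 = grad(x_half, y_half)
    return x - lr * gx2, y + lr * gy2          # update using the extrapolated gradient

x, y = 1.0, 1.0
for _ in range(200):
    x, y = gda_step(x, y)
print("GDA distance to equilibrium:", (x**2 + y**2) ** 0.5)  # spirals outward

x, y = 1.0, 1.0
for _ in range(200):
    x, y = extragradient_step(x, y)
print("EG  distance to equilibrium:", (x**2 + y**2) ** 0.5)  # contracts toward (0, 0)
```

In the same spirit, a lookahead-style method periodically averages the fast iterates with a slow "anchor" copy of the parameters, which also damps the rotation; the paper's LA- and LA-EG-MADDPG variants apply these ideas to the actor and critic updates of MADDPG.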
