

Poster in Workshop: Intrinsically Motivated Open-ended Learning (IMOL)

Quality-Diversity Self-Play: Open-Ended Strategy Innovation via Foundation Models

Aaron Dharna · Cong Lu · Jeff Clune

Keywords: [ open-ended learning ] [ policy search ] [ self-play ] [ quality-diversity ] [ foundation models ]


Abstract:

Multi-agent dynamics have powered innovation from time immemorial, such as scientific innovations during the space race or predator-prey dynamics in the natural world. The resulting landscape of interacting agents is a continually changing, interconnected, and complex mosaic of opportunities for innovation. Yet, training innovative and adaptive artificial agents remains challenging. Self-Play algorithms bootstrap the complexity of their solutions by automatically generating a curriculum. Recent work has demonstrated the power of foundation models (FMs) as intelligent and efficient search operators. In this paper, we investigate whether combining the human-like priors and extensive knowledge embedded in FMs with multi-agent race dynamics can lead to rapid policy innovation in open-ended Self-Play algorithms. We propose a novel algorithm, Quality-Diversity Self-Play (QDSP), that explores diverse and high-performing strategies in interacting (here, competing) populations. We evaluate QDSP in a two-player asymmetric pursuer-evader simulation with code-based policies and show that QDSP surpasses high-performing human-designed policies. Furthermore, QDSP discovers better policies than those from quality-only or diversity-only Self-Play algorithms. Since QDSP explores new code-based strategies, the discovered policies come from many distinct subfields of computer science and control, including reinforcement learning, heuristic search, model predictive control, tree search, and machine learning approaches. Combining multi-agent dynamics with the knowledge of FMs demonstrates a powerful new approach to efficiently create a Cambrian explosion of diverse, performant, and complex strategies in multi-agent settings.
