Poster in Workshop: Language Gamification
PokéChamp: an Expert-level Minimax Language Agent for Competitive Pokémon
Seth Karten · Andy Nguyen · Chi Jin
We introduce \texttt{PokéChamp}, a Large Language Model (LLM)-powered, game-theoretically aware agent for two-player competitive Pokémon battles that combines an LLM prior with collected high-Elo human data to approximate minimax search without any additional training. \texttt{PokéChamp} runs a depth-limited minimax search online in which the LLM replaces three key components: 1) action sampling, guided by prompts (including output from a damage-calculation tool); 2) opponent modeling, which weights LLM-predicted opponent actions by their historical likelihood in our dataset; and 3) state-value estimation, in which the LLM reflects on each intermediate state. \texttt{PokéChamp} outperforms all existing AI agents (76% win rate) and heuristic bots (84% win rate) by a large margin, including winning consistently (>50%) against prior human-parity work running a frontier model, GPT-4o, while itself using an open-source 8-billion-parameter Llama 3.1 model. \texttt{PokéChamp} achieves expert-level performance, placing in the top 10% of players on the online ladder against competitive humans at an Elo of 1500. Finally, we collect the largest Pokémon battling dataset, comprising over 1 million games including 150k+ high-Elo games; prepare a series of battling benchmarks based on real player data and puzzles to analyze specific battling abilities; and provide crucial updates to the local game engine. Our code is available \href{https://sites.google.com/view/pokechamp-llm}{online}.
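The search described above can be sketched as a depth-limited minimax in which each of the three components is a pluggable function. This is a minimal illustration, not the paper's implementation: the helper names (`sample_actions`, `opponent_model`, `value_estimate`, `transition`) are hypothetical stand-ins for the LLM action sampler, the historical opponent-action likelihoods, the LLM state-value reflection, and the game engine, respectively; here we replace the opponent min with an expectation over the modeled opponent distribution, matching the weighting the abstract describes.

```python
# Toy depth-limited minimax with LLM-style pluggable components.
# All helpers below are illustrative stubs, not PokeChamp's actual API.

def sample_actions(state, for_opponent=False):
    # Stand-in for LLM-guided action sampling (a pruned candidate set).
    return state["opp_moves"] if for_opponent else state["my_moves"]

def opponent_model(state, action):
    # Stand-in for the historical likelihood of an opponent action.
    return state["opp_prior"].get(action, 0.0)

def value_estimate(state):
    # Stand-in for the LLM's scalar evaluation of a leaf state.
    return state["value"]

def transition(state, my_action, opp_action):
    # Stand-in transition; a real agent would query the game engine.
    return state["children"][(my_action, opp_action)]

def search(state, depth):
    """Max over our candidate actions; expectation over opponent
    actions weighted by the opponent model."""
    if depth == 0 or not state.get("children"):
        return value_estimate(state)
    best = float("-inf")
    for a in sample_actions(state):
        exp_val = sum(
            opponent_model(state, b) * search(transition(state, a, b), depth - 1)
            for b in sample_actions(state, for_opponent=True)
        )
        best = max(best, exp_val)
    return best

# Tiny two-ply example: our agent picks between two moves against a
# fully predictable opponent.
root = {
    "my_moves": ["tackle", "switch"],
    "opp_moves": ["ember"],
    "opp_prior": {"ember": 1.0},
    "children": {
        ("tackle", "ember"): {"value": 0.6},
        ("switch", "ember"): {"value": 0.4},
    },
}
print(search(root, depth=1))  # → 0.6
```

In a full agent the leaf values and candidate actions would come from LLM calls and the transition from the battle simulator; the search skeleton itself stays this simple.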