Poster
Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play
Qi Ju · Falin Hei · Ting Feng · Dengbing Yi · Zhemei Fang · YunFeng Luo
Counterfactual Regret Minimization (CFR) and its variants are effective algorithms for solving extensive-form imperfect-information games. Many recent improvements enhance the convergence rate of CFR, but most of these variants cannot be applied under Monte Carlo (MC) sampling and are therefore unsuitable for training in large-scale games. The current standard approach to large-scale games is a "pre-trained blueprint strategy + real-time search" methodology: the MCCFR algorithm first produces a blueprint strategy for the early stages of the game, and real-time search is then performed during gameplay. Accelerating MCCFR is therefore crucial for training blueprint strategies in large-scale games.

We propose a new MC-based algorithm for solving extensive-form imperfect-information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP integrates CFR's counterfactual value calculations with fictitious play's best-response strategy. This synthesis leverages the strengths of fictitious play, especially in games where over 90% of the strategies are dominated. In our tests, MCCFVFP achieved convergence speeds up to three times faster than the most advanced MCCFR variants, and in such dominated-strategy-heavy games it converged two orders of magnitude faster than MCCFR. Additionally, in large-scale settings such as two-player limit short-deck Texas Hold'em poker, the blueprint strategy trained by MCCFVFP outperformed the one trained by MCCFR within the same training duration.
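To make the core idea concrete, the sketch below illustrates one plausible reading of the update at a single information set: counterfactual action values are estimated by MC sampling as in MCCFR, but the current strategy is a pure best response to those values (as in fictitious play) rather than a regret-matching strategy, and the running average strategy is what approaches equilibrium. This is a minimal illustration only; the function name, the `iteration_weight` parameter, and the exact sampling and averaging scheme are assumptions, not details taken from the paper.

```python
import numpy as np

def fvfp_infoset_update(cf_values, avg_strategy_sum, iteration_weight=1.0):
    """Illustrative sketch (not the authors' exact method).

    cf_values: sampled counterfactual value estimates, one per action,
        e.g. obtained via outcome or external sampling as in MCCFR.
    avg_strategy_sum: running (unnormalized) average-strategy weights.
    iteration_weight: hypothetical weight for the averaging step.
    """
    # Fictitious-play-style step: the current strategy is a pure best
    # response to the sampled counterfactual values, instead of the
    # regret-matching strategy used by standard (MC)CFR.
    best_action = int(np.argmax(cf_values))
    current_strategy = np.zeros_like(avg_strategy_sum)
    current_strategy[best_action] = 1.0

    # Accumulate the average strategy; in fictitious-play-style methods
    # the normalized average is the quantity that converges.
    avg_strategy_sum += iteration_weight * current_strategy
    avg_strategy = avg_strategy_sum / avg_strategy_sum.sum()
    return current_strategy, avg_strategy


# Toy usage: three actions, the second currently has the highest value.
strategy_sum = np.zeros(3)
cur, avg = fvfp_infoset_update(np.array([-1.0, 2.5, 0.3]), strategy_sum)
print(cur, avg)
```

The intuition for why this helps when most strategies are dominated: a best response immediately puts zero weight on dominated actions, whereas regret matching may continue to assign them probability until their cumulative regrets fall sufficiently.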