Deep Fictitious Play for Games with Continuous Action Spaces

Fictitious play is a classic algorithm for solving two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate the players' best responses, and we also learn a differentiable approximate model of the players' rewards given their actions. Both networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non-zero-sum games, and security game domains.
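For orientation, the classic discrete-action procedure that this work extends can be sketched as follows. This is a minimal illustration of textbook fictitious play on a zero-sum matrix game, not the deep, continuous-action method described in the abstract. The function name `fictitious_play`, the rock-paper-scissors payoff matrix, the iteration count, and the tie-breaking rule (lowest index wins) are all assumptions chosen for the example.

```python
import numpy as np


def fictitious_play(payoff, iterations=5000):
    """Classic fictitious play on a two-player zero-sum matrix game.

    payoff[i, j] is the row player's reward when row plays i and column
    plays j; the column player receives -payoff[i, j].
    Returns the empirical (time-averaged) strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action
        # counts so far. Both choices are made before either count is
        # updated, so the play is simultaneous.
        row_action = np.argmax(payoff @ col_counts)  # maximize own reward
        col_action = np.argmin(row_counts @ payoff)  # minimize row's reward
        row_counts[row_action] += 1
        col_counts[col_action] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


# Rock-paper-scissors: rows and columns are ordered rock, paper, scissors.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

row_strategy, col_strategy = fictitious_play(rps)
print(row_strategy, col_strategy)
```

In this example the empirical strategies drift toward the uniform equilibrium of rock-paper-scissors. The exact best response computed with `argmax` and `argmin` over a finite action set is the step that has no direct counterpart in continuous action spaces, which is the role the abstract assigns to the generative best-response networks.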
