DeepFP for Finding Nash Equilibrium in Continuous Action Spaces

Finding Nash equilibria in continuous action spaces is a challenging problem with applications in domains such as protecting geographic areas from potential attackers. We present DeepFP, an approximate extension of fictitious play to continuous action spaces. DeepFP represents players' approximate best responses via generative neural networks, which are highly expressive implicit density approximators. It additionally uses a game-model network that approximates the players' expected payoffs given their actions, and it trains these networks end-to-end in a model-based learning regime. Further, DeepFP allows the use of domain-specific best-response oracles when available, and can hence exploit techniques such as mathematical programming to compute best responses for structured games. We demonstrate stable convergence to Nash equilibrium on several classic games and also apply DeepFP to a large forest security domain with a novel defender best-response oracle. We show that DeepFP learns strategies robust to adversarial exploitation and scales well with a growing number of player resources.
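To make the training regime concrete, below is a minimal, illustrative sketch of a DeepFP-style loop in PyTorch. It is a sketch under our own assumptions, not the authors' implementation: the toy simulator `payoff`, the networks `BRNet` and `PayoffNet`, and all hyperparameters are hypothetical stand-ins for the paper's generative best-response networks, game-model networks, and replay memory of past best responses.

```python
# Illustrative DeepFP-style loop (a sketch, not the authors' code).
# Assumptions (ours): a two-player game exposed through a toy simulator
# `payoff(a1, a2) -> (u1, u2)`, with 1-D actions bounded in [-1, 1].

import random
import torch
import torch.nn as nn

ACT_DIM, NOISE_DIM, B = 1, 8, 64

def payoff(a1, a2):
    """Hypothetical stand-in for the true game simulator (zero-sum toy)."""
    u1 = -(a1 - a2).pow(2).sum(-1)   # player 1 wants to match player 2
    return u1, -u1

class BRNet(nn.Module):
    """Generative best-response net: noise -> action (implicit density)."""
    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                               nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, z):
        return self.f(z)

class PayoffNet(nn.Module):
    """Game-model net: joint action -> predicted payoff for one player."""
    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * ACT_DIM, 64), nn.ReLU(),
                               nn.Linear(64, 1))
    def forward(self, a1, a2):
        return self.f(torch.cat([a1, a2], dim=-1)).squeeze(-1)

br = [BRNet(), BRNet()]          # one best-response net per player
um = [PayoffNet(), PayoffNet()]  # one learned payoff model per player
opt_br = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in br]
opt_um = [torch.optim.Adam(u.parameters(), lr=1e-3) for u in um]
memory = []                      # replay memory = empirical average strategies

for it in range(2000):
    # 1. Sample current best responses, and replay the opponents' *average*
    #    strategies from memory (the fictitious-play belief).
    a_cur = [br[i](torch.randn(B, NOISE_DIM)) for i in range(2)]
    if memory:
        past = random.choices(memory, k=B)
        a_avg = [torch.stack([p[i] for p in past]) for i in range(2)]
    else:
        a_avg = [a.detach() for a in a_cur]

    # 2. Fit each game-model net on true payoffs from the simulator.
    u_true = payoff(a_cur[0].detach(), a_cur[1].detach())
    for i in range(2):
        loss = (um[i](a_cur[0].detach(), a_cur[1].detach())
                - u_true[i]).pow(2).mean()
        opt_um[i].zero_grad()
        loss.backward()
        opt_um[i].step()

    # 3. Update each best-response net to maximize its *modeled* payoff
    #    against the opponent's average strategy (model-based gradient).
    for i in range(2):
        j = 1 - i
        a_i = br[i](torch.randn(B, NOISE_DIM))
        args = (a_i, a_avg[j]) if i == 0 else (a_avg[j], a_i)
        loss = -um[i](*args).mean()
        opt_br[i].zero_grad()
        loss.backward()
        opt_br[i].step()

    # 4. Append the current best responses to memory, so that future
    #    iterations play against the updated empirical average strategy.
    with torch.no_grad():
        memory.append((br[0](torch.randn(NOISE_DIM)),
                       br[1](torch.randn(NOISE_DIM))))
```

A domain-specific oracle, where available, would replace step 3 for that player: instead of a gradient update through the game-model net, the oracle (e.g., a mathematical program) computes a best response to the sampled average strategy directly, and its output is appended to memory in step 4.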
