Iterated Deep Reinforcement Learning in Games: History-Aware Training for Improved Stability

Deep reinforcement learning (RL) is a powerful method for generating policies in complex environments, and recent breakthroughs in game playing have used deep RL as part of an iterative multiagent search process. We build on these developments and present an approach that learns progressively better mixed strategies in complex dynamic games of imperfect information through iterated use of empirical game-theoretic analysis (EGTA) with deep RL policies. We apply the approach to a challenging cybersecurity game defined over attack graphs. Iterating deep RL with EGTA to convergence over dozens of rounds, we generate mixed strategies far stronger than earlier published heuristic strategies for this game. We further refine the strategy-exploration process by fine-tuning in a training environment that includes out-of-equilibrium but recently seen opponents. Experiments suggest this history-aware approach yields strategies with lower regret at each stage of training.
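
The iterated loop described above can be illustrated with a minimal sketch, under strong simplifying assumptions that are not from the paper: the game is a small two-player zero-sum matrix game, an exact best response stands in for the deep RL training step, and regret matching stands in for the equilibrium solver used in the EGTA step. All function names (`regret_matching`, `double_oracle`, `regret`, and the helpers) are hypothetical, and the history-aware fine-tuning step is not modeled here.

```python
# Sketch only: exact best responses replace deep RL, and regret matching
# replaces the EGTA equilibrium solver. Assumes a zero-sum matrix game where
# game[i][j] is the row player's payoff and the column player receives -game[i][j].

def row_payoffs(game, q):
    """Row player's expected payoff for each pure strategy against column mixture q."""
    return [sum(game[i][j] * q[j] for j in range(len(q))) for i in range(len(game))]

def col_payoffs(game, p):
    """Column player's expected payoff for each pure strategy against row mixture p."""
    return [-sum(game[i][j] * p[i] for i in range(len(p))) for j in range(len(game[0]))]

def to_mixture(regrets):
    """Play in proportion to positive cumulative regret; uniform if none is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [x / total for x in pos]

def regret_matching(game, iters=2000):
    """Approximate equilibrium of a zero-sum matrix game via averaged regret matching."""
    n, m = len(game), len(game[0])
    reg_r, reg_c = [0.0] * n, [0.0] * m
    avg_r, avg_c = [0.0] * n, [0.0] * m
    for _ in range(iters):
        p, q = to_mixture(reg_r), to_mixture(reg_c)
        u_r, u_c = row_payoffs(game, q), col_payoffs(game, p)
        value = sum(p[i] * u_r[i] for i in range(n))  # row's payoff; column's is -value
        for i in range(n):
            reg_r[i] += u_r[i] - value
            avg_r[i] += p[i]
        for j in range(m):
            reg_c[j] += u_c[j] + value
            avg_c[j] += q[j]
    return [x / iters for x in avg_r], [x / iters for x in avg_c]

def lift(mix, support, size):
    """Embed a mixture over the restricted strategy set into the full strategy space."""
    full = [0.0] * size
    for prob, idx in zip(mix, support):
        full[idx] = prob
    return full

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def double_oracle(game, max_rounds=50):
    """Grow each player's strategy set with best responses to the current equilibrium."""
    rows, cols = [0], [0]  # arbitrary initial strategies
    for _ in range(max_rounds):
        sub = [[game[r][c] for c in cols] for r in rows]
        p_sub, q_sub = regret_matching(sub)
        p = lift(p_sub, rows, len(game))
        q = lift(q_sub, cols, len(game[0]))
        new_row = argmax(row_payoffs(game, q))  # stand-in for training a new policy
        new_col = argmax(col_payoffs(game, p))
        added = False
        if new_row not in rows:
            rows.append(new_row)
            added = True
        if new_col not in cols:
            cols.append(new_col)
            added = True
        if not added:  # no beneficial new strategy found: stop
            break
    return rows, cols, p, q

def regret(game, p, q):
    """Largest gain either player could obtain by deviating to a pure strategy."""
    u_r, u_c = row_payoffs(game, q), col_payoffs(game, p)
    value = sum(p[i] * u_r[i] for i in range(len(p)))
    return max(max(u_r) - value, max(u_c) + value)

# Rock-paper-scissors, used here only as a toy example.
ROCK_PAPER_SCISSORS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

if __name__ == "__main__":
    rows, cols, p, q = double_oracle(ROCK_PAPER_SCISSORS)
    print("strategy sets:", rows, cols)
    print("regret of final profile:", regret(ROCK_PAPER_SCISSORS, p, q))
```

In this toy setting, each round adds at most one new pure strategy per player, and the loop stops once neither best response lies outside the current strategy sets. The `regret` function measures the quantity the abstract reports: how much a player could gain by deviating from the current mixed-strategy profile.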
