Applying Reinforcement Learning to an Insurgency Agent-Based Simulation

A requirement of an Agent-based Simulation (ABS) is that its agents be able to adapt to their environment. Many ABSs achieve this adaptation through simple threshold equations, because incorporating more sophisticated approaches is complex. Under a threshold equation, an agent's behavior changes when a numeric property of the agent rises above or falls below a fixed threshold value; threshold equations therefore do not guarantee that agents will learn what is best for them. Reinforcement learning is an artificial intelligence approach that has been extensively applied to multi-agent systems, but there is very little in the literature on its application to ABS. Reinforcement learning has previously been applied to discrete-event simulations with promising results; thus, it is a good candidate for use within an Agent-based Modeling and Simulation (ABMS) environment. This paper uses an established insurgency case study to show some of the consequences of applying reinforcement learning to ABMS, for example, determining whether any actual learning has occurred. The case study was developed using the Repast Simphony software package.
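The contrast between the two adaptation mechanisms can be sketched in code. The snippet below is a minimal, hypothetical illustration, not the paper's Repast Simphony implementation: the property name `grievance`, the cutoff value, and the single-state reward setup are assumptions made purely for demonstration. It pairs a fixed threshold rule with a standard tabular Q-learning update, in which action-value estimates are adjusted from experienced rewards so the agent's policy can improve over time.

```python
# Hypothetical illustration: a threshold rule flips an agent's behavior
# when a numeric property crosses a fixed cutoff -- no learning occurs.
def threshold_behavior(grievance: float, cutoff: float = 0.5) -> str:
    return "rebel" if grievance > cutoff else "quiet"

# Standard tabular Q-learning update: nudge the value estimate for
# (state, action) toward the observed reward plus the discounted value
# of the best action available in the next state.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy usage: one state, two actions; "rebel" is repeatedly rewarded,
# so its estimated value grows while "quiet" stays at zero.
Q = {"s": {"rebel": 0.0, "quiet": 0.0}}
for _ in range(50):
    q_update(Q, "s", "rebel", reward=1.0, next_state="s")
print(Q["s"]["rebel"] > Q["s"]["quiet"])  # True: the agent has adapted
```

The threshold rule always produces the same behavior for the same input, whereas the Q-learning agent's preferences shift with experience; this difference is what makes "has any actual learning occurred?" a meaningful question for the ABMS case study.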
