Strategies for simulating pedestrian navigation with multiple reinforcement learning agents

In this paper, a new multi-agent reinforcement learning approach is introduced for the simulation of pedestrian groups. Unlike other solutions, where the behaviors of the pedestrians are coded into the system, in our approach the agents learn by interacting with the environment. The embodied agents must learn to control their velocity, avoiding obstacles and other pedestrians, to reach a goal inside the scenario. The main contribution of this paper is a new methodology that uses different iterative learning strategies, combining vector quantization (for state space generalization) with the Q-learning algorithm (VQQL). Two algorithmic schemas, Iterative VQQL and Incremental, which differ in how they address the problems, have been designed and used with and without transfer of knowledge. These algorithms are tested and compared with the VQQL algorithm as a baseline in two scenarios where agents need to solve well-known problems in pedestrian modeling. In the first, agents in a closed room need to reach the only exit, producing and solving a bottleneck. In the second, two groups of agents inside a corridor need to reach goals placed on opposite sides (they need to solve the crossing). In the first scenario, we focus on scalability, use metrics from the pedestrian modeling field, and compare with Helbing's social force model. Collective behaviors have emerged and been analyzed: shell-shaped clogging in front of the exit in the first scenario, and lane formation as a solution to the crossing problem in the second. The results demonstrate that the proposed schemas find policies that carry out the tasks, suggesting that they are applicable and generalizable to the simulation of pedestrian groups.
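To make the VQQL combination concrete, the following is a minimal sketch of its two ingredients: a vector quantizer that maps continuous states to a finite set of codebook indices, and a tabular Q-learning update over those indices. It is an illustration under stated assumptions, not the authors' implementation. The function names (`nearest`, `build_codebook`, `q_update`), the Lloyd-style (k-means) quantizer design, and the values of `alpha` and `gamma` are all hypothetical choices made for this example.

```python
import random


def nearest(codebook, state):
    """Return the index of the codebook vector closest to `state`
    (squared Euclidean distance). This is the quantization step."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - s) ** 2 for c, s in zip(codebook[i], state)),
    )


def build_codebook(samples, k, iters=20, seed=0):
    """Design a k-entry quantizer from sampled states with Lloyd-style
    iterations (assumption: the paper's quantizer design may differ)."""
    rng = random.Random(seed)
    codebook = [list(s) for s in rng.sample(samples, k)]
    for _ in range(iters):
        # Assign every sample to its nearest codebook vector.
        cells = [[] for _ in range(k)]
        for s in samples:
            cells[nearest(codebook, s)].append(s)
        # Move each codebook vector to the centroid of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous vector
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update, where `s` and `s_next`
    are quantized state indices and `q[s][a]` is the action value."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Usage: quantize observed continuous states, then learn on the indices.
samples = [(0.0,), (0.1,), (1.0,), (1.1,)]
codebook = build_codebook(samples, k=2)
q = [[0.0, 0.0] for _ in codebook]  # two hypothetical actions per state
s = nearest(codebook, (0.05,))
s_next = nearest(codebook, (1.0,))
q_update(q, s, a=1, reward=1.0, s_next=s_next)
```

In this arrangement the Q-table has one row per codebook entry, so its size is set by the quantizer resolution `k` rather than by the dimensionality of the raw sensor state.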
