Multi-agent reinforcement learning for simulating pedestrian navigation

In this paper we introduce a multi-agent system that uses Reinforcement Learning (RL) techniques to learn local navigational behaviors for simulating virtual pedestrian groups. The aim of the paper is to study empirically the validity of RL for learning agent-based navigation controllers, and their transfer capabilities when they are used in simulation environments with a larger number of agents than in the learning scenario. We present two RL algorithms that use Vector Quantization (VQ) as the generalization method for the state space. Both strategies focus on obtaining a good vector quantizer that adequately generalizes the state space of the agents. We empirically demonstrate the convergence of both methods in our navigational multi-agent learning domain. In addition, we use validation tools for pedestrian models to analyze the simulation results in the context of pedestrian dynamics. The simulations, carried out while scaling up the number of agents in our environment (a closed room with a door through which the agents have to leave), reveal that the basic characteristics of pedestrian movement have been learned.
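The combination described above can be sketched in a minimal form: a codebook of prototype vectors quantizes each agent's continuous state into a discrete index, over which a standard tabular Q-learning update is applied. This is an illustrative sketch under assumed details (codebook size, 2-D state features, reward scheme, learning parameters), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook of prototype vectors, e.g. obtained with k-means
# over states observed during learning: 16 prototypes over 2-D states.
codebook = rng.uniform(-1.0, 1.0, size=(16, 2))

def quantize(state):
    """Map a continuous state to the index of its nearest prototype (VQ)."""
    return int(np.argmin(np.linalg.norm(codebook - state, axis=1)))

n_actions = 4  # assumed discrete action set (e.g. movement directions)
Q = np.zeros((len(codebook), n_actions))
alpha, gamma = 0.1, 0.95  # assumed learning rate and discount factor

def q_update(s, a, r, s_next):
    """Tabular Q-learning update applied to the quantized state indices."""
    i, j = quantize(s), quantize(s_next)
    Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])

# One illustrative transition: an agent is rewarded for a step toward the exit.
q_update(np.array([0.2, 0.3]), a=1, r=1.0, s_next=np.array([0.1, 0.2]))
```

The design choice here is that generalization quality hinges entirely on the codebook: nearby states that map to the same prototype share one Q-table row, so a quantizer that merges behaviorally distinct states degrades the learned policy.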
