Emergent Collective Behaviors in a Multi-agent Reinforcement Learning Pedestrian Simulation: A Case Study

In this work, a Multi-agent Reinforcement Learning framework is used to generate simulations of groups of virtual pedestrians. The aim is to study the influence of two different learning approaches on the quality of the generated simulations. The case study consists of the simulation of two groups of embodied virtual agents crossing inside a narrow corridor. This scenario is a classic experiment in the pedestrian modeling area, because a collective behavior, specifically lane formation, emerges with real pedestrians. The paper studies the influence of different learning algorithms, function approximation approaches, and knowledge transfer mechanisms on the performance of the learned pedestrian behaviors. Specifically, two different RL-based schemas are analyzed. The first one, Iterative Vector Quantization with Q-Learning (ITVQQL), iteratively improves a state-space generalizer based on vector quantization. The second one, named TS, uses tile coding as the generalization method together with the Sarsa(\(\lambda \)) algorithm. The knowledge transfer approach is based on Probabilistic Policy Reuse, which incorporates previously acquired knowledge into the current learning process; in addition, value function transfer is used in the ITVQQL schema to carry the value function across consecutive iterations. The results demonstrate empirically that the proposed RL framework generates individual behaviors from which the expected collective behavior emerges, as it does with real pedestrians. This collective behavior appears independently of the learning algorithm and the generalization method used, but depends strongly on whether knowledge transfer was applied. In addition, the use of transfer techniques has a notable influence on the final performance (measured as the number of times the task was solved) of the learned behaviors.
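The core of the Probabilistic Policy Reuse mechanism described above is a biased action selection: with some probability the agent follows a previously learned policy, and otherwise it acts \(\epsilon \)-greedily on its current value function. A minimal sketch follows; the function name, the dictionary-based Q representation, and the parameter handling are illustrative assumptions, not the authors' implementation.

```python
import random

def pi_reuse_action(q, past_policy, state, actions, psi, epsilon):
    """One action choice under Probabilistic Policy Reuse.

    With probability psi, follow the previously learned (transferred)
    policy; otherwise act epsilon-greedily on the current Q-function.
    """
    if random.random() < psi:
        return past_policy(state)          # exploit transferred knowledge
    if random.random() < epsilon:
        return random.choice(actions)      # explore
    # greedy choice on the current estimates (unseen pairs default to 0)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In the standard formulation, psi is decayed over the episode (e.g., \(\psi \leftarrow \upsilon \psi \) after each step), so the learner relies on the past policy early on and progressively shifts to its own value estimates.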
