Emergent collective behaviors in a multi-agent reinforcement learning based pedestrian simulation

In this work, a multi-agent reinforcement learning (RL) framework is used to obtain plausible simulations of pedestrian groups. In our framework, each virtual agent learns individually and independently to control its velocity inside a virtual environment. The case study consists of the simulation of two groups of embodied virtual agents crossing inside a narrow corridor. This scenario lets us test whether a collective behavior, specifically lane formation, emerges in our simulations as it does in corridors with real pedestrians. The paper studies the influence of different learning algorithms, function approximation approaches, and knowledge transfer mechanisms on the performance of the learned pedestrian behaviors. Specifically, two RL-based schemes are analyzed. The first, Iterative Vector Quantization with Q-Learning (ITVQQL), iteratively improves a state-space generalizer based on vector quantization. The second scheme, named TS, uses tile coding as the generalization method together with the Sarsa(λ) algorithm. The knowledge transfer approach is based on Probabilistic Policy Reuse, which incorporates previously acquired knowledge into the current learning process; in addition, value function transfer is used in the ITVQQL scheme to carry the value function between consecutive iterations. The results demonstrate empirically that our RL framework generates individual behaviors from which the expected collective behavior emerges, as occurs with real pedestrians. This collective behavior appears independently of the generalization method used, but depends strongly on whether knowledge transfer was applied. In addition, the use of transfer techniques has a notable influence on the final performance (measured as the number of times the task was solved) of the learned behaviors. A video of the simulation is available at http://www.uv.es/agentes/RL/index.htm
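The two ingredients named above, a temporal-difference value update and Probabilistic Policy Reuse for knowledge transfer, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tabular Q dictionary, the `psi` reuse probability, and the `past_policy` callable are illustrative assumptions standing in for the paper's function approximators and transferred policies.

```python
import random

def pi_reuse_action(Q, state, actions, past_policy, psi, epsilon):
    """Probabilistic Policy Reuse action selection (sketch): with
    probability psi follow a previously learned policy, otherwise act
    epsilon-greedily on the current value estimates Q."""
    if random.random() < psi:
        return past_policy(state)      # exploit transferred knowledge
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # greedy on Q

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One-step Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
```

In the full schemes, the tabular lookup would be replaced by the vector-quantization codebook (ITVQQL) or by tile coding with eligibility traces (TS); the reuse probability psi is typically decayed over an episode so the agent gradually relies on its own policy.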
