Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization

Self-Driven Particles (SDP) describe a category of multi-agent systems common in everyday life, such as flocking birds and traffic flows. In an SDP system, each agent pursues its own goal and constantly switches between cooperative and competitive behavior toward nearby agents. Manually designing controllers for such SDP systems is time-consuming, and the resulting emergent behaviors are often neither realistic nor generalizable; thus the realistic simulation of SDP systems remains challenging. Reinforcement learning provides an appealing alternative for automating the development of SDP controllers. However, previous multi-agent reinforcement learning (MARL) methods define agents as teammates or enemies beforehand, which fails to capture the essence of SDP, where each agent's role varies between cooperative and competitive even within a single episode. To simulate SDP with MARL, a key challenge is to coordinate agents' behaviors while still maximizing individual objectives. Taking traffic simulation as the testbed, in this work we develop a novel MARL method called Coordinated Policy Optimization (CoPO), which incorporates a social psychology principle to learn neural controllers for SDP. Experiments show that the proposed method achieves superior performance compared to MARL baselines on various metrics. Notably, the trained vehicles exhibit complex and diverse social behaviors that improve the performance and safety of the population as a whole. Demo video and source code are available at: https://decisionforce.github.io/CoPO/.
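The abstract does not spell out the mechanism, but the core coordination idea can be illustrated with a minimal sketch: blend each agent's individual reward with the mean reward of its nearby agents, weighted by a coordination angle. In the Python sketch below, the function name `coordinated_reward`, the fixed neighborhood radius, and the no-neighbor fallback are our own illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def coordinated_reward(rewards, positions, phi, radius=10.0):
    """Blend each agent's individual reward with the mean reward of its
    neighbors, weighted by a coordination angle phi (illustrative sketch).

    rewards:   (N,) array of per-agent individual rewards
    positions: (N, 2) array of agent positions
    phi:       coordination angle in radians; phi = 0 yields fully
               selfish agents, phi = pi/2 fully neighborhood-oriented ones
    radius:    radius (assumed here) defining which agents count as "nearby"
    """
    n = len(rewards)
    # Pairwise distances between all agents.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    coordinated = np.empty(n)
    for i in range(n):
        neighbors = (dists[i] < radius) & (np.arange(n) != i)
        # If an agent has no neighbors, fall back to its own reward (our choice).
        neigh_reward = rewards[neighbors].mean() if neighbors.any() else rewards[i]
        coordinated[i] = np.cos(phi) * rewards[i] + np.sin(phi) * neigh_reward
    return coordinated
```

Under this sketch, setting phi = 0 recovers independent learners that optimize only their own return, while increasing phi toward pi/4 trades off individual objectives against those of the surrounding population, which is the coordination dilemma the abstract describes.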
