Delay-Aware Multi-Agent Reinforcement Learning

Action and observation delays are prevalent in real-world cyber-physical systems and can pose challenges for reinforcement learning design. Handling them is particularly difficult in multi-agent systems, where the delay of one agent can propagate to other agents. To address this problem, this paper proposes a novel framework that handles delays as well as the non-stationary training issue of multi-agent tasks with model-free deep reinforcement learning. We formally define the Delay-Aware Markov Game, which incorporates the delays of all agents in the environment. To solve Delay-Aware Markov Games, we apply centralized training with decentralized execution, which allows agents to use extra information during training to ease the non-stationarity of multi-agent systems without requiring a centralized controller during execution. Experiments are conducted in multi-agent particle environments, including cooperative communication, cooperative navigation, and competitive scenarios. We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles, demonstrating the practical value of delay-awareness. Results show that the proposed delay-aware multi-agent reinforcement learning algorithm greatly alleviates the performance degradation introduced by delay. Code and demo videos are available at: this https URL.
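
To make the Delay-Aware Markov Game concrete, the sketch below illustrates one common way such delays are modeled: each agent's observation is augmented with its buffer of pending actions, which reduces a game with a fixed k-step action delay to a delay-free game on the augmented state. This is an illustrative sketch under stated assumptions, not the paper's implementation; the environment interface (`reset()`/`step(actions)` returning per-agent lists) and all names are hypothetical.

```python
import collections
import numpy as np

class DelayAwareWrapper:
    """Illustrative sketch: augment each agent's observation with its
    buffer of pending actions, so a k-step action delay (k >= 1) becomes
    a delay-free problem on the augmented state."""

    def __init__(self, env, delay_steps, default_action):
        self.env = env                      # assumed multi-agent env with reset()/step(actions)
        self.k = delay_steps                # assumed fixed, known delay of k >= 1 steps
        self.default_action = default_action
        self.buffers = None                 # one FIFO action buffer per agent

    def reset(self):
        obs = self.env.reset()              # assumed: list of per-agent observations
        # Pre-fill each buffer with k placeholder actions.
        self.buffers = [collections.deque([self.default_action] * self.k, maxlen=self.k)
                        for _ in obs]
        return [self._augment(o, buf) for o, buf in zip(obs, self.buffers)]

    def step(self, actions):
        # An action chosen now takes effect k steps later: the environment
        # executes the oldest buffered action of each agent.
        delayed = []
        for buf, action in zip(self.buffers, actions):
            delayed.append(buf.popleft())
            buf.append(action)
        obs, rewards, done, info = self.env.step(delayed)
        augmented = [self._augment(o, buf) for o, buf in zip(obs, self.buffers)]
        return augmented, rewards, done, info

    @staticmethod
    def _augment(obs, buf):
        # Augmented state: raw observation concatenated with pending actions.
        return np.concatenate([np.asarray(obs).ravel()] +
                              [np.asarray(a).ravel() for a in buf])
```

On the wrapped environment, a delay-free multi-agent algorithm (for example, a multi-agent actor-critic trained with centralized critics and decentralized actors) could in principle be run unchanged, since the augmented state restores the Markov property that the delay destroys.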
