Delay-Aware Multi-Agent Reinforcement Learning

Action and observation delays are prevalent in real-world cyber-physical systems and can pose serious challenges for reinforcement learning design. The problem is particularly arduous in multi-agent systems, where the delay of one agent can spread to other agents. To resolve this problem, this paper proposes a novel framework that handles delays as well as the non-stationary training issue of multi-agent tasks with model-free deep reinforcement learning. We formally define the Delay-Aware Markov Game, which incorporates the delays of all agents in the environment. To solve Delay-Aware Markov Games, we apply centralized training with decentralized execution, which allows agents to use extra information during training to ease the non-stationarity of multi-agent systems without requiring a centralized controller during execution. Experiments are conducted in multi-agent particle environments, including cooperative communication, cooperative navigation, and competitive scenarios. We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles, to show the practical value of delay-awareness. Results show that the proposed delay-aware multi-agent reinforcement learning algorithm greatly alleviates the performance degradation introduced by delay. Code available at: this https URL.
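The core idea behind delay-aware formulations is state augmentation: an agent whose actions take effect k steps later regains the Markov property if its observation is augmented with the buffer of k not-yet-executed actions. The sketch below illustrates this per-agent augmentation for a single agent with a fixed action delay; it is a minimal illustration of the general technique, not the paper's released code, and all names (`DelayedAgentEnv`, `step_fn`, `null_action`) are hypothetical.

```python
from collections import deque


class DelayedAgentEnv:
    """Minimal sketch of delay-aware state augmentation for one agent.

    Assumption: a fixed action delay of `delay_k` steps, so the action
    selected at time t is executed at time t + delay_k. The augmented
    observation (latest observation, pending-action buffer) makes the
    delayed process Markovian again.
    """

    def __init__(self, step_fn, init_obs, delay_k, null_action=0.0):
        self.step_fn = step_fn  # underlying delay-free dynamics: (obs, action) -> obs
        self.delay_k = delay_k
        self.obs = init_obs
        # Actions already selected but not yet applied to the dynamics.
        self.action_buffer = deque([null_action] * delay_k, maxlen=delay_k)

    def augmented_obs(self):
        # Augmented state = latest observation + pending actions.
        return (self.obs, tuple(self.action_buffer))

    def step(self, action):
        if self.delay_k > 0:
            # The oldest buffered action is executed now; the new action
            # joins the back of the queue and takes effect delay_k steps later.
            executed = self.action_buffer.popleft()
            self.action_buffer.append(action)
        else:
            executed = action
        self.obs = self.step_fn(self.obs, executed)
        return self.augmented_obs()
```

In the multi-agent setting, each agent carries its own buffer (delays may differ across agents), and a centralized critic can condition on all agents' augmented states during training while each actor uses only its own at execution time.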
