Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of the delay-aware Markov decision process (DA-MDP) and proves that it can be converted into a standard MDP with augmented states using the Markov reward process. We then develop a delay-aware model-based reinforcement learning framework that incorporates the multi-step delay into the learned system models without additional learning effort. Experiments on the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm trains more efficiently, and transfers better between systems with different delay durations, than off-policy model-free reinforcement learning methods. Code is available at: this https URL.
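The core of the state-augmentation argument is that an n-step action delay can be absorbed into the state: the agent observes x_t = (s_t, a_{t-n}, ..., a_{t-1}), i.e., the current system state plus the queue of actions already committed but not yet executed, and the resulting process is again Markov. Below is a minimal sketch of this idea as an environment wrapper. It is illustrative only, not the authors' implementation: the name DelayWrapper, the parameter n_delay, the zero-initialized action buffer, and the classic Gym step/reset API (pre-0.26, Box observation and action spaces) are all assumptions.

```python
import numpy as np
import gym
from gym import spaces

class DelayWrapper(gym.Wrapper):
    """Illustrative n-step action-delay wrapper (hypothetical, not the paper's code).

    The augmented observation is x_t = (s_t, a_{t-n}, ..., a_{t-1}): the
    current state plus the queue of actions chosen but not yet applied.
    """

    def __init__(self, env, n_delay=2):
        super().__init__(env)
        self.n_delay = n_delay
        # Assumes Box observation and action spaces so we can concatenate bounds.
        low = np.concatenate([env.observation_space.low] + [env.action_space.low] * n_delay)
        high = np.concatenate([env.observation_space.high] + [env.action_space.high] * n_delay)
        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)
        self.action_buffer = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)  # classic Gym API: reset() returns obs only
        # Initialize the queue with placeholder "do nothing" actions (zeros here).
        self.action_buffer = [np.zeros(self.env.action_space.shape)] * self.n_delay
        return self._augment(obs)

    def step(self, action):
        # The action chosen now takes effect n_delay steps later; the
        # underlying environment executes the oldest buffered action.
        delayed_action = self.action_buffer.pop(0)
        self.action_buffer.append(np.asarray(action))
        obs, reward, done, info = self.env.step(delayed_action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        return np.concatenate([obs] + self.action_buffer).astype(np.float32)
```

Under these assumptions, wrapping a standard task, e.g. DelayWrapper(gym.make("Pendulum-v1"), n_delay=2), yields an ordinary MDP on the augmented state, so any standard (model-based or model-free) algorithm can be trained on it unchanged.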
