Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of the delay-aware Markov decision process (DA-MDP) and proves that it can be converted into a standard MDP with augmented states using the Markov reward process. We then develop a delay-aware model-based reinforcement learning framework that incorporates the multi-step delay into the learned system models without additional learning effort. Experiments on the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm trains more efficiently, and transfers better between systems with different delay durations, than off-policy model-free reinforcement learning methods. Code is available at: this https URL.
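The core of the state-augmentation argument is that an n-step action delay can be absorbed into the state: the agent observes x_t = (s_t, a_{t-n}, ..., a_{t-1}), i.e., the current system state plus the queue of actions already committed but not yet executed, and the resulting process is again Markov. Below is a minimal sketch of this idea as an environment wrapper. It is illustrative only, not the authors' implementation: the name DelayWrapper, the parameter n_delay, the zero-initialized action buffer, and the classic Gym step/reset API (pre-0.26, Box observation and action spaces) are all assumptions.

```python
import numpy as np
import gym
from gym import spaces

class DelayWrapper(gym.Wrapper):
    """Illustrative n-step action-delay wrapper (hypothetical, not the paper's code).

    The augmented observation is x_t = (s_t, a_{t-n}, ..., a_{t-1}): the
    current state plus the queue of actions chosen but not yet applied.
    """

    def __init__(self, env, n_delay=2):
        super().__init__(env)
        self.n_delay = n_delay
        # Assumes Box observation and action spaces so we can concatenate bounds.
        low = np.concatenate([env.observation_space.low] + [env.action_space.low] * n_delay)
        high = np.concatenate([env.observation_space.high] + [env.action_space.high] * n_delay)
        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)
        self.action_buffer = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)  # classic Gym API: reset() returns obs only
        # Initialize the queue with placeholder "do nothing" actions (zeros here).
        self.action_buffer = [np.zeros(self.env.action_space.shape)] * self.n_delay
        return self._augment(obs)

    def step(self, action):
        # The action chosen now takes effect n_delay steps later; the
        # underlying environment executes the oldest buffered action.
        delayed_action = self.action_buffer.pop(0)
        self.action_buffer.append(np.asarray(action))
        obs, reward, done, info = self.env.step(delayed_action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        return np.concatenate([obs] + self.action_buffer).astype(np.float32)
```

Under these assumptions, wrapping a standard task, e.g. DelayWrapper(gym.make("Pendulum-v1"), n_delay=2), yields an ordinary MDP on the augmented state, so any standard (model-based or model-free) algorithm can be trained on it unchanged.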
