Paper information - Reinforcement Learning with Random Delays

Reinforcement Learning with Random Delays

Action and observation delays are common in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic that performs significantly better in environments with delays. We show this theoretically and demonstrate it empirically on a delay-augmented version of the MuJoCo continuous control benchmark.
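To make "randomly delayed environment" concrete, here is a minimal toy sketch. It is not the paper's code, and it does not reproduce its exact definitions or DCAC's resampling operator. It shows the general idea behind a delay-augmented setting: actions and observations travel over links with random delays, so the agent sees a stale observation. To stay Markovian, the agent is given an augmented observation containing that stale observation, its age, the age of the action currently applied, and a buffer of the actions it recently sent. Everything here is a hypothetical assumption: the class name `RandomDelayToyEnv`, the 1D integrator plant, delays drawn uniformly from `{0, ..., max_delay - 1}`, zero-order hold on actions, and discarding out-of-order packets.

```python
import random
from collections import deque


class RandomDelayToyEnv:
    """Hypothetical toy model of a randomly delayed environment.

    Assumptions (illustrative only): 1D integrator plant s' = s + a,
    delays uniform in {0, ..., max_delay - 1}, the plant holds the last
    delivered action, and packets older than the newest delivered one
    are discarded.
    """

    def __init__(self, max_delay=3, seed=0):
        self.max_delay = max_delay
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.state = 0.0                   # true (undelayed) plant state
        self.applied = (-1, 0.0)           # (send time, action) held by the plant
        self.actions_in_flight = []        # (arrival time, send time, action)
        self.obs_in_flight = []            # (arrival time, capture time, observation)
        self.latest_obs = (0, self.state)  # newest delivered (capture time, observation)
        self.action_age = 0
        # The agent's own record of the last max_delay actions it sent (newest first).
        self.action_buffer = deque([0.0] * self.max_delay, maxlen=self.max_delay)
        return self._augmented_obs()

    def _delay(self):
        return self.rng.randint(0, self.max_delay - 1)

    def step(self, action):
        # 1. Send the action over a link with a random delay.
        self.actions_in_flight.append((self.t + self._delay(), self.t, action))
        self.action_buffer.appendleft(action)

        # 2. Deliver arrived actions; the plant keeps the most recently *sent* one.
        arrived = [p for p in self.actions_in_flight if p[0] <= self.t]
        self.actions_in_flight = [p for p in self.actions_in_flight if p[0] > self.t]
        for _, sent, a in arrived:
            if sent > self.applied[0]:
                self.applied = (sent, a)
        self.action_age = self.t - self.applied[0]

        # 3. Advance the undelayed plant with the held action.
        self.state += self.applied[1]
        self.t += 1

        # 4. Capture an observation and send it back with a random delay.
        self.obs_in_flight.append((self.t + self._delay(), self.t, self.state))
        arrived = [p for p in self.obs_in_flight if p[0] <= self.t]
        self.obs_in_flight = [p for p in self.obs_in_flight if p[0] > self.t]
        for _, captured, o in arrived:
            if captured > self.latest_obs[0]:
                self.latest_obs = (captured, o)
        return self._augmented_obs()

    def _augmented_obs(self):
        # Stale observation plus the delay information needed to interpret it.
        captured, obs = self.latest_obs
        return {
            "obs": obs,
            "obs_delay": self.t - captured,
            "action_delay": self.action_age,
            "action_buffer": tuple(self.action_buffer),
        }
```

With `max_delay=1` every delay is zero and the wrapper behaves like the undelayed plant. With larger values, both delays stay in `[0, max_delay - 1]`. The action buffer is what lets a learner reason about actions that have been sent but whose effects are not yet visible in the stale observation.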

Jonathan Binas | Giovanni Beltrame | Christopher Pal | Simon Ramstedt | Yann Bouteiller
