Approximate Policy-Based Accelerated Deep Reinforcement Learning

In recent years, deep reinforcement learning (DRL) algorithms have developed rapidly and achieved excellent performance on many challenging tasks. However, because deep networks have complex structures and a large number of parameters, training them is time-consuming, which limits the learning efficiency of DRL. In this paper, to speed up the learning process of DRL agents, we propose a novel approximate policy-based accelerated (APA) algorithm, derived from the error analysis of approximate policy iteration reinforcement learning algorithms. The proposed APA algorithm is proven to converge even with a more aggressive learning rate, which allows the DRL agent to learn faster. Furthermore, by combining the accelerated algorithm with deep Q-network (DQN), Double DQN, and deep deterministic policy gradient (DDPG), we propose three novel DRL algorithms: APA-DQN, APA-Double DQN, and APA-DDPG, which demonstrates the adaptability of the accelerated algorithm to existing DRL algorithms. We tested the proposed algorithms on both discrete-action and continuous-action tasks. Their superior performance demonstrates their potential for practical applications.
