Synthesized Prioritized Data Pruning based Deep Deterministic Policy Gradient Algorithm Improvement

In recent years, reinforcement learning has been widely applied across many fields and has developed rapidly. Building on the deep deterministic policy gradient (DDPG) algorithm, we propose a synthesized prioritized data pruning DDPG method. It addresses the shortcomings of first-in-first-out (FIFO) storage and uniform random sampling in the replay buffer: high-priority samples are selected for network training, while similar samples are pruned from the buffer and rare samples are retained. The model is implemented in TensorFlow, and the proposed method is verified on the Pendulum task on the OpenAI Gym platform. The experimental results indicate that our method not only achieves comparable performance in a shorter training time, but also accelerates the training process and improves learning stability and long-term memory.
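The core replay-buffer idea described above (priority-proportional sampling plus eviction of redundant rather than oldest samples) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the state-distance density heuristic used for pruning, and the choice of TD-error-style scalar priorities are all assumptions made for the example.

```python
# Sketch of a replay buffer that samples proportionally to priority and,
# when full, evicts the most redundant transition instead of the oldest one.
# The density heuristic (nearest-neighbor state distance) is an assumption;
# the paper's actual pruning criterion may differ.
import numpy as np

class PrunedPrioritizedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # transitions: (state, action, reward, next_state)
        self.priorities = []  # one scalar priority per stored transition

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:
            self._prune()  # evict a redundant sample rather than FIFO
        self.data.append(transition)
        self.priorities.append(float(priority))

    def _prune(self):
        # Remove the sample whose state is closest to another stored state,
        # so rare (isolated) samples are retained in the buffer.
        states = np.array([t[0] for t in self.data])
        d = np.linalg.norm(states[:, None] - states[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        idx = int(np.argmin(d.min(axis=1)))  # most redundant sample
        del self.data[idx]
        del self.priorities[idx]

    def sample(self, batch_size):
        # Draw a minibatch with probability proportional to priority.
        p = np.array(self.priorities)
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

In a DDPG training loop, `add` would be called with the TD error magnitude as the priority, and `sample` would supply minibatches to the critic and actor updates.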
