Policy gradient methods with Gaussian process modelling acceleration

Policy gradient algorithms are often used to solve continuous control problems, but as model-free methods they suffer from low data efficiency and a long learning phase. In this paper, a policy gradient with Gaussian process modelling (PGGPM) algorithm is proposed to accelerate the learning process. The system model is approximated incrementally by a Gaussian process, which is then used to explore the state-action space virtually by generating imaginary samples. Both the real and the imaginary samples are used to train the actor and critic networks. Finally, we apply the algorithm to two experiments to verify that the Gaussian process fits the system model accurately and that the supplementary imaginary samples speed up the learning phase.
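
As a rough illustration of this scheme, the sketch below follows a Dyna-style loop: a Gaussian process (here scikit-learn's GaussianProcessRegressor, periodically refit on a growing buffer as a simple stand-in for the paper's incremental modelling) is trained on real transitions and then queried to generate imaginary samples, and both kinds of samples feed the same actor-critic update. The toy 1-D dynamics, the reward, and the tiny linear actor-critic are assumptions for illustration only, not the paper's PGGPM networks or experiments.

```python
# Minimal Dyna-style sketch of the idea in the abstract: a Gaussian process
# fits the transition dynamics from real samples, and imaginary samples drawn
# from the fitted model supplement the real ones when updating the agent.
# The toy 1-D dynamics, reward, and tiny linear actor-critic below are
# illustrative assumptions, not the paper's PGGPM implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def real_step(state, action):
    """Toy 1-D dynamics used only to generate real samples."""
    next_state = 0.9 * state + 0.5 * action + 0.01 * rng.standard_normal()
    return next_state, -(next_state ** 2)       # reward: drive the state to zero

class LinearActorCritic:
    """Tiny actor-critic with linear features, standing in for the networks."""
    def __init__(self, lr=1e-3, gamma=0.95):
        self.w_v, self.w_pi, self.lr, self.gamma = 0.0, 0.0, lr, gamma
    def act(self, s, noise=0.3):
        return self.w_pi * s + noise * rng.standard_normal()
    def update(self, s, a, r, s2):
        td = r + self.gamma * self.w_v * s2 ** 2 - self.w_v * s ** 2  # TD error
        self.w_v += self.lr * td * s ** 2                     # critic step
        self.w_pi += self.lr * td * (a - self.w_pi * s) * s   # policy-gradient step

agent = LinearActorCritic()
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
real_X, real_y = [], []                          # (state, action) -> next_state

state = 1.0
for t in range(200):
    # --- real interaction: collect a sample and train on it ---
    action = agent.act(state)
    next_state, reward = real_step(state, action)
    agent.update(state, action, reward, next_state)
    real_X.append([state, action])
    real_y.append(next_state)

    # --- periodically refit the GP (a stand-in for incremental updates) ---
    if t % 20 == 19:
        gp.fit(np.array(real_X), np.array(real_y))
        # --- imaginary samples: virtual exploration of the state-action space ---
        for _ in range(50):
            s = rng.uniform(-2.0, 2.0)
            a = agent.act(s)
            s2 = float(gp.predict(np.array([[s, a]]))[0])
            agent.update(s, a, -(s2 ** 2), s2)   # same reward shape as real_step
    state = next_state
```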
