Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming

We present a novel online learning control algorithm (OLCPA) which comprises projected gradient temporal difference for action-value function (PGTDAVF) and advanced heuristic dynamic programming with one step delay (AHD-POSD). PGTDAVF can guarantee the convergence of temporal difference(TD)-based policy learning with smooth action-value function approximators, such as neural networks. Meanwhile, AHDPOSD is a specially designed framework for embedding PGTDAVF in to conduct online learning control. It not only coincides with the intention of temporal difference but also enables PGTDAVF to be effective under nonidentical policy environment, which results in more practicality. In this way, the proposed algorithms achieve the stability and practicability simultaneously. Finally, simulation of online learning control on a cart pole benchmark demonstrates practical control capability and efficiency of the presented method.

[1]  Frank L. Lewis,et al.  Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.

[2]  F.L. Lewis,et al.  Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.

[3]  Derong Liu,et al.  Adaptive Dynamic Programming for Optimal Tracking Control of Unknown Nonlinear Systems With Application to Coal Gasification , 2014, IEEE Transactions on Automation Science and Engineering.

[4]  Xin Xu,et al.  Reinforcement learning algorithms with function approximation: Recent advances and applications , 2014, Inf. Sci..

[5]  Haibo He,et al.  A three-network architecture for on-line learning and optimization based on adaptive dynamic programming , 2012, Neurocomputing.

[6]  Haibo He,et al.  Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[7]  Shalabh Bhatnagar,et al.  Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.

[8]  Sanqing Hu,et al.  MAC protocol identification approach for implement smart cognitive radio , 2012, 2012 IEEE International Conference on Communications (ICC).

[9]  Frank L. Lewis,et al.  Reinforcement learning and optimal adaptive control: An overview and implementation examples , 2012, Annu. Rev. Control..

[10]  Jinyu Wen,et al.  Adaptive Learning in Tracking Control Based on the Dual Critic Network Design , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[11]  Jennie Si,et al.  Online learning control by association and reinforcement , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[12]  Haibo He Self-Adaptive Systems for Machine Intelligence: He/Machine Intelligence , 2011 .

[13]  Leemon C. Baird,et al.  Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[14]  Frank L. Lewis,et al.  Adaptive Dynamic Programming for feedback control , 2009, 2009 7th Asian Control Conference.

[15]  Frank L. Lewis,et al.  Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach , 2005, Autom..

[16]  Haibo He Self-Adaptive Systems for Machine Intelligence , 2011 .

[17]  John N. Tsitsiklis,et al.  Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.

[18]  Warrren B Powell,et al.  A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications , 2011 .

[19]  Xin Xu,et al.  Kernel-Based Representation Policy Iteration with Applications to Optimal Path Tracking of Wheeled Mobile Robots , 2013, IScIDE.

[20]  Huaguang Zhang,et al.  Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.