Adaptive critic design with graph Laplacian for online learning control of nonlinear systems

SUMMARY In recent years, reinforcement learning (RL) and approximate dynamic programming (ADP) have been widely studied in the artificial intelligence and machine learning communities. As an important class of RL and ADP methods, adaptive critic designs (ACDs) with function approximation have been studied to realize online learning control of nonlinear dynamical systems. However, constructing efficient feature representations for approximating value functions or policies remains a difficult problem. In this paper, ACDs with graph Laplacian (GL) are proposed by integrating manifold learning methods into the feature representations of ACDs. An online learning control algorithm called graph Laplacian dual heuristic programming (GL-DHP) is presented, and its performance is analyzed both theoretically and empirically. Owing to the nonlinear approximation ability of the GL-based feature representation, the GL-DHP method performs much better than previous DHP methods with manually designed neural networks. Simulation results on learning control of a ball and plate system, a typical nonlinear dynamical system with continuous state and action spaces, demonstrate the effectiveness of the GL-DHP method. Copyright © 2012 John Wiley & Sons, Ltd.
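To make the feature-construction idea concrete, the sketch below shows one common way to derive graph Laplacian features from sampled system states: build a k-nearest-neighbor graph over the samples, form the combinatorial Laplacian, and use its smoothest eigenvectors as basis functions (in the spirit of proto-value functions). This is a minimal illustration, not the paper's exact GL-DHP algorithm; the function name, the kNN construction, and all parameter values are assumptions for the example.

```python
import numpy as np

def laplacian_features(states, k=5, num_features=4):
    """Hypothetical sketch: graph Laplacian eigenvector features
    over a set of sampled states (proto-value-function style)."""
    n = states.shape[0]
    # Pairwise squared Euclidean distances between sampled states.
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
    # k-nearest-neighbor adjacency (unweighted, symmetrized).
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip self at index 0
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)
    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Eigenvectors with the smallest eigenvalues vary most smoothly
    # over the state graph; use them as feature vectors phi(s).
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :num_features]

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(60, 2))  # e.g. sampled plant states
phi = laplacian_features(states, k=5, num_features=4)
```

Each row of `phi` would then serve as the nonlinear feature vector fed to the critic and actor approximators in place of manually designed neural-network features.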
