Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design

Adaptive Dynamic Programming (ADP) with a critic-actor architecture is an effective way to perform online learning control. To avoid the subjectivity inherent in designing a neural network to serve as the critic, kernel-based adaptive critic design (ACD) was developed recently. A static kernel-based model raises two essential issues: how to determine proper hyperparameters in advance, and how to select the right samples to describe the value function. Both rely on assessing the values of samples. Based on theoretical analysis, this paper presents a two-phase simultaneous learning method for a Gaussian-kernel-based critic network. It estimates the values of samples without revisiting them infinitely often, while simultaneously optimizing the hyperparameters of the kernel model. Based on the estimated sample values, the sample set can be refined by adding alternatives or deleting redundant samples. Combining this critic design with an actor network yields a Gaussian-kernel-based Adaptive Dynamic Programming (GK-ADP) approach. Simulations verify its feasibility, in particular the necessity of two-phase learning, its convergence characteristics, and the performance improvement obtained with a varying sample set.
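
To make the two-phase idea concrete, below is a minimal sketch (Python with NumPy) of one way such a critic could be organized; it is an illustrative assumption, not the authors' algorithm. Phase 1 performs a temporal-difference update of the kernel weights, which play the role of the sample values; phase 2 takes a semi-gradient step on the Gaussian kernel width, so the hyperparameter is tuned simultaneously. A simple novelty test grows the sample set and a pruning rule deletes redundant samples. The class name, step sizes, novelty threshold, and pruning rule are all hypothetical.

```python
import numpy as np

class GaussianKernelCritic:
    """Sketch of a Gaussian-kernel critic with two-phase updates.

    V(s) ~= sum_i alpha_i * k(s, s_i), k(s, s') = exp(-||s - s'||^2 / (2 sigma^2)).
    Phase 1 adjusts the weights alpha (the sample values) from the TD error;
    phase 2 adjusts the kernel width sigma from the same error, so the
    hyperparameter is optimized simultaneously with the value estimates.
    """

    def __init__(self, sigma=1.0, gamma=0.95, lr_w=0.1, lr_sigma=0.01,
                 novelty_tol=0.3):
        self.samples = []               # kernel centers, i.e. the sample set
        self.alpha = np.zeros(0)        # one weight per stored sample
        self.sigma = sigma              # Gaussian kernel width (hyperparameter)
        self.gamma = gamma              # discount factor
        self.lr_w = lr_w                # phase-1 step size (weights)
        self.lr_sigma = lr_sigma        # phase-2 step size (kernel width)
        self.novelty_tol = novelty_tol  # threshold of the novelty test

    def _features(self, s):
        """Kernel activations and squared distances of s to all samples."""
        if not self.samples:
            z = np.zeros(0)
            return z, z
        d2 = np.array([np.sum((s - c) ** 2) for c in self.samples])
        return np.exp(-d2 / (2.0 * self.sigma ** 2)), d2

    def value(self, s):
        phi, _ = self._features(s)
        return float(self.alpha @ phi)

    def update(self, s, reward, s_next):
        """One two-phase step on the transition (s, reward, s_next)."""
        phi, d2 = self._features(s)
        phi_n, _ = self._features(s_next)
        td = reward + self.gamma * float(self.alpha @ phi_n) \
                    - float(self.alpha @ phi)
        # Phase 1: TD update of the sample values (kernel weights).
        self.alpha = self.alpha + self.lr_w * td * phi
        # Phase 2: semi-gradient step on sigma (bootstrapped target frozen),
        # using dk/dsigma = k * d2 / sigma^3 for the Gaussian kernel.
        dV_dsigma = float(self.alpha @ (phi * d2)) / self.sigma ** 3
        self.sigma = max(self.sigma + self.lr_sigma * td * dV_dsigma, 1e-2)
        # Refinement: add s to the sample set if it is poorly covered.
        if phi.size == 0 or phi.max() < self.novelty_tol:
            self.samples.append(np.asarray(s, dtype=float))
            self.alpha = np.append(self.alpha, 0.0)
        return td

    def prune(self, tol=1e-3):
        """Delete samples whose weights stay negligible (redundant ones)."""
        keep = np.abs(self.alpha) >= tol
        self.samples = [c for c, flag in zip(self.samples, keep) if flag]
        self.alpha = self.alpha[keep]
```

In the toy run below the critic learns state values along a 1-D random walk with reward near the origin. The semi-gradient choice, freezing the bootstrapped target when differentiating with respect to sigma, is a common stabilizing simplification and may differ from the update rules derived in the paper.

```python
rng = np.random.default_rng(0)
critic = GaussianKernelCritic(sigma=0.5)
s = np.array([2.0])
for _ in range(2000):
    s_next = s + rng.normal(scale=0.2, size=1)
    reward = 1.0 if abs(s_next[0]) < 0.1 else 0.0
    critic.update(s, reward, s_next)
    s = s_next if abs(s_next[0]) < 3.0 else np.array([2.0])
critic.prune()
print(len(critic.samples), round(critic.sigma, 3), critic.value(np.array([0.0])))
```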
