Policy iteration approximate dynamic programming using Volterra series based actor

There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP, and similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in its parameters, so the global optimum is attainable. Given the proposed approximator structure, we develop a policy iteration framework under which a gradient descent algorithm for training the optimal Volterra kernels is derived. For this ADP design, we provide a sufficient condition, based on the actor approximation error, that guarantees convergence of the value function iterations, and we give a finite bound on the converged value function. Finally, a simulation example illustrates the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
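To make the structure concrete, the following is a minimal sketch of a second-order (truncated) Volterra actor with a gradient descent update of its kernels. The class and variable names, the truncation order, the learning rate, and the form of the critic signal `dV_du` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class VolterraActor:
    """Second-order (truncated) Volterra series policy.

    The control is linear in the kernel parameters w, so the feature
    map is fixed and only the weights are learned.
    """

    def __init__(self, state_dim, lr=1e-2):
        # Feature vector: [1, x_1..x_n, x_i*x_j for i <= j]
        # (0th-, 1st-, and 2nd-order Volterra kernels).
        self.idx = [(i, j) for i in range(state_dim) for j in range(i, state_dim)]
        self.w = np.zeros(1 + state_dim + len(self.idx))
        self.lr = lr

    def features(self, x):
        quad = np.array([x[i] * x[j] for i, j in self.idx])
        return np.concatenate(([1.0], x, quad))

    def act(self, x):
        # The control is a linear combination of fixed polynomial features.
        return float(self.w @ self.features(x))

    def update(self, x, dV_du):
        # Gradient descent step on the kernels; dV_du is the critic's
        # estimate of the sensitivity of the cost-to-go to the control.
        self.w -= self.lr * dV_du * self.features(x)

# Example usage: a scalar-control actor on a 2-D state (illustrative only).
actor = VolterraActor(state_dim=2)
x = np.array([0.5, -0.3])
u = actor.act(x)            # policy evaluation uses the current kernels
actor.update(x, dV_du=0.1)  # descend along the critic-supplied gradient
```

Because the policy is linear in the kernel weights, the training problem for the kernels is convex in the parameters, which is what makes the global optimum attainable.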

[1]  Derong Liu,et al.  Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[2]  Robert E. Hampson,et al.  Nonlinear Dynamic Modeling of Spike Train Transformations for Hippocampal-Cortical Prostheses , 2007, IEEE Transactions on Biomedical Engineering.

[3]  Randal W. Beard,et al.  Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation , 1997, Autom..

[4]  Haibo He,et al.  Online Learning Control Using Adaptive Critic Designs With Sparse Kernel Machines , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[5]  Marcello Sanguineti,et al.  Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results , 2012, Journal of Optimization Theory and Applications.

[6]  R. de Figueiredo The Volterra and Wiener theories of nonlinear systems , 1982, Proceedings of the IEEE.

[7]  Frank L. Lewis,et al.  Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach , 2005, Autom..

[8]  Matthieu Geist,et al.  Algorithmic Survey of Parametric Value Function Approximation , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[9]  Nicholas Kalouptsidis,et al.  Second-order Volterra system identification , 2000, IEEE Trans. Signal Process..

[10]  Vasilios N. Katsikis,et al.  An improved method for the computation of the Moore-Penrose inverse matrix , 2011, Appl. Math. Comput..

[11]  Jennie Si,et al.  The best approximation to C/sup 2/ functions and its error bounds using regular-center Gaussian networks , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).

[12]  Haibo He,et al.  A three-network architecture for on-line learning and optimization based on adaptive dynamic programming , 2012, Neurocomputing.

[13]  Ronald K. Pearson,et al.  Identification of structurally constrained second-order Volterra models , 1996, IEEE Trans. Signal Process..

[14]  Dimitri P. Bertsekas,et al.  Abstract Dynamic Programming , 2013 .

[15]  Huaguang Zhang,et al.  Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.

[16]  Jennie Si,et al.  The best approximation to C2 functions and its error bounds using regular-center Gaussian networks , 1994, IEEE Trans. Neural Networks.

[17]  Jennie Si,et al.  Online learning control by association and reinforcement. , 2001, IEEE transactions on neural networks.

[18]  Derong Liu,et al.  Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems , 2013, IEEE Transactions on Cybernetics.

[19]  George G. Lendaris,et al.  Adaptive dynamic programming , 2002, IEEE Trans. Syst. Man Cybern. Part C.

[20]  F. Lewis,et al.  Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers , 2012, IEEE Control Systems.

[21]  R. W. Miksad,et al.  Adaptive second-order Volterra filtering and its application to second-order drift phenomena , 1994 .

[22]  Cristiano Cervellera,et al.  Low-discrepancy sampling for approximate dynamic programming with local approximators , 2014, Comput. Oper. Res..

[23]  F.L. Lewis,et al.  Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.

[24]  Warren B. Powell,et al.  Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.

[25]  Frank L. Lewis,et al.  Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[26]  Madan Gopal,et al.  SVM-Based Tree-Type Neural Networks as a Critic in Adaptive Critic Designs for Control , 2007, IEEE Transactions on Neural Networks.

[27]  A. Krener,et al.  The existence and uniqueness of volterra series for nonlinear systems , 1977, 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications.

[28]  John N. Tsitsiklis,et al.  Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.

[29]  D. Liu,et al.  Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$-Error Bound , 2011, IEEE Transactions on Neural Networks.

[30]  A. Barto,et al.  LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING Scaling Up to the Real World , 2003 .

[31]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[32]  Robert Babuska,et al.  A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[33]  Huaguang Zhang,et al.  A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[34]  Sarangapani Jagannathan,et al.  Online Optimal Control of Affine Nonlinear Discrete-Time Systems With Unknown Internal Dynamics by Using Time-Based Policy Update , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[35]  Frank L. Lewis,et al.  Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).