Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems

Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem of policy approximation in policy iteration approximate DP for discrete-time nonlinear systems with infinite-horizon undiscounted value functions. Taking the policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which have been studied extensively in the controls literature for their good theoretical properties and their success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper through several examples, including a practical problem of excitation control of a hydrogenerator.
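For concreteness, the recursion underlying policy iteration with undiscounted value functions can be sketched in standard notation; the symbols below ($F$, $U$, $V_i$, $\mu_i$, $w_0$, $w_1$, $W_2$) are generic conventions assumed here, not quoted from the paper. For dynamics $x_{k+1} = F(x_k, u_k)$ with stage cost $U(x_k, u_k) \ge 0$, each iteration $i$ alternates policy evaluation and policy improvement:

$$V_i(x_k) = U\big(x_k, \mu_i(x_k)\big) + V_i\big(F(x_k, \mu_i(x_k))\big),$$
$$\mu_{i+1}(x_k) = \arg\min_{u}\,\big[\, U(x_k, u) + V_i\big(F(x_k, u)\big) \,\big].$$

Policy approximation enters when the improved policy $\mu_{i+1}$ is replaced by a parameterized $\hat{\mu}_{i+1}$; one hedged example of a truncated second-order Volterra (polynomial-in-state) actor is $\hat{\mu}(x) = w_0 + w_1^{\top} x + x^{\top} W_2\, x$, and the residual between $\hat{\mu}_{i+1}$ and $\mu_{i+1}$ is the policy approximation error whose effects the paper analyzes.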
