Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$-Error Bound

In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain an optimal control law that brings the performance index function to within an $\varepsilon$-error bound of the greatest lower bound of all performance indices. The proposed ADP algorithms also yield the optimal number of control steps. A convergence analysis of the proposed algorithms is given in terms of the performance index function and the control policy. To facilitate implementation of the iterative ADP algorithms, neural networks are used to approximate the performance index function, compute the optimal control policy, and model the nonlinear system. Finally, two simulation examples illustrate the applicability of the proposed method.
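The iteration described above can be illustrated with a minimal sketch; it is not the paper's implementation. Everything concrete in it is an assumption made for illustration: the scalar system `F`, the quadratic utility `U`, the state and control grids, the zero initialization, and the use of a lookup table with linear interpolation in place of the neural networks. The stopping test compares successive iterates against `eps`, which is one common way to realize an $\varepsilon$-type criterion and may differ from the exact condition used in the paper.

```python
import math


def F(x, u):
    """Hypothetical scalar nonlinear system x_{k+1} = F(x_k, u_k)."""
    return 0.8 * math.sin(x) + u


def U(x, u):
    """Assumed quadratic utility (stage cost)."""
    return x * x + u * u


def linspace(lo, hi, n):
    """Evenly spaced grid of n points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def interp(grid, values, x):
    """Piecewise-linear lookup of a tabulated function, clamped to the grid."""
    lo, hi = grid[0], grid[-1]
    x = min(max(x, lo), hi)
    step = (hi - lo) / (len(grid) - 1)
    j = min(int((x - lo) / step), len(grid) - 2)
    w = min(max((x - grid[j]) / step, 0.0), 1.0)
    return (1.0 - w) * values[j] + w * values[j + 1]


def iterative_adp(eps=1e-5, n_x=41, n_u=41, max_iters=500):
    """Tabular value iteration, stopped once successive iterates differ by <= eps.

    Returns (iterations, V, policy, gap), where V and policy are tabulated
    on the state grid and gap is the final sup-norm difference.
    """
    xs = linspace(-1.0, 1.0, n_x)   # assumed state grid
    us = linspace(-1.0, 1.0, n_u)   # assumed control grid
    V = [0.0] * n_x                 # assumed initialization V_0 = 0
    for i in range(1, max_iters + 1):
        V_next, policy = [], []
        for x in xs:
            # V_{i+1}(x) = min_u { U(x, u) + V_i(F(x, u)) }
            cost, u_best = min((U(x, u) + interp(xs, V, F(x, u)), u) for u in us)
            V_next.append(cost)
            policy.append(u_best)
        gap = max(abs(a - b) for a, b in zip(V_next, V))
        V = V_next
        if gap <= eps:
            return i, V, policy, gap
    raise RuntimeError("iteration did not reach the eps tolerance")


if __name__ == "__main__":
    steps, V, policy, gap = iterative_adp()
    print("iterations:", steps, "final gap:", gap)
```

In this sketch the iteration count at termination plays the role of the number of control steps, and the table `V` stands in for the critic network. Replacing the tables with function approximators would change the code but not the structure of the loop.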
