Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints

In this paper, the near-optimal control problem for a class of nonlinear discrete-time systems with control constraints is solved by an iterative adaptive dynamic programming (ADP) algorithm. First, a novel nonquadratic performance functional is introduced to handle the control constraints. An iterative ADP algorithm is then developed, together with a convergence analysis, to solve the optimal feedback control problem for the original constrained system. In the present control scheme, three neural networks serve as parametric structures that facilitate the implementation of the iterative algorithm. Two examples are given to demonstrate the convergence and feasibility of the proposed optimal control scheme.
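
As a rough illustration of the idea, and not the paper's neural-network implementation, the sketch below runs a value-iteration form of ADP on a grid for a hypothetical scalar affine system x_{k+1} = f(x_k) + g(x_k) u_k with |u_k| <= U_BAR. The dynamics f and g, the weights Q and R, the bound U_BAR, the grid sizes, and the specific nonquadratic control cost W(u) = 2 R * integral from 0 to u of U_BAR * artanh(v / U_BAR) dv are all assumptions chosen for illustration; the abstract does not specify them. A lookup table with linear interpolation stands in for the three neural networks.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
U_BAR = 0.5      # assumed actuator bound: |u| <= U_BAR
Q, R = 1.0, 1.0  # assumed state and control weights


def f(x):
    """Hypothetical drift term of the affine system."""
    return 0.8 * np.sin(x)


def g(x):
    """Hypothetical input gain of the affine system."""
    return np.ones_like(x)


def control_cost(u):
    """Assumed nonquadratic cost W(u) = 2*R * int_0^u U_BAR*artanh(v/U_BAR) dv.

    Closed form: 2*R*U_BAR * (u*artanh(s) + 0.5*U_BAR*log(1 - s**2)), s = u/U_BAR.
    """
    s = u / U_BAR
    return 2.0 * R * U_BAR * (u * np.arctanh(s) + 0.5 * U_BAR * np.log1p(-s**2))


def value_iteration(n_iter=50):
    """Iterate V_{i+1}(x) = min_u [Q*x^2 + W(u) + V_i(f(x) + g(x)*u)], V_0 = 0."""
    xs = np.linspace(-2.0, 2.0, 81)                      # state grid
    us = np.linspace(-0.99 * U_BAR, 0.99 * U_BAR, 101)   # admissible controls
    X, U = np.meshgrid(xs, us, indexing="ij")
    stage = Q * X**2 + control_cost(U)   # one-step cost for every (x, u) pair
    x_next = f(X) + g(X) * U             # successor state for every (x, u) pair

    V = np.zeros_like(xs)
    history = [V.copy()]
    for _ in range(n_iter):
        # Evaluate the current value estimate at the successor states.
        V_next = np.interp(x_next.ravel(), xs, V).reshape(x_next.shape)
        total = stage + V_next
        V = total.min(axis=1)            # greedy minimization over u
        history.append(V.copy())
    policy = us[total.argmin(axis=1)]    # greedy control at each grid state
    return xs, V, policy, history


if __name__ == "__main__":
    xs, V, policy, history = value_iteration()
    print("largest control magnitude:", np.abs(policy).max())
    print("last update size:", np.abs(history[-1] - history[-2]).max())
```

In this sketch the bound is enforced by construction, since candidate controls are sampled strictly inside (-U_BAR, U_BAR). With the assumed cost, the slope of W(u) grows without bound as |u| approaches U_BAR, so the stationarity condition in a continuous-control version gives a control of the form U_BAR * tanh(.), which also stays inside the bound.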
