Robust Adaptive Dynamic Programming and Feedback Stabilization of Nonlinear Systems

This paper studies robust optimal control design for a class of uncertain nonlinear systems from the perspective of robust adaptive dynamic programming (RADP). The objective is to fill a gap in the adaptive dynamic programming (ADP) literature, where dynamic uncertainties and unmodeled dynamics have not been addressed. A key strategy is to integrate tools from modern nonlinear control theory, such as robust redesign, backstepping, and the nonlinear small-gain theorem, with the theory of ADP. The proposed RADP methodology can be viewed as an extension of ADP to uncertain nonlinear systems. Practical learning algorithms are developed and applied to controller design problems for a jet engine and a one-machine power system.
