Stable adaptive control using new critic designs

Classical adaptive control proves total-system stability for control of linear plants, but only for plants meeting very restrictive assumptions. Approximate Dynamic Programming (ADP) has the potential, in principle, to ensure stability without such tight restrictions. It also offers nonlinear and neural extensions for optimal control, with empirically supported links to what is seen in the brain. However, the relevant ADP methods in use today (TD, HDP, DHP, GDHP) and the Galerkin-based versions of these all have serious limitations when used here as parallel distributed real-time learning systems; either they do not possess quadratic unconditional stability (to be defined) or they lead to incorrect results in the stochastic case. (ADAC or Q-learning designs do not help.) After explaining these conclusions, this paper describes new ADP designs which overcome these limitations. It also addresses the Generalized Moving Target problem, a common family of static optimization problems, and describes a way to stabilize large-scale economic equilibrium models, such as the old long-term energy model of DOE.
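To fix ideas about the critic designs under discussion, the following is a minimal sketch of the standard TD(0)-style weight update used by an HDP critic with a linear value approximator V(s) = w·phi(s). This is a generic textbook form for illustration only, not the paper's new designs; the function name, feature vector `phi`, and step sizes are assumptions.

```python
import numpy as np

def hdp_critic_update(w, phi_s, phi_s_next, r, gamma=0.95, alpha=0.01):
    """One HDP/TD(0)-style critic step for a linear critic V(s) = w . phi(s).

    w          -- current critic weight vector
    phi_s      -- feature vector of the current state s
    phi_s_next -- feature vector of the successor state s'
    r          -- observed one-step reward (or cost-to-go increment)
    """
    # Temporal-difference error: mismatch between the Bellman target
    # r + gamma * V(s') and the current estimate V(s).
    td_error = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    # Gradient-style correction along the features of the visited state.
    return w + alpha * td_error * phi_s
```

It is precisely this kind of per-sample correction rule whose stability, when run as a parallel distributed real-time learner, is at issue in the paper: the update is not a true gradient of any fixed loss, which is why residual/Galerkin variants and the new designs discussed here were proposed.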
