A retrospective on Adaptive Dynamic Programming for control

Some three decades ago, certain computational intelligence methods of reinforcement learning were recognized as implementing an approximation of Bellman's Dynamic Programming method, which is known in the controls community as an important tool for designing optimal control policies for nonlinear plants and for sequential decision making. Significant theoretical and practical developments have since occurred in this arena, mostly in the past decade, and the methodology is now usually referred to as Adaptive Dynamic Programming (ADP). The objective of this paper is to provide a retrospective of selected threads of these developments. In addition, a commentary is offered on the present status of ADP, and threads for future research and development within the controls field are suggested.
