3 Model-Based Adaptive Critic Designs

Editor’s Summary: This chapter provides an overview of model-based adaptive critic designs, including background, general algorithms, implementations, and comparisons. The authors begin by introducing the mathematical background of model-reference adaptive critic designs. Various ADP designs, such as Heuristic Dynamic Programming (HDP), Dual HDP (DHP), Globalized DHP (GDHP), and Action-Dependent (AD) designs, are examined from both a mathematical and an implementation standpoint and compared with one another. Pseudocode is provided for many aspects of the algorithms. The chapter concludes with applications and examples. For another overview that focuses more on implementation issues, see Chapter 4: Guidance in the Use of Adaptive Critics for Control. Chapter 15 compares DHP with back-propagation through time and builds a common framework for comparing these methods.
