Finite horizon discrete-time approximate dynamic programming

Dynamic programming for discrete-time systems is difficult because of the "curse of dimensionality": one must choose a sequence of control actions to be taken in order, hoping that this sequence minimizes the performance cost, yet the total cost of those actions remains unknown until the sequence ends. In this paper, we present our work on adaptive optimal control of nonlinear discrete-time systems using neural networks. We study the relationships among the optimal controls at different control steps and then develop a neural dynamic programming algorithm based on these relationships.
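The finite-horizon structure described above can be made concrete with a small backward-induction sketch. Everything here (the scalar dynamics, the quadratic stage cost, the grids, and the horizon) is an illustrative assumption rather than the system studied in the paper; a tabulated value function on a grid stands in for the paper's neural network approximator, and it is exactly this table whose size explodes with the state dimension — the curse of dimensionality that function approximation is meant to sidestep.

```python
# Minimal sketch of finite-horizon dynamic programming by backward
# induction on a discretized grid. The dynamics f, the stage cost,
# the grids, and the horizon N are all illustrative assumptions.
import bisect

def f(x, u):
    # assumed discrete-time dynamics x_{k+1} = 0.9 x_k + 0.5 u_k
    return 0.9 * x + 0.5 * u

def stage_cost(x, u):
    # assumed quadratic stage cost
    return x * x + 0.1 * u * u

X = [i * 0.1 for i in range(-20, 21)]    # state grid on [-2, 2]
U = [i * 0.25 for i in range(-8, 9)]     # control grid on [-2, 2]
N = 10                                   # horizon length

def nearest(grid, v):
    # index of the grid point closest to v (clamped to the grid)
    i = bisect.bisect_left(grid, v)
    if i == 0:
        return 0
    if i == len(grid):
        return len(grid) - 1
    return i if grid[i] - v < v - grid[i - 1] else i - 1

V = [x * x for x in X]                   # terminal cost V_N(x) = x^2
for k in range(N):                       # sweep backward: V_{N-1}, ..., V_0
    V = [min(stage_cost(x, u) + V[nearest(X, f(x, u))] for u in U)
         for x in X]

# V now tabulates V_0: the origin is cost-free, nonzero states are not
print(V[20], V[30])                      # V_0(0.0) and V_0(1.0)
```

Each backward sweep computes, for every grid state, the control that minimizes the stage cost plus the already-computed cost-to-go at the next step — the relationship between optimal controls at successive steps that the paper exploits. Replacing the per-state table `V` with a trained network is the step from exact to approximate dynamic programming.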
