Discrete-Time Adaptive Dynamic Programming using Wavelet Basis Function Neural Networks

Dynamic programming for discrete-time systems is difficult because of the "curse of dimensionality": one must find a sequence of control actions that leads to the optimal performance cost, yet the total cost of those actions is unknown until the sequence ends. In this paper, we present our work on adaptive dynamic programming (ADP) for nonlinear discrete-time systems using neural networks. The neural network we adopt is the wavelet basis function (WBF) neural network. We examine the performance of an ADP algorithm using WBF neural networks and compare it with the same algorithm using radial basis function (RBF) neural networks. The comparison shows that the ADP algorithm trains faster with WBF neural networks than with RBF neural networks.
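To make the distinction between the two network types concrete, the following is a minimal sketch of a single-input basis function network whose hidden units can use either a Gaussian RBF or a wavelet. It is an illustration under stated assumptions, not the implementation used in this work: the Mexican hat mother wavelet, the function names, the learning rate, and the restriction to training only the output weights are all choices made here for the example.

```python
import math

def gaussian_rbf(z):
    """Gaussian radial basis function of the scaled distance z."""
    return math.exp(-0.5 * z * z)

def mexican_hat(z):
    """Mexican hat wavelet (assumed here; the abstract names no specific wavelet)."""
    return (1.0 - z * z) * math.exp(-0.5 * z * z)

def network_output(x, centers, widths, weights, basis):
    """Network output: sum over hidden units of w_j * basis((x - c_j) / s_j)."""
    return sum(w * basis((x - c) / s)
               for c, s, w in zip(centers, widths, weights))

def train_step(x, target, centers, widths, weights, basis, lr=0.1):
    """One gradient-descent step on the output weights for squared error.

    Returns the updated weights and the error before the update.
    """
    phi = [basis((x - c) / s) for c, s in zip(centers, widths)]
    error = sum(w * p for w, p in zip(weights, phi)) - target
    new_weights = [w - lr * error * p for w, p in zip(weights, phi)]
    return new_weights, error

# Hypothetical usage: fit one sample with each basis and watch the error shrink.
centers, widths = [-1.0, 0.0, 1.0], [1.0, 1.0, 1.0]
for basis in (gaussian_rbf, mexican_hat):
    weights = [0.0, 0.0, 0.0]
    for _ in range(5):
        weights, error = train_step(0.0, 1.0, centers, widths, weights, basis)
    print(basis.__name__, abs(error))
```

The only difference between the two variants is the hidden-unit activation passed as `basis`: the Gaussian is positive everywhere, while the wavelet takes both positive and negative values and is zero at a scaled distance of one. In an ADP setting a network of this kind would typically approximate the cost-to-go; this toy example says nothing about which basis trains faster in that setting.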
