Action-dependent adaptive critic designs

We study a class of action-dependent adaptive critic designs. Conventional adaptive critic designs contain three basic modules: critic, model, and action, each of which can be implemented using a neural network. By combining the critic network and the model network into a new critic network, we propose a form of action-dependent adaptive critic design in which the critic network implicitly includes a model network. An important feature of this design is that it can be applied to online learning control applications. We also provide details about the training of the neural networks used in the design. The training approach described makes it possible to use many readily available neural network training algorithms and tools without modification. We use the pole-balancing problem in our simulation study to show the applicability of the results.
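To make the idea of an action-dependent critic concrete, the following is a minimal sketch, not the training procedure of this paper. It assumes a one-hidden-layer critic that takes the state and the action together as input, so no separate model network is needed to evaluate an action. The class name `ActionDependentCritic`, the helper `td_target`, the temporal-difference style target, the discount factor, the learning rate, and the four-dimensional pole-balancing-like state are all illustrative assumptions.

```python
import math
import random


class ActionDependentCritic:
    """Tiny one-hidden-layer network mapping (state, action) to a scalar cost estimate.

    Hypothetical illustration: the architecture and sizes are assumptions.
    """

    def __init__(self, state_dim, action_dim, hidden=6, seed=0):
        rng = random.Random(seed)
        n_in = state_dim + action_dim
        # Each hidden row has one extra weight for the constant bias input.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(self, state, action):
        # The action is part of the critic input; this is what makes it action-dependent.
        x = list(state) + list(action) + [1.0]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        h.append(1.0)  # bias unit for the output layer
        q = sum(w * hi for w, hi in zip(self.w2, h))
        return q, x, h

    def value(self, state, action):
        return self.forward(state, action)[0]

    def train_step(self, state, action, target, lr=0.05):
        """One plain gradient-descent step on 0.5 * (Q(state, action) - target)**2."""
        q, x, h = self.forward(state, action)
        err = q - target
        # Update hidden weights first so they use the output weights from before this step.
        for j, row in enumerate(self.w1):
            # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)**2.
            delta = err * self.w2[j] * (1.0 - h[j] ** 2)
            for i in range(len(row)):
                row[i] -= lr * delta * x[i]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * err * h[j]
        return 0.5 * err ** 2


def td_target(critic, utility, next_state, next_action, gamma=0.95):
    """Assumed target: immediate utility plus discounted critic output at the next step."""
    return utility + gamma * critic.value(next_state, next_action)


# Illustrative state layout: cart position, cart velocity, pole angle, pole angular velocity.
critic = ActionDependentCritic(state_dim=4, action_dim=1)
state, action = [0.0, 0.0, 0.05, 0.0], [1.0]
next_state, next_action = [0.0, 0.1, 0.05, -0.1], [-1.0]

target = td_target(critic, 0.0, next_state, next_action)
before = abs(critic.value(state, action) - target)
for _ in range(50):
    critic.train_step(state, action, target)  # target held fixed for this demo
after = abs(critic.value(state, action) - target)
```

Because the critic is an ordinary feedforward network trained on a scalar target, the hand-written gradient step here could in principle be replaced by any standard network training routine, which is in the spirit of the training approach described above.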
