[1] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Michael Fairbank,et al. Approximating Optimal Control with Value Gradient Learning , 2013 .
[3] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[4] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[5] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[6] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[7] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Michael Fairbank,et al. A comparison of learning speed and ability to cope without exploration between DHP and TD(0) , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[9] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[10] Paul J. Werbos,et al. Approximate dynamic programming for real-time control and neural modeling , 1992 .
[11] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It , 1990, Proceedings of the IEEE.
[12] Silvia Ferrari,et al. Model-Based Adaptive Critic Designs , 2004 .
[13] Michael Fairbank,et al. Reinforcement Learning by Value Gradients , 2008, ArXiv.
[14] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[15] Michael Fairbank,et al. Simple and Fast Calculation of the Second-Order Gradients for Globalized Dual Heuristic Dynamic Programming in Neural Networks , 2012, IEEE Transactions on Neural Networks and Learning Systems.
[16] Jennie Si,et al. Backpropagation Through Time and Derivative Adaptive Critics: A Common Framework for Comparison , 2004 .
[17] Michael Fairbank,et al. Value-gradient learning , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[18] Etienne Barnard,et al. Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..
[19] Rémi Munos,et al. Policy Gradient in Continuous Time , 2006, J. Mach. Learn. Res..
[20] Michael Fairbank,et al. The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning , 2011, ArXiv.
[21] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[22] Razvan V. Florian,et al. Correct equations for the dynamics of the cart-pole system , 2005 .
[23] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[24] Frank L. Lewis,et al. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , 2012 .
[25] Huaguang Zhang,et al. Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.
[26] George G. Lendaris,et al. Training strategies for critic and action neural networks in dual heuristic programming method , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).