Linear Least-Squares Algorithms for Temporal Difference Learning
[1] John G. Kemeny, et al. Finite Markov chains, 1960.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] T. Söderström, et al. Instrumental variable methods for system identification, 1983.
[4] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[5] Graham C. Goodwin, et al. Adaptive filtering prediction and control, 1984.
[6] P. Kumar, et al. Theory and practice of recursive identification, 1985, IEEE Transactions on Automatic Control.
[7] Hong Wang, et al. Recursive estimation and time-series analysis, 1986, IEEE Trans. Acoust. Speech Signal Process.
[8] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[9] Charles W. Anderson, et al. Strategy Learning with Multilayer Connectionist Representations, 1987.
[10] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.
[11] P. Werbos, et al. Expectation Driven Learning with an Associative Memory, 1990.
[12] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[13] John Moody, et al. Learning rate schedules for faster stochastic gradient search, 1992, Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop.
[14] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[15] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[16] Steven J. Bradtke, et al. Incremental dynamic programming for on-line adaptive optimal control, 1995.
[17] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[18] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[19] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[20] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[21] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[22] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.