The Convergence of TD(λ) for General λ
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1959, IBM J. Res. Dev.
[2] Stuart E. Dreyfus, et al. Applied Dynamic Programming, 1965.
[3] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers. II: Recent Progress, 1967.
[4] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[5] James S. Albus, et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.
[6] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[7] Richard S. Sutton, et al. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Steven Edward Hampson, et al. A Neural Model of Adaptive Behavior, 1983.
[9] Richard S. Sutton, et al. Temporal Credit Assignment in Reinforcement Learning, 1984.
[10] Bernard Widrow, et al. Adaptive Signal Processing, 1985.
[11] John H. Holland, et al. Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems, 1986.
[12] Stephen M. Omohundro, et al. Efficient Algorithms with Neural Network Behavior, 1987, Complex Syst.
[13] C. Watkins. Learning from Delayed Rewards, 1989.
[14] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.
[15] Paul J. Werbos, et al. Consistency of HDP Applied to a Simple Reinforcement Learning Problem, 1990, Neural Networks.