Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning
[1] W. G. Lehnert,et al. The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence (A. H. Klopf) , 1983 .
[2] John S. Edwards,et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .
[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[5] C. Watkins. Learning from delayed rewards , 1989 .
[6] Richard S. Sutton,et al. Learning and Sequential Decision Making , 1989 .
[7] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[8] Richard S. Sutton,et al. Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.
[9] R.J. Williams,et al. Reinforcement learning is direct adaptive optimal control , 1991, IEEE Control Systems.
[10] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[11] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..
[12] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[13] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[14] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[15] Richard S. Sutton. On Step-Size and Bias in Temporal-Difference Learning , 1994 .
[16] Michael I. Jordan,et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[17] Mark D. Pendrith. On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains , 1994 .
[18] P. Dayan,et al. TD(λ) converges with probability 1 , 1994, Machine Learning.
[19] Paweł Cichosz,et al. Reinforcement Learning Algorithms Based on the Methods of Temporal Differences , 1994 .
[20] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.