Incremental multi-step Q-learning
