Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] R. Bellman. Dynamic programming, 1957, Science.
[3] Nils J. Nilsson, et al. Problem-solving methods in artificial intelligence, 1971, McGraw-Hill computer science series.
[4] Donald E. Knuth, et al. Sorting and Searching, 1973.
[5] G. Siouris, et al. Optimum systems control, 1979, Proceedings of the IEEE.
[6] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[7] Donald A. Berry, et al. Bandit Problems: Sequential Allocation of Experiments, 1986.
[8] David L. Waltz, et al. Toward memory-based reasoning, 1986, CACM.
[9] Mitsuo Sato, et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control, 1988, IEEE Trans. Syst. Man Cybern.
[10] C. Watkins. Learning from delayed rewards, 1989.
[11] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.
[12] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[13] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[14] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[15] Alan D. Christiansen, et al. Learning reliable manipulation strategies without initial physical models, 1990, Proceedings, IEEE International Conference on Robotics and Automation.
[16] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[17] Andrew G. Barto, et al. On the Computational Economics of Reinforcement Learning, 1991.
[18] Long Ji Lin, et al. Programming Robots Using Reinforcement Learning and Teaching, 1991, AAAI.
[19] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[20] Satinder P. Singh, et al. Transfer of Learning Across Compositions of Sequential Tasks, 1991, ML.
[21] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[22] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[23] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[24] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[25] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.