Prioritized sweeping: Reinforcement learning with less data and less time
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] Nils J. Nilsson, et al. Problem-solving methods in artificial intelligence, 1971, McGraw-Hill computer science series.
[3] G. Siouris, et al. Optimum Systems Control, 1979, IEEE Transactions on Systems, Man and Cybernetics.
[4] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[5] David L. Waltz, et al. Toward memory-based reasoning, 1986, CACM.
[6] Mitsuo Sato, et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control, 1988, IEEE Trans. Syst. Man Cybern.
[7] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[8] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[9] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[10] Alan D. Christiansen, et al. Learning reliable manipulation strategies without initial physical models, 1990, Proceedings., IEEE International Conference on Robotics and Automation.
[11] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[12] Andrew G. Barto, et al. On the Computational Economics of Reinforcement Learning, 1991.
[13] Long Ji Lin, et al. Programming Robots Using Reinforcement Learning and Teaching, 1991, AAAI.
[14] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[15] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[16] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[17] Timothy J. Purcell. Sorting and searching, 2005, SIGGRAPH Courses.
[18] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.