Exploiting Best-Match Equations for Efficient Reinforcement Learning
Harm van Seijen | Shimon Whiteson | Hado van Hasselt | Marco Wiering