Reinforcement Learning: a Brief Overview
[1] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[2] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.
[3] George Lakoff, et al. Women, Fire, and Dangerous Things, 1987.
[4] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[5] Armin Stahl, et al. Learning Feature Weights from Case Order Feedback, 2001, ICCBR.
[6] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[7] Jieyu Zhao, et al. Direct Policy Search and Uncertain Policy Evaluation, 1998.
[8] Dimitri P. Bertsekas, et al. Dynamic Programming and Stochastic Control, 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[9] Michael M. Richter, et al. On the Notion of Similarity in Case Based Reasoning and Fuzzy Theory, 2001, Soft Computing in Case Based Reasoning.
[10] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[11] J. J. Martin. Bayesian Decision Problems and Markov Chains, 1967.
[12] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 2004, Machine Learning.
[13] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[14] Jeremy L. Wyatt, et al. Exploration Control in Reinforcement Learning using Optimistic Model Selection, 2001, International Conference on Machine Learning.
[15] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[16] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[17] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[18] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search, 2000, UAI.
[19] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[20] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.
[21] Stefan Wess, et al. Case-Based Reasoning Technology: From Foundations to Applications, 1998, Lecture Notes in Computer Science.
[22] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine-mediated learning.
[23] Katia P. Sycara, et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State, 2001, ICML.
[24] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[25] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[26] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[27] Zdzislaw Pawlak, et al. Rough classification, 1984, Int. J. Hum. Comput. Stud.
[28] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[29] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[30] Kumpati S. Narendra, et al. Learning automata - an introduction, 1989.
[31] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[32] Andrew W. Moore, et al. Direct Policy Search using Paired Statistical Tests, 2001, ICML.
[33] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning, 1997, ICML.
[34] Jeremy Wyatt, et al. Exploration and inference in learning from reinforcement, 1998.
[35] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[36] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[37] Jürgen Schmidhuber, et al. Efficient model-based exploration, 1998.