R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
[1] R. Karp, et al. On Nonterminating Stochastic Games, 1966.
[2] A. Banos. On Pseudo-Games, 1968.
[3] N. Megiddo. On repeated games with incomplete information played by non-Bayesian players, 1980.
[4] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[5] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[6] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[7] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[8] D. Fudenberg, et al. Self-confirming equilibrium, 1993.
[9] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[10] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[11] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[12] Moshe Tennenholtz, et al. Dynamic Non-Bayesian Decision Making, 1997, J. Artif. Intell. Res.
[13] Prasad Tadepalli, et al. Model-Based Average Reward Reinforcement Learning, 1998, Artif. Intell.
[14] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.
[15] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[16] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[17] Ronen I. Brafman, et al. A near-optimal polynomial time algorithm for learning in certain classes of stochastic games, 2000, Artif. Intell.
[18] S. Hart, et al. A Reinforcement Procedure Leading to Correlated Equilibrium, 2001.
[19] David B. Leake. Artificial Intelligence, 2001.
[20] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.