Sample Complexity Bounds of Exploration
[1] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[2] Neil D. Lawrence,et al. Missing Data in Kernel PCA , 2006, ECML.
[3] Clayton T. Morrison,et al. Blending Autonomous Exploration and Apprenticeship Learning , 2011, NIPS.
[4] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning , 1997, ICML.
[5] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[6] Paul Bourgine,et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty , 1999, Machine Learning.
[7] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[8] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[9] Andre Cohen,et al. An object-oriented representation for efficient reinforcement learning , 2008, ICML '08.
[10] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[11] Alexander L. Strehl,et al. Probably Approximately Correct (PAC) Exploration in Reinforcement Learning , 2008, ISAIM.
[12] Andrew G. Barto,et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes , 2002 .
[13] Michael L. Littman,et al. Efficient Structure Learning in Factored-State MDPs , 2007, AAAI.
[14] Michael L. Littman,et al. Multi-resolution Exploration in Continuous Spaces , 2008, NIPS.
[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[16] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[17] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[18] Steven D. Whitehead,et al. Complexity and Cooperation in Q-Learning , 1991, ML.
[19] D. Sofge. The Role of Exploration in Learning Control , 1992 .
[20] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[21] Thomas J. Walsh,et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning , 2010, AAAI.
[22] Csaba Szepesvári,et al. Agnostic KWIK learning and efficient approximate reinforcement learning , 2011, COLT.
[24] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[25] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[26] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[27] Nick Littlestone,et al. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm , 2004, Machine Learning.
[28] Ambuj Tewari,et al. Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs , 2007, NIPS.
[29] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[30] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[31] Sham M. Kakade. On the sample complexity of reinforcement learning , 2003 .
[32] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[33] Reid G. Simmons,et al. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms , 2004, Machine Learning.
[34] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[35] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[36] Lihong Li,et al. Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.
[37] Jürgen Schmidhuber,et al. Efficient model-based exploration , 1998 .
[38] Lihong Li,et al. Online exploration in least-squares policy iteration , 2009, AAMAS.
[39] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[40] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[41] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[42] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[43] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[45] Shie Mannor,et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.
[46] András Lőrincz,et al. The many faces of optimism: a unifying approach , 2008, ICML '08.
[47] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[48] Michael L. Littman,et al. Dimension reduction and its application to model-based exploration in continuous spaces , 2010, Machine Learning.
[49] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.
[50] Dale Schuurmans,et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs , 2002, ICML.
[51] Doina Precup,et al. Using MDP Characteristics to Guide Exploration in Reinforcement Learning , 2003, ECML.
[53] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[54] Thomas J. Walsh,et al. Generalizing Apprenticeship Learning across Hypothesis Classes , 2010, ICML.
[55] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[56] Morteza Zadimoghaddam,et al. Trading off Mistakes and Don't-Know Predictions , 2010, NIPS.
[57] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[58] Hendrik Blockeel,et al. Machine Learning: ECML 2003 , 2003, Lecture Notes in Computer Science.
[59] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[60] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[61] Nicholas Roy,et al. Provably Efficient Learning with Typed Parametric Models , 2009, J. Mach. Learn. Res..
[62] Alexander L. Strehl,et al. Model-Based Reinforcement Learning in Factored-State MDPs , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[63] Peter Stone,et al. Model-based function approximation in reinforcement learning , 2007, AAMAS '07.
[64] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[65] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[66] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[67] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[68] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[69] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[70] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[71] Lihong Li,et al. Reducing reinforcement learning to KWIK online regression , 2010, Annals of Mathematics and Artificial Intelligence.
[72] Michael I. Jordan,et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[74] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[75] Thomas J. Walsh,et al. Efficient learning of relational models for sequential decision making , 2010 .
[76] Lihong Li,et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning , 2009, ICML '09.