The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning

The purpose of this paper is three-fold. First, we formalize and study a problem of learning probabilistic concepts in the recently proposed KWIK framework. We describe an algorithm, the Adaptive k-Meteorologists Algorithm, derive an upper bound on its sample complexity, and give a matching lower bound. Second, we use this algorithm to build a new reinforcement-learning algorithm for factored-state problems that improves significantly on the previous state of the art. Finally, we apply the Adaptive k-Meteorologists Algorithm to remove a limiting assumption in an existing reinforcement-learning algorithm. The effectiveness of our approaches is demonstrated empirically on a couple of benchmark domains as well as a robot navigation problem.
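Although the abstract only names the algorithm, the underlying k-meteorologists setting is easy to picture: k probabilistic experts ("meteorologists") each predict the chance of a binary outcome, and a KWIK learner must either commit to a prediction or admit uncertainty. The Python sketch below illustrates that setting under simple assumptions; the agreement tolerance `epsilon` and the elimination margin `gap` are illustrative placeholders, not the constants analyzed in the paper, and the class name is invented for this example.

```python
class KMeteorologists:
    """KWIK-style learner over k probabilistic experts ("meteorologists").

    Sketch only: each expert maps an observation to a probability of a binary
    outcome. When the surviving experts (nearly) agree, we commit to their
    prediction; otherwise we return None, the KWIK "I don't know" symbol,
    observe the true outcome, and accumulate squared error so that poorly
    calibrated experts can be eliminated.
    """

    def __init__(self, experts, epsilon=0.1, gap=2.0):
        self.experts = list(experts)          # callables: observation -> prob in [0, 1]
        self.alive = set(range(len(self.experts)))
        self.sq_error = [0.0] * len(self.experts)
        self.epsilon = epsilon                # agreement tolerance (illustrative)
        self.gap = gap                        # elimination margin (illustrative)

    def predict(self, x):
        preds = {i: self.experts[i](x) for i in self.alive}
        lo, hi = min(preds.values()), max(preds.values())
        if hi - lo <= self.epsilon:
            # Surviving experts agree closely enough: safe to commit.
            return sum(preds.values()) / len(preds)
        return None                           # KWIK "I don't know"

    def update(self, x, outcome):
        # Called whenever the true binary outcome (0 or 1) is revealed.
        for i in list(self.alive):
            self.sq_error[i] += (self.experts[i](x) - outcome) ** 2
        best = min(self.sq_error[i] for i in self.alive)
        # Drop experts whose cumulative loss is clearly worse than the best.
        self.alive = {i for i in self.alive
                      if self.sq_error[i] - best <= self.gap}
```

In use, the learner's None outputs mark exactly the inputs on which it needs to see an outcome; bounding the number of such outputs is what a KWIK sample-complexity analysis quantifies.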
