Knows what it knows: a framework for self-aware learning

We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework is designed particularly for learning settings where active exploration can affect the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
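For concreteness, here is a minimal sketch of the KWIK protocol as formalized in the paper: on each input, the learner must either return an accurate prediction or respond "I don't know" (⊥), and it observes the true label only after abstaining; the number of ⊥ responses is the learner's KWIK bound. The memorization learner below, which KWIK-learns any deterministic function over a finite input set with bound |X|, is illustrative only; the names (`MemorizationLearner`, `run_kwik`) are ours, not the paper's.

```python
# Sketch of the KWIK ("knows what it knows") protocol: the learner must
# either predict accurately or answer "I don't know" (here, None), and it
# sees the true label only after abstaining. The KWIK bound is the total
# number of abstentions.

class MemorizationLearner:
    """KWIK-learns any deterministic function over a finite input set.

    It abstains exactly once per distinct input, so its KWIK bound
    is |X|, the size of the input space.
    """

    def __init__(self):
        self.known = {}  # input -> observed label

    def predict(self, x):
        # Return the stored label, or None ("I don't know") for unseen inputs.
        return self.known.get(x)

    def observe(self, x, y):
        # Called only after the learner abstained on x.
        self.known[x] = y


def run_kwik(learner, stream, target):
    """Drive the KWIK protocol over a stream of inputs and return the
    number of "I don't know" responses (the empirical KWIK bound)."""
    abstentions = 0
    for x in stream:
        y_hat = learner.predict(x)
        if y_hat is None:
            abstentions += 1
            learner.observe(x, target(x))  # label revealed only on abstention
        else:
            # KWIK requirement: any non-abstaining prediction must be accurate.
            assert y_hat == target(x)
    return abstentions


if __name__ == "__main__":
    target = lambda x: x % 3
    stream = [0, 1, 2, 0, 1, 2, 4, 4]
    print(run_kwik(MemorizationLearner(), stream, target))  # 4 distinct inputs -> 4
```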
