Knows what it knows: a framework for self-aware learning
[1] Y. Mansour, et al. Generalization bounds for averaged classifiers, 2004, math/0410092.
[2] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[3] Ye Tian, et al. Maximizing classifier utility when training data is costly, 2006, SKDD.
[4] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[5] Claudio Gentile, et al. Robust bounds for classification via selective sampling, 2009, ICML '09.
[6] Thomas J. Walsh, et al. Exploring compact reinforcement-learning representations with linear regression, 2009, UAI.
[7] Philip M. Long, et al. Apple Tasting, 2000, Inf. Comput.
[8] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[9] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[10] Yoram Singer, et al. Using and combining predictors that specialize, 1997, STOC '97.
[11] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[12] D. Angluin. Queries and Concept Learning, 1988.
[13] Eduardo D. Sontag, et al. Mathematical Control Theory: Deterministic Finite Dimensional Systems, 1990.
[14] Vladimir Vovk, et al. A tutorial on conformal prediction, 2007, J. Mach. Learn. Res.
[15] Nick Littlestone, et al. From on-line to batch learning, 1989, COLT '89.
[16] Claudio Gentile, et al. Worst-Case Analysis of Selective Sampling for Linear Classification, 2006, J. Mach. Learn. Res.
[17] Carla E. Brodley, et al. An Empirical Study of Two Approaches to Sequence Learning for Anomaly Detection, 2003, Machine Learning.
[18] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[19] Philip W. L. Fong. A Quantitative Study of Hypothesis Selection, 1995, ICML.
[20] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.
[21] Michael L. Littman, et al. Efficient Structure Learning in Factored-State MDPs, 2007, AAAI.
[22] Dana Angluin. Queries revisited, 2004, Theor. Comput. Sci.
[23] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[24] H. Sebastian Seung, et al. Query by committee, 1992, COLT '92.
[25] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[26] Hans Ulrich Simon, et al. From noise-free to noise-tolerant and from on-line to batch learning, 1995, COLT '95.
[27] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[28] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.
[29] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[30] Chris Mesterharm, et al. Experience-efficient learning in associative bandit problems, 2006, ICML.
[31] Michael L. Littman, et al. Efficient Reinforcement Learning with Relocatable Action Models, 2007, AAAI.
[32] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[33] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[34] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.
[35] Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.
[36] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[37] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[38] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.
[39] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[40] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[41] H. Sebastian Seung, et al. Selective Sampling Using the Query by Committee Algorithm, 1997, Machine Learning.
[42] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[43] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.
[44] John N. Tsitsiklis, et al. The complexity of dynamic programming, 1989, J. Complex.
[45] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.
[46] Lihong Li, et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, 2009, ICML '09.
[47] R. Bellman. Dynamic programming, 1957, Science.
[48] Avrim Blum. Separating Distribution-Free and Mistake-Bound Learning Models over the Boolean Domain, 1994, SIAM J. Comput.
[49] Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.
[50] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[51] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[52] Gábor Lugosi, et al. Minimizing regret with label efficient prediction, 2004, IEEE Transactions on Information Theory.
[53] Nicholas Roy, et al. CORL: A Continuous-state Offset-dynamics Reinforcement Learner, 2008, UAI.
[54] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.