Theoretical foundations of active learning

I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study extensions of active learning to more general forms of interactive statistical learning.
