A compression technique for analyzing disagreement-based active learning

We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the training data that induces the same version space. We show various applications of the new characterization, including a tight analysis of CAL and refined label complexity bounds for linear separators under mixtures of Gaussians and for axis-aligned rectangles under product densities. The version space compression set size, as well as the new characterization of the label complexity, extends naturally to agnostic learning problems, for which we show new speedup results for two well-known active learning algorithms.
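The two central objects of the abstract can be made concrete in a toy setting. The sketch below (not from the paper; all function names are illustrative) uses 1-D threshold classifiers h_t(x) = 1[x >= t]: the version space consistent with a labeled sample is the interval of thresholds between the largest negative and smallest positive example, so those two boundary points alone form the version space compression set. A minimal CAL loop then queries a label only when a point falls in the disagreement region of the current version space, inferring labels for free elsewhere.

```python
def version_space(data):
    """Thresholds consistent with `data` (a list of (x, y) pairs,
    y in {0, 1}) form the interval (lo, hi]; return (lo, hi)."""
    lo = max((x for x, y in data if y == 0), default=float("-inf"))
    hi = min((x for x, y in data if y == 1), default=float("inf"))
    return (lo, hi)

def compression_set(data):
    """Smallest subset of `data` inducing the same version space:
    for thresholds, the boundary examples suffice."""
    lo, hi = version_space(data)
    subset = [(x, y) for x, y in data if x in (lo, hi)]
    return list(dict.fromkeys(subset))  # deduplicate, keep order

def cal(stream, oracle):
    """CAL sketch: query only inside the disagreement region (lo, hi);
    outside it, every consistent hypothesis agrees on the label."""
    data, queries = [], 0
    for x in stream:
        lo, hi = version_space(data)
        if lo < x < hi:            # hypotheses disagree: must query
            y = oracle(x)
            queries += 1
        else:                      # label is determined for free
            y = 1 if x >= hi else 0
        data.append((x, y))
    return data, queries

data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
print(version_space(data))                    # (0.4, 0.6)
print(compression_set(data))                  # [(0.4, 0), (0.6, 1)]

oracle = lambda x: int(x >= 0.5)              # realizable: true threshold 0.5
labeled, queries = cal([0.1, 0.9, 0.4, 0.6, 0.5, 0.3], oracle)
print(queries)                                # fewer queries than points
```

The compression set here has size at most 2 regardless of the sample size, which is what drives the small label complexity of CAL in such favorable cases.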
