Active Learning via Perfect Selective Classification

We discover a strong relation between two known learning models: stream-based active learning and perfect selective classification (an extreme case of 'classification with a reject option'). For these models, restricted to the realizable case, we show a reduction of active learning to selective classification that preserves fast rates. Applying this reduction to recent results for selective classification, we derive an exponential, target-independent label complexity speedup for actively learning general (non-homogeneous) linear classifiers when the data distribution is an arbitrary high-dimensional mixture of Gaussians. Finally, we study the relation between the proposed technique and existing label complexity measures, including the teaching dimension and the disagreement coefficient.
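The following Python sketch illustrates one natural way such a reduction can work in the simplest realizable setting (one-dimensional threshold classifiers): maintain a perfect selective classifier over the labels collected so far, and query a stream point exactly when that classifier rejects it. The class name SelectiveClassifier, its fit_one/predict_or_reject methods, and the threshold hypothesis class are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Minimal sketch: stream-based active learning driven by a perfect selective
# classifier (realizable case). All names here are illustrative assumptions.

import random
from typing import Callable, Iterable, Optional, Tuple


class SelectiveClassifier:
    """Hypothetical perfect selective classifier for 1-D thresholds.

    It predicts a label only when every threshold consistent with the labeled
    data seen so far agrees on the point; otherwise it abstains (rejects).
    In the realizable case such predictions are never wrong, which is the
    'perfect' guarantee the reduction relies on.
    """

    def __init__(self) -> None:
        self.lo = float("-inf")   # largest x observed with label -1
        self.hi = float("inf")    # smallest x observed with label +1

    def fit_one(self, x: float, y: int) -> None:
        if y > 0:
            self.hi = min(self.hi, x)
        else:
            self.lo = max(self.lo, x)

    def predict_or_reject(self, x: float) -> Optional[int]:
        if x <= self.lo:
            return -1             # all consistent thresholds predict -1
        if x >= self.hi:
            return +1             # all consistent thresholds predict +1
        return None               # disagreement region: abstain


def active_learn(stream: Iterable[float],
                 oracle: Callable[[float], int]) -> Tuple[SelectiveClassifier, int]:
    """Query the label oracle only on points the selective classifier rejects."""
    clf = SelectiveClassifier()
    queries = 0
    for x in stream:
        if clf.predict_or_reject(x) is None:   # rejection => request the label
            queries += 1
            clf.fit_one(x, oracle(x))
        # accepted points need no label: the prediction is provably correct
    return clf, queries


if __name__ == "__main__":
    random.seed(0)
    true_threshold = 0.3
    oracle = lambda x: 1 if x >= true_threshold else -1
    stream = [random.random() for _ in range(1000)]
    clf, queries = active_learn(stream, oracle)
    print(f"labels queried: {queries} out of {len(stream)} stream points")
```

In this toy setting the number of queried labels grows only logarithmically with the stream length under a uniform stream, which is the flavor of exponential label complexity speedup the abstract refers to; the paper's contribution is establishing such guarantees for much richer hypothesis classes and distributions.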
