Analysis of Perceptron-Based Active Learning

We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just O(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm.
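The scheme described above can be sketched in code. This is a hedged illustration, not the paper's exact procedure: the "modification of the Perceptron update" is rendered here as a reflection update that keeps the hypothesis on the unit sphere, and the "adaptive filtering rule" as a margin threshold that halves after a run of correct predictions on queried points. The starting threshold and the patience constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

def unit_sphere(n, d, rng):
    """Sample n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

u = unit_sphere(1, d, rng)[0]   # hidden target separator (plays the labeling oracle)
w = unit_sphere(1, d, rng)[0]   # initial hypothesis
s = 1.0 / np.sqrt(d)            # query threshold (assumed starting value)
streak = 0                      # consecutive correct predictions on queried points
labels_used = 0

for x in unit_sphere(20000, d, rng):
    if abs(w @ x) >= s:
        continue                      # filter: point is far from the boundary, skip it
    labels_used += 1
    y = np.sign(u @ x)                # query the oracle for a label
    if np.sign(w @ x) != y:
        w = w - 2 * (w @ x) * x       # reflection update; preserves ||w|| = 1
        streak = 0
    else:
        streak += 1
        if streak >= 3 * d:           # assumed patience before tightening the filter
            s /= 2
            streak = 0

test = unit_sphere(5000, d, rng)
err = np.mean(np.sign(test @ w) != np.sign(test @ u))
```

One property worth noting about the reflection update: on any mistake, sign(w·x) ≠ sign(u·x), so u·w increases by 2|w·x||u·x| > 0, meaning each update strictly improves the hypothesis, while the filter concentrates label requests near the current boundary, where they are informative.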
