Worst-Case Analysis of Selective Sampling for Linear Classification

A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask for the label of each new instance to be classified. In this paper, we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family, we derive mistake bounds that hold for individual sequences of examples. These bounds show that our semi-supervised algorithms can achieve, on average, the same accuracy as their fully supervised counterparts while using fewer labels. Our theoretical results are corroborated by a number of experiments on real-world textual data. The outcome of these experiments is essentially predicted by our theoretical results: our selective sampling algorithms tend to perform as well as the algorithms that receive the true label after each classification, while observing substantially fewer labels in practice.
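The abstract does not spell out the randomized querying rule, so the following is only a minimal sketch under stated assumptions. It uses a margin-based rule in the style of a selective-sampling Perceptron: the label of each instance is requested with probability b/(b + |margin|) for a parameter b > 0, and the usual Perceptron update is performed only when a queried label reveals a mistake. The function name `selective_sampling_perceptron` and the parameters `b` and `seed` are hypothetical and chosen for illustration. Mistakes are counted on every prediction, for evaluation only, while the learner itself sees only the labels it queries.

```python
import random
from typing import List, Sequence, Tuple


def selective_sampling_perceptron(
    examples: Sequence[Tuple[Sequence[float], int]],
    b: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], int, int]:
    """Hypothetical randomized selective-sampling Perceptron (illustrative sketch).

    Assumption: labels are queried with probability b / (b + |margin|).
    Returns (final weights, number of prediction mistakes, number of labels queried).
    """
    assert b > 0, "b must be positive so the query probability is well defined"
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    mistakes = 0
    queries = 0
    for x, y in examples:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        if y_hat != y:
            # Counted for evaluation only; the learner does not see y unless it queries.
            mistakes += 1
        # Small-margin (uncertain) instances are queried with higher probability;
        # a zero margin gives probability 1, since rng.random() < 1 always holds.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            if y_hat != y:
                # Standard additive Perceptron update, applied only on queried mistakes.
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes, queries


if __name__ == "__main__":
    stream = [([1.0, 0.5], 1), ([-1.0, 0.2], -1), ([2.0, -0.3], 1), ([-0.5, -1.0], -1)] * 10
    w, m, q = selective_sampling_perceptron(stream, b=1.0)
    print(w, m, q)
```

In this sketch, `b` controls the trade-off between labels and accuracy. A larger `b` pushes the query probability toward 1, so the algorithm behaves more like the fully supervised Perceptron. A smaller `b` skips more labels on instances that already have a large margin.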
