Learning noisy linear classifiers via adaptive and selective sampling

We introduce efficient margin-based algorithms for selective sampling and filtering in binary classification tasks. Experiments on real-world textual data reveal that our algorithms perform significantly better than popular and similarly efficient competitors. Using the so-called Mammen-Tsybakov low noise condition to parametrize the instance distribution, and assuming linear label noise, we bound the convergence rate to the Bayes risk of a weaker adaptive variant of our selective sampler. Our analysis reveals that, excluding logarithmic factors, the average risk of this adaptive sampler converges to the Bayes risk at rate $N^{-(1+\alpha)(2+\alpha)/(2(3+\alpha))}$, where $N$ denotes the number of queried labels and $\alpha > 0$ is the exponent in the low noise condition. For all $\alpha > \sqrt{3} - 1 \approx 0.73$, this convergence rate is asymptotically faster than the rate $N^{-(1+\alpha)/(2+\alpha)}$ achieved by the fully supervised version of the base selective sampler, which queries all labels. Moreover, for $\alpha \to \infty$ (the hard margin condition), the gap between the semi- and fully supervised rates becomes exponential.
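
To see where the threshold $\sqrt{3} - 1$ comes from, compare the two exponents directly. Since $1 + \alpha > 0$, the semi-supervised exponent dominates exactly when

$$
\frac{(1+\alpha)(2+\alpha)}{2(3+\alpha)} > \frac{1+\alpha}{2+\alpha}
\iff (2+\alpha)^2 > 2(3+\alpha)
\iff \alpha^2 + 2\alpha - 2 > 0
\iff \alpha > \sqrt{3} - 1,
$$

where the last step takes the positive root of $\alpha^2 + 2\alpha - 2 = 0$.

The abstract names only the algorithmic template, so the following is a minimal illustrative sketch of a generic margin-based selective sampler in the spirit described above, not the paper's actual algorithm: the regularized least-squares predictor, the function names, and in particular the $\sqrt{\log t / n}$ threshold schedule are assumptions made for this example.

```python
import numpy as np

def selective_sampling_run(X, label_oracle, reg=1.0):
    """Generic margin-based selective sampler (illustrative sketch only).

    Maintains a regularized least-squares predictor over the queried
    examples and asks for a label only when the current margin is small
    relative to a shrinking confidence threshold.
    """
    d = X.shape[1]
    A = reg * np.eye(d)      # regularized correlation matrix of queried instances
    b = np.zeros(d)          # label-weighted sum of queried instances
    n_queried = 0
    predictions = []
    for t, x in enumerate(X, start=1):
        w = np.linalg.solve(A, b)            # current RLS weight vector
        margin = float(w @ x)
        predictions.append(1.0 if margin >= 0 else -1.0)
        # Query when the margin is too small to be trusted. This threshold
        # schedule is a placeholder assumption, not the paper's rule.
        if abs(margin) <= np.sqrt(np.log(t + 1) / (n_queried + 1)):
            y = label_oracle(x)              # costly label in {-1, +1}
            A += np.outer(x, x)
            b += y * x
            n_queried += 1
    return predictions, n_queried

# Tiny demo with noiseless linear labels on random instances.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(1000, 5))
preds, n = selective_sampling_run(X, lambda x: 1.0 if w_true @ x >= 0 else -1.0)
print(f"queried {n} of {len(X)} labels")
```

Note that measuring risk against the number of queried labels $N$, rather than the number of observed instances, is what makes the rate comparison above meaningful: a sampler of this kind spends its label budget on instances near the decision boundary.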
