Active learning with support vector machines

In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims to identify the data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points, which speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well‐suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel‐induced feature space, which makes it straightforward to express the distance of a data point from the decision boundary. Furthermore, efficient heuristics can estimate how strongly learning from a data point would influence the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs.
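As a minimal sketch of the distance-based selection idea mentioned above, the following Python snippet picks, from a pool of unlabeled points, the one closest to the decision boundary of a linear classifier. The weight vector `w`, bias `b`, the pool values, and the function names are hypothetical and chosen only for illustration; the snippet works in the input space, whereas a kernel SVM would replace the inner product with its kernel expansion. It shows one common selection heuristic, not necessarily the exact strategies reviewed in the article.

```python
import math

def distance_to_boundary(w, b, x):
    """Geometric distance of point x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

def select_query(w, b, pool):
    """Return the index of the unlabeled point closest to the decision boundary."""
    return min(range(len(pool)),
               key=lambda i: distance_to_boundary(w, b, pool[i]))

# Hypothetical linear model and unlabeled pool, for illustration only.
w = [3.0, 4.0]
b = -5.0
pool = [[3.0, 4.0], [1.0, 0.0], [0.0, 1.5], [-2.0, -2.0]]

# Index of the point whose label would be requested next.
query_index = select_query(w, b, pool)
print(query_index, distance_to_boundary(w, b, pool[query_index]))
```

In a full active learning loop, the selected point would be labeled, added to the training set, and the SVM retrained before the next query is chosen.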
