Fast and simple gradient-based optimization for semi-supervised support vector machines

One of the main tasks in machine learning is the classification of data items. Such a task is usually based on a training set consisting of labeled patterns. In real-world settings, however, labeled data are often scarce, and the resulting models may yield unsatisfactory results. Unlabeled data, on the other hand, can often be obtained in large quantities without much additional effort. Semi-supervised support vector machines are a prominent research direction in machine learning. This type of binary classification approach takes the additional information provided by the unlabeled patterns into account to reveal more about the structure of the data at hand. In some cases, this can yield significantly better classification results than a straightforward application of supervised models. One drawback, however, is that training such models requires solving difficult non-convex optimization problems. In this work, we present a simple but effective gradient-based optimization framework to address these problems. The resulting method can be implemented easily using black-box optimization engines and yields excellent classification and runtime results on both sparse and non-sparse data sets.
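The abstract does not state the objective, so the following is only a minimal sketch of the general idea under our own assumptions. It replaces the non-differentiable semi-supervised SVM objective with a smooth one and minimizes it with gradient information. The assumed ingredients are a linear model f(x) = w·x + b, a smooth surrogate of the hinge loss for labeled points, and the penalty exp(-s f(x)^2), which pushes unlabeled points away from the decision boundary. For brevity, plain fixed-step gradient descent stands in for a black-box quasi-Newton engine, and the class-balancing constraint usually added to such models is omitted. All function names and the hyperparameters `C`, `C_u`, `gamma`, `s`, `lr` and `iters` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def objective_and_grad(w, b, X_l, y_l, X_u, C=1.0, C_u=1.0, gamma=10.0, s=3.0):
    """Value and gradient (w.r.t. w and b) of a smoothed linear S3VM objective."""
    obj = 0.5 * dot(w, w)  # regularizer 0.5 * ||w||^2; the bias is not regularized
    grad_w = list(w)
    grad_b = 0.0
    # Labeled term: smooth hinge surrogate (1/gamma) * log(1 + exp(gamma * (1 - y f(x)))).
    c_l = C / len(X_l)
    for x, y in zip(X_l, y_l):
        f = dot(w, x) + b
        z = gamma * (1.0 - y * f)
        obj += c_l * softplus(z) / gamma
        coef = -c_l * sigmoid(z) * y  # derivative of this term w.r.t. f(x)
        grad_w = [g + coef * xk for g, xk in zip(grad_w, x)]
        grad_b += coef
    # Unlabeled term: exp(-s f(x)^2) penalizes unlabeled points near the boundary.
    c_u = C_u / len(X_u)
    for x in X_u:
        f = dot(w, x) + b
        e = math.exp(-s * f * f)
        obj += c_u * e
        coef = -2.0 * s * f * e * c_u  # derivative of this term w.r.t. f(x)
        grad_w = [g + coef * xk for g, xk in zip(grad_w, x)]
        grad_b += coef
    return obj, grad_w, grad_b


def train_s3vm(X_l, y_l, X_u, lr=0.02, iters=2000, **params) -> Tuple[List[float], float]:
    """Fixed-step gradient descent starting from w = 0, b = 0."""
    w = [0.0] * len(X_l[0])
    b = 0.0
    for _ in range(iters):
        _, g_w, g_b = objective_and_grad(w, b, X_l, y_l, X_u, **params)
        w = [wk - lr * gk for wk, gk in zip(w, g_w)]
        b -= lr * g_b
    return w, b


def predict(w: Sequence[float], b: float, X) -> List[int]:
    return [1 if dot(w, x) + b >= 0 else -1 for x in X]


if __name__ == "__main__":
    # Toy data: two labeled points and two clusters of unlabeled points.
    X_l = [[2.0, 1.0], [-2.0, -1.0]]
    y_l = [1, -1]
    X_u = [[2.5, 0.0], [1.5, -1.0], [3.0, 2.0], [-2.5, 0.0], [-1.5, 1.0], [-3.0, -2.0]]
    w, b = train_s3vm(X_l, y_l, X_u)
    print("w =", w, "b =", b)
    print("unlabeled predictions:", predict(w, b, X_u))
```

Because both loss terms are differentiable, `objective_and_grad` could be handed unchanged to any gradient-based optimizer that accepts a function value and gradient (for example, an L-BFGS routine). This is the practical appeal of smoothing the objective.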
