Lagrangean-Based Combinatorial Optimization for Large-Scale S3VMs

The process of manually labeling instances, essential to a supervised classifier, can be expensive and time-consuming. In such a scenario the semisupervised approach, which makes use of unlabeled patterns when building the decision function, is a more appealing choice, since large amounts of unlabeled samples can often be obtained easily. Many optimization techniques have been developed over the last decade to include unlabeled patterns in the support vector machine formulation, following two broad strategies: continuous and combinatorial. The approach presented in this paper belongs to the latter family and is especially suitable when a fair estimate of the proportion of positive and negative samples is available. Our method is very simple and requires very little parameter tuning. Several medium- and large-scale experiments on both artificial and real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithm.
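To make the combinatorial idea concrete, the following is a minimal illustrative sketch, not the paper's Lagrangean algorithm: a base SVM is trained on the labeled data, the unlabeled points are assigned +1/-1 labels so that the assumed positive fraction is respected, and the classifier is retrained on the completed data set. The function name, iteration count, and use of scikit-learn's `LinearSVC` are all assumptions made for illustration.

```python
# Hedged sketch of a combinatorial S3VM heuristic: assign labels to the
# unlabeled pool so that a given fraction is positive, then retrain.
# This is an illustrative label-switching loop, NOT the Lagrangean-based
# method of the paper.
import numpy as np
from sklearn.svm import LinearSVC

def s3vm_heuristic(X_lab, y_lab, X_unl, pos_fraction, n_iter=5):
    """Return a classifier and +1/-1 labels for X_unl such that roughly
    pos_fraction of the unlabeled points receive the positive label."""
    clf = LinearSVC().fit(X_lab, y_lab)
    n_pos = int(round(pos_fraction * len(X_unl)))
    for _ in range(n_iter):
        scores = clf.decision_function(X_unl)
        y_unl = -np.ones(len(X_unl))
        # enforce the class-proportion constraint combinatorially: the
        # n_pos highest-scoring unlabeled points get the positive label
        y_unl[np.argsort(scores)[-n_pos:]] = 1.0
        # retrain on labeled + pseudo-labeled data
        clf = LinearSVC().fit(np.vstack([X_lab, X_unl]),
                              np.concatenate([y_lab, y_unl]))
    return clf, y_unl
```

The hard proportion constraint is what makes the problem combinatorial: each unlabeled point is committed to a discrete label rather than relaxed to a continuous value, which is the family of methods the abstract refers to.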