Towards Making Unlabeled Data Never Hurt

It is usually expected that learning performance can be improved by exploiting unlabeled data, particularly when the number of labeled data is limited. However, it has been reported that, in some cases existing semi-supervised learning approaches perform even worse than supervised ones which only use labeled data. For this reason, it is desirable to develop safe semi-supervised learning approaches that will not significantly reduce learning performance when unlabeled data are used. This paper focuses on improving the safeness of semi-supervised support vector machines (S3VMs). First, the S3VM-us approach is proposed. It employs a conservative strategy and uses only the unlabeled instances that are very likely to be helpful, while avoiding the use of highly risky ones. This approach improves safeness but its performance improvement using unlabeled data is often much smaller than S3VMs. In order to develop a safe and well-performing approach, we examine the fundamental assumption of S3VMs, i.e., low-density separation. Based on the observation that multiple good candidate low-density separators may be identified from training data, safe semi-supervised support vector machines (S4VMs) are here proposed. This approach uses multiple low-density separators to approximate the ground-truth decision boundary and maximizes the improvement in performance of inductive SVMs for any candidate separator. Under the assumption employed by S3VMs, it is here shown that S4VMs are provably safe and that the performance improvement using unlabeled data can be maximized. An out-of-sample extension of S4VMs is also presented. This extension allows S4VMs to make predictions on unseen instances. Our empirical study on a broad range of data shows that the overall performance of S4VMs is highly competitive with S3VMs, whereas in contrast to S3VMs which hurt performance significantly in many cases, S4VMs rarely perform worse than inductive SVMs.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Scott Kirkpatrick,et al.  Optimization by simulated annealing: Quantitative studies , 1984 .

[3]  V. Cerný Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm , 1985 .

[4]  Editors , 1986, Brain Research Bulletin.

[5]  Emile H. L. Aarts,et al.  Simulated Annealing: Theory and Applications , 1987, Mathematics and Its Applications.

[6]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[7]  David J. Miller,et al.  A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data , 1996, NIPS.

[8]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[9]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[10]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[11]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[12]  Nikunj C. Oza,et al.  Online Ensemble Learning , 2000, AAAI/IAAI.

[13]  Tong Zhang,et al.  The Value of Unlabeled Data for Classification Problems , 2000, ICML 2000.

[14]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[15]  Jean-Michel Renders,et al.  Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition , 2002, CoNLL.

[16]  Zoubin Ghahramani,et al.  Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions , 2003, ICML.

[17]  Nello Cristianini,et al.  Convex Methods for Transduction , 2003, NIPS.

[18]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[19]  Shaoning Pang,et al.  Transductive support vector machines and applications in bioinformatics for promoter recognition , 2003, International Conference on Neural Networks and Signal Processing, 2003. Proceedings of the 2003.

[20]  Lei Wang,et al.  Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[21]  James C. Bezdek,et al.  Convergence of Alternating Optimization , 2003, Neural Parallel Sci. Comput..

[22]  Fabio Gagliardi Cozman,et al.  Semi-Supervised Learning of Mixture Models , 2003, ICML.

[23]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[24]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[25]  S. Dreyfus,et al.  Thermodynamical Approach to the Traveling Salesman Problem : An Efficient Simulation Algorithm , 2004 .

[26]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[27]  Zhi-Hua Zhou,et al.  SETRED: Self-training with Editing , 2005, PAKDD.

[28]  Xiaojin Zhu,et al.  Semi-Supervised Learning Literature Survey , 2005 .

[29]  Zhi-Hua Zhou,et al.  Tri-training: exploiting unlabeled data using three classifiers , 2005, IEEE Transactions on Knowledge and Data Engineering.

[30]  Alexander Zien,et al.  Semi-Supervised Classification by Low Density Separation , 2005, AISTATS.

[31]  Dale Schuurmans,et al.  Unsupervised and Semi-Supervised Multi-Class Support Vector Machines , 2005, AAAI.

[32]  Nitesh V. Chawla,et al.  Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains , 2011, J. Artif. Intell. Res..

[33]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[34]  Jason Weston,et al.  Large Scale Transductive SVMs , 2006, J. Mach. Learn. Res..

[35]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[36]  S. Sathiya Keerthi,et al.  Branch and Bound for Semi-Supervised Support Vector Machines , 2006, NIPS.

[37]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[38]  Xin Yao,et al.  An analysis of diversity measures , 2006, Machine Learning.

[39]  Alexander Zien,et al.  A continuation method for semi-supervised SVMs , 2006, ICML.

[40]  S. Sathiya Keerthi,et al.  Deterministic annealing for semi-supervised kernel machines , 2006, ICML.

[41]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[42]  Shai Ben-David,et al.  Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning , 2008, COLT.

[43]  S. Sathiya Keerthi,et al.  Optimization Techniques for Semi-Supervised Support Vector Machines , 2008, J. Mach. Learn. Res..

[44]  Zhi-Hua Zhou,et al.  Semi-supervised learning by disagreement , 2010, Knowledge and Information Systems.

[45]  Robert D. Nowak,et al.  Unlabeled data: Now it helps, now it doesn't , 2008, NIPS.

[46]  Shih-Fu Chang,et al.  Graph transduction via alternating minimization , 2008, ICML '08.

[47]  James T. Kwok,et al.  Prototype vector machine for large scale semi-supervised learning , 2009, ICML '09.

[48]  Ivor W. Tsang,et al.  Maximum Margin Clustering Made Practical , 2007, IEEE Transactions on Neural Networks.

[49]  Ivor W. Tsang,et al.  Tighter and Convex Maximum Margin Clustering , 2009, AISTATS.

[50]  Zhi-Hua Zhou,et al.  Semi-supervised learning using label mean , 2009, ICML '09.

[51]  Wei Liu,et al.  Large Graph Construction for Scalable Semi-Supervised Learning , 2010, ICML.

[52]  Maria-Florina Balcan,et al.  A discriminative model for semi-supervised learning , 2010, J. ACM.

[53]  Zhi-Hua Zhou,et al.  Improving Semi-Supervised Support Vector Machines Through Unlabeled Instances Selection , 2010, AAAI.

[54]  Ke Chen,et al.  Semi-Supervised Learning via Regularized Boosting Working on Multiple Semi-Supervised Assumptions , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Wei Liu,et al.  Robust and Scalable Graph-Based Semisupervised Learning , 2012, Proceedings of the IEEE.

[56]  Dirk Van,et al.  Ensemble Methods: Foundations and Algorithms , 2012 .

[57]  Ivor W. Tsang,et al.  Convex and scalable weakly labeled SVMs , 2013, J. Mach. Learn. Res..

[58]  Gavin Brown,et al.  Ensemble Learning , 2010, Encyclopedia of Machine Learning and Data Mining.

[59]  Xiaojin Zhu,et al.  Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.