Semi-supervised multi-class Adaboost by exploiting unlabeled data

Research highlights:
- We propose a semi-supervised learning method based on multi-class boosting.
- It handles K-class classification directly, without reducing it to multiple two-class problems.
- Each base classifier needs classification accuracy only better than 1/K.
- Higher classification accuracy is achieved by exploiting the unlabeled data.

Semi-supervised learning has attracted much attention in pattern recognition and machine learning. Most semi-supervised learning algorithms are designed for binary classification and are then extended to multi-class cases through approaches such as one-against-the-rest. In this work, we propose a semi-supervised learning method based on multi-class boosting, which classifies multi-class data directly and achieves high classification accuracy by exploiting the unlabeled data. Our approach has two distinct features: (1) it handles multi-class cases directly, without reducing them to multiple two-class problems, and (2) each base classifier needs classification accuracy only better than 1/K, where K is the number of classes (i.e., better than random guessing). Experiments on 21 UCI benchmark data sets show that the proposed method is effective.
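The 1/K condition comes from the multi-class AdaBoost (SAMME) update of Hastie et al., on which this work builds. Below is a minimal sketch of that supervised boosting core only, not the paper's full semi-supervised algorithm; the function name samme_fit, the choice of decision stumps as base classifiers, and the parameter n_rounds are illustrative assumptions:

# Minimal sketch of the SAMME multi-class AdaBoost update (Hastie et al.).
# This is the supervised core the paper builds on, NOT its semi-supervised
# extension; names and defaults here are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_fit(X, y, K, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # uniform example weights
    learners, alphas = [], []
    for m in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y))          # weighted training error
        # alpha > 0 as long as err < (K-1)/K, i.e. accuracy exceeds 1/K;
        # this is why each base classifier only needs to beat random guessing.
        if err >= (K - 1) / K:
            break
        alpha = np.log((1 - err) / max(err, 1e-10)) + np.log(K - 1)
        w *= np.exp(alpha * (pred != y))       # up-weight misclassified points
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

The key point is the log(K - 1) term: the classifier weight alpha stays positive whenever the weighted error is below (K-1)/K, so each base classifier only has to exceed the 1/K accuracy of random guessing among K classes, rather than the 1/2 required by binary AdaBoost.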
