A Cluster-Based Semisupervised Ensemble for Multiclass Classification

Semisupervised classification (SSC) algorithms use both labeled and unlabeled data to predict the labels of unseen instances. Classifier ensembles have been studied and applied successfully as an SSC approach. However, the generalization of existing semisupervised ensembles can be strongly affected by the incorrect label estimates that the ensemble algorithms produce in order to train their supervised base learners. These ensembles do not optimize the objective function present in their base learners, which leaves the supervised base classifiers sensitive to incorrect labels and prone to reinforcing errors during training. We propose cluster-based boosting (CBoost), a multiclass classification algorithm with cluster regularization. In contrast to existing algorithms, CBoost and its base learners jointly perform a cluster-based semisupervised optimization, which allows the base classifiers to overcome potentially incorrect label estimates for unlabeled data. CBoost is effective and stable in the presence of overlapping classes and when labeled points are scarce in dense regions. Experiments on artificial and real-world datasets confirmed the effectiveness of our approach.
