Exploiting Unlabeled Data to Enhance Ensemble Diversity

Ensemble learning aims to improve generalization ability by using multiple base learners. It is well known that to construct a good ensemble, the base learners should be accurate as well as diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by augmenting the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED is proposed. Unlike existing semi-supervised ensemble methods, which estimate error-prone pseudo-labels for unlabeled data in order to enlarge the labeled training set and thereby improve accuracy, UDEED works by maximizing the accuracy of the base learners on labeled data while maximizing the diversity among them on unlabeled data. Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive with well-established semi-supervised ensemble methods.
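To make the accuracy/diversity trade-off concrete, below is a minimal NumPy sketch of the idea stated in the abstract: a set of logistic base learners trained by gradient descent on a combined objective that penalizes prediction error on labeled data and pairwise agreement on unlabeled data. The squared-loss form, the diversity weight gamma, and all hyperparameters here are illustrative assumptions, not UDEED's exact formulation.

```python
import numpy as np

# Hypothetical sketch: m logistic base learners, each producing outputs
# in (-1, 1) via h(x) = 2*sigmoid(w.x) - 1. Training jointly minimizes
# (i) squared error on labeled data and (ii) average pairwise agreement
# h_j(x)*h_k(x) on unlabeled data, so learners stay accurate yet diverse.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_diverse_ensemble(X_lab, y_lab, X_unl, m=5, gamma=1.0,
                           lr=0.1, epochs=200, seed=0):
    """X_lab: (n, d) labeled inputs; y_lab: (n,) labels in {-1, +1};
    X_unl: (u, d) unlabeled inputs. Returns (m, d) weight matrix."""
    rng = np.random.default_rng(seed)
    n, d = X_lab.shape
    W = rng.normal(scale=0.01, size=(m, d))  # one weight vector per learner

    for _ in range(epochs):
        # Base-learner outputs in (-1, 1) on both data sets.
        H_lab = 2 * sigmoid(X_lab @ W.T) - 1   # (n, m)
        H_unl = 2 * sigmoid(X_unl @ W.T) - 1   # (u, m)

        # Accuracy term: gradient of mean squared loss on labeled data.
        S_lab = sigmoid(X_lab @ W.T)
        dH_lab = 2 * S_lab * (1 - S_lab)       # derivative of h w.r.t. w.x
        grad_acc = ((H_lab - y_lab[:, None]) * dH_lab).T @ X_lab / n

        # Diversity term: gradient of the mean pairwise agreement
        # sum_{k != j} h_j(x) h_k(x) over unlabeled points; descending
        # this gradient pushes the learners' predictions apart on U.
        S_unl = sigmoid(X_unl @ W.T)
        dH_unl = 2 * S_unl * (1 - S_unl)
        others = H_unl.sum(axis=1, keepdims=True) - H_unl  # (u, m)
        grad_div = ((others * dH_unl).T @ X_unl) / (len(X_unl) * (m - 1))

        W -= lr * (grad_acc + gamma * grad_div)
    return W

def predict(W, X):
    """Ensemble prediction: sign of the averaged base-learner outputs."""
    return np.sign((2 * sigmoid(X @ W.T) - 1).mean(axis=1))
```

Setting gamma to 0 recovers m independently trained logistic learners; a positive gamma trades a little labeled-data fit for disagreement on the unlabeled pool, which is the mechanism the abstract attributes to UDEED.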
