Multiobjective Semisupervised Classifier Ensemble

Classifying high-dimensional data when very few labels are available is a challenging task in data mining and machine learning. In this paper, we propose the multiobjective semisupervised classifier ensemble (MOSSCE) approach to address this challenge. First, a multiobjective subspace selection process (MOSSP) in MOSSCE generates the optimal combination of feature subspaces. MOSSP optimizes three objective functions: the relevance of features, the redundancy between features, and the data reconstruction error. Next, MOSSCE generates an auxiliary training set based on sample confidence to improve the performance of the classifier ensemble. Finally, the training set, combined with the auxiliary training set, is used to select the optimal combination of basic classifiers in the ensemble, train the classifier ensemble, and generate the final result. In addition, we analyze the diversity of the ensemble learning process and adopt a set of nonparametric statistical tests to compare semisupervised classification approaches on multiple datasets. Experiments on 12 gene expression datasets and two large image datasets show that MOSSCE outperforms other state-of-the-art semisupervised classifiers on high-dimensional data.
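The abstract names three objectives and a confidence-based auxiliary set but does not give their formulas, so the following is only a minimal sketch under stated assumptions rather than the method itself. It assumes numeric class labels, scores relevance and redundancy with absolute Pearson correlation, and measures reconstruction error as the least-squares residual of rebuilding all features from the selected ones. The function names (`objectives`, `dominates`, `pareto_front`, `select_confident`) and the 0.9 confidence threshold are hypothetical.

```python
import numpy as np


def objectives(X_lab, y_lab, X_all, subset):
    """Score one feature subset; all three returned values are minimized.

    Assumed stand-ins (not taken from the paper):
      relevance   = mean |corr(feature, label)| on labeled samples
      redundancy  = mean |corr| between distinct selected features
      recon error = mean squared residual when all features are rebuilt
                    from the selected ones by least squares
    """
    S = np.asarray(subset)
    # Relevance, negated so that smaller is better.
    yc = y_lab - y_lab.mean()
    Xc = X_lab[:, S] - X_lab[:, S].mean(axis=0)
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    relevance = float(np.mean(np.abs(Xc.T @ yc) / denom))
    # Redundancy over labeled and unlabeled samples together.
    if len(S) > 1:
        C = np.abs(np.corrcoef(X_all[:, S], rowvar=False))
        redundancy = float((C.sum() - np.trace(C)) / (len(S) * (len(S) - 1)))
    else:
        redundancy = 0.0
    # Reconstruction error of the full data from the subspace.
    A = X_all[:, S]
    W, *_ = np.linalg.lstsq(A, X_all, rcond=None)
    recon = float(np.linalg.norm(X_all - A @ W) ** 2 / X_all.size)
    return (-relevance, redundancy, recon)


def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of score tuples that no other tuple dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def select_confident(proba, threshold=0.9):
    """Pick unlabeled samples whose top class probability reaches the threshold.

    Returns their row indices and predicted labels, which could serve as a
    pseudo-labeled auxiliary training set (threshold is an assumed value).
    """
    conf = proba.max(axis=1)
    idx = np.flatnonzero(conf >= threshold)
    return idx, proba[idx].argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_all = rng.normal(size=(60, 8))          # labeled + unlabeled samples
    X_lab = X_all[:10]                        # only 10 labeled samples
    y_lab = (X_lab[:, 0] > 0).astype(float)   # toy binary labels
    candidates = [rng.choice(8, size=3, replace=False) for _ in range(20)]
    scores = [objectives(X_lab, y_lab, X_all, c) for c in candidates]
    print("non-dominated subsets:", [sorted(candidates[i]) for i in pareto_front(scores)])
```

Random candidate subsets are used here only to keep the sketch short; a multiobjective search procedure would normally generate and refine the candidates before the non-dominated set is taken.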
