Multi-view Based AdaBoost Classifier Ensemble for Class Prediction from Gene Expression Profiles

Multi-view learning, an important sub-field of machine learning, has gained increasing attention in class prediction from gene expression datasets. In this paper, we propose a new classifier ensemble framework, the multi-view based AdaBoost classifier ensemble framework (MV-ACE), which uses a random view generation technique to generate different views, applies AdaBoost to adjust the training set, and includes an adaptive process that searches for a feasible combination of multiple views through optimization. Traditional multi-view learning focuses on exploring diverse views and finding the best integration of multiple views in a straightforward manner, such as a linear combination of different views. Our proposed model additionally applies a progressive training approach to improve the accuracy of the base classifiers. Moreover, we investigate the assembly of views at the model level and employ an adaptive process to optimize the multi-view learning model and improve its performance. Our experiments on 12 cancer gene expression datasets for the classification task show that (i) MV-ACE works well on a diverse class of cancer gene expression profiles, and (ii) it outperforms most state-of-the-art classifier ensemble approaches on these datasets.
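
The general idea of combining random views with AdaBoost can be illustrated with a minimal sketch. It is not the MV-ACE implementation: it assumes that a view is a random subset of gene (feature) indices, that labels are binary and coded as -1/+1, and that decision stumps serve as base learners. The view weighting by training accuracy is only a placeholder for the adaptive optimization process, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

def random_views(n_features, n_views, view_size, rng):
    """Assumption: each view is a random subset of feature (gene) indices."""
    return [rng.choice(n_features, size=view_size, replace=False)
            for _ in range(n_views)]

def fit_stump(X, y, w):
    """Exhaustively find the decision stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] <= t, -1, 1)
            err = w[pred != y].sum()
            # Flipping the polarity turns error err into 1 - err (w sums to 1).
            for pol, e in ((1, err), (-1, 1.0 - err)):
                if e < best[3]:
                    best = (j, t, pol, e)
    return best

def stump_predict(X, stump):
    j, t, pol, _ = stump
    return pol * np.where(X[:, j] <= t, -1, 1)

def fit_adaboost(X, y, n_rounds=10):
    """Standard discrete AdaBoost: reweight samples after each round."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(X, stump))
        w /= w.sum()
        model.append((alpha, stump))
    return model

def adaboost_score(X, model):
    return sum(alpha * stump_predict(X, s) for alpha, s in model)

def fit_multiview(X, y, n_views=5, view_size=3, n_rounds=10, seed=0):
    """Train one AdaBoost model per random view."""
    rng = np.random.default_rng(seed)
    views = random_views(X.shape[1], n_views, view_size, rng)
    models = [fit_adaboost(X[:, v], y, n_rounds) for v in views]
    # Placeholder weighting (assumption): training accuracy of each view.
    weights = np.array([np.mean(np.sign(adaboost_score(X[:, v], m)) == y)
                        for v, m in zip(views, models)])
    return views, models, weights

def predict_multiview(X, ensemble):
    """Combine the per-view predictions by a weighted vote."""
    views, models, weights = ensemble
    votes = sum(w * np.sign(adaboost_score(X[:, v], m))
                for v, m, w in zip(views, models, weights))
    return np.where(votes >= 0, 1, -1)
```

Calling fit_multiview(X, y) on an expression matrix X (samples by genes) returns the views, the per-view boosted models, and the view weights; predict_multiview(X_new, ensemble) then returns labels in {-1, +1}. A weighted vote is the kind of straightforward linear combination that the abstract contrasts with its adaptive process.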
