Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

Background. Accurate cancer prediction is crucial for cancer treatment. Gene expression profiles make it possible to analyze associations between genes and cancers on a genome-wide scale. Gene expression data analysis, however, faces major challenges arising from the characteristics of the data, such as high dimensionality, small sample size, and low signal-to-noise ratio.

Results. This paper proposes a method, termed RS_SVM, that predicts cancer from gene expression profiles by aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains an SVM classifier on each subspace, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets were performed to validate the RS_SVM method. The experimental results show that RS_SVM achieved better classification accuracy and generalization performance than a single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and state-of-the-art methods. The experiments also explored the effect of subspace size on prediction performance.

Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM is a promising approach for such biological data.
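The pipeline described in the Results (gene filtering, random subspaces, one SVM per subspace, aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the statistical filter, the SVM kernel, or the aggregation rule, so the t-like score, the linear SVM fitted by subgradient descent, the majority vote, all parameter values, and the synthetic data below are assumptions made for illustration only. All function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def rank_genes(X, y, k):
    """Keep the k genes with the largest absolute t-like score (assumed filter)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        spread = (pvariance(pos) / len(pos) + pvariance(neg) / len(neg)) ** 0.5
        scores.append(abs(mean(pos) - mean(neg)) / (spread + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def train_linear_svm(X, y, epochs=50, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # shrinkage from the regulariser
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def fit_rs_svm(X, y, n_selected=10, subspace_size=5, n_members=11, seed=0):
    """Filter genes, then train one SVM per randomly drawn gene subset."""
    rng = random.Random(seed)
    selected = rank_genes(X, y, n_selected)
    ensemble = []
    for m in range(n_members):
        genes = rng.sample(selected, subspace_size)  # one random subspace
        X_sub = [[row[j] for j in genes] for row in X]
        ensemble.append((genes, train_linear_svm(X_sub, y, seed=seed + m)))
    return ensemble


def predict_rs_svm(ensemble, row):
    """Aggregate member predictions by majority vote (assumed rule)."""
    votes = 0
    for genes, (w, b) in ensemble:
        score = sum(wj * row[j] for wj, j in zip(w, genes)) + b
        votes += 1 if score >= 0 else -1
    return 1 if votes >= 0 else -1


def make_toy_data(n_per_class, n_genes=30, n_informative=10, shift=1.5, seed=0):
    """Synthetic stand-in for expression data; not one of the paper's datasets."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift if j < n_informative else 0.0, 1.0)
                      for j in range(n_genes)])
            y.append(label)
    return X, y


X_train, y_train = make_toy_data(20, seed=1)
X_test, y_test = make_toy_data(20, seed=2)
model = fit_rs_svm(X_train, y_train)
hits = sum(predict_rs_svm(model, r) == t for r, t in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Varying `subspace_size` in this sketch is one way to probe the subspace-size effect the abstract mentions. An odd `n_members` avoids tied votes.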
