Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles

Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs, known as microRNAs (miRNAs), various studies have shown that miRNA expression patterns can also accurately classify human cancers. Therefore, there is a great demand for machine learning approaches that accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, feature relevance and redundancy are considered to generate multiple feature subsets, a base classifier is learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross-validation. The experimental results show that the proposed method achieves higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets used in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/.
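The combination step described above, one base classifier per feature subset with the class posteriors averaged across classifiers, can be illustrated with a minimal Python sketch. This is not the authors' Java implementation. The Gaussian naive Bayes base learner, the hand-picked feature subsets, the toy expression values, and all function names are assumptions made purely for illustration; in particular, the relevance- and redundancy-based subset generation is not reproduced here.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, features):
    """Fit a Gaussian naive Bayes model using only the given feature indices."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append([row[j] for j in features])
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)  # (class prior, per-feature stats)
    return model

def posterior(model, features, row):
    """Return the class posterior probabilities of one base classifier."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for j, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (row[j] - mean) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def ensemble_predict(models, subsets, row):
    """Average the posteriors of all base classifiers and pick the top class."""
    avg = defaultdict(float)
    for model, features in zip(models, subsets):
        for label, p in posterior(model, features, row).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get), dict(avg)

# Toy data (assumption): 6 samples, 4 miRNA features, 3 cancer classes.
X = [
    [5.0, 1.0, 0.2, 3.0],
    [5.2, 1.1, 0.1, 3.1],
    [1.0, 4.0, 0.3, 0.5],
    [1.1, 4.2, 0.2, 0.4],
    [1.0, 1.0, 4.0, 3.0],
    [0.9, 1.2, 4.1, 2.9],
]
y = ["A", "A", "B", "B", "C", "C"]

# Hypothetical disjoint feature subsets, hand-picked rather than selected
# by relevance and redundancy as in the proposed method.
subsets = [[0, 1], [2, 3]]
models = [fit_gaussian_nb(X, y, s) for s in subsets]

label, probs = ensemble_predict(models, subsets, [5.1, 1.0, 0.15, 3.0])
print(label, probs)
```

Averaging posteriors, rather than taking a majority vote over hard labels, lets a confident base classifier outweigh uncertain ones, which matters when the number of base classifiers is small.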
