A Novel Deep Flexible Neural Forest Model for Classification of Cancer Subtypes Based on Gene Expression Data

Classification of cancer subtypes is of paramount importance for the diagnosis and prognosis of cancer. In recent years, deep learning methods have gained considerable popularity for cancer subtype classification. However, the structure of a neural network is difficult to determine, and the performance of a deep network depends largely on its structure. To address this problem, a flexible neural tree (FNT) may be used. FNT is a special neural network that optimizes its structure and parameters automatically, but it cannot be used for multi-class classification. In this paper, a deep flexible neural forest (DFNForest) model, a novel ensemble of FNT models, is proposed to aid the classification of cancer subtypes. The proposed DFNForest model differs from the conventional FNT model in that each forest transforms a multi-class classification problem into several binary classification problems. We explore the cascade structure of DFNForest to deepen the flexible neural tree model, so that the depth of the model is increased without introducing additional parameters. In addition to the DFNForest model, this paper proposes combining the Fisher ratio with a neighborhood rough set for dimensionality reduction of gene expression data, in order to obtain higher classification performance. Experiments on RNA-seq gene expression data show that our gene selection method achieves higher accuracy with fewer genes, and that the proposed DFNForest model outperforms conventional methods for the classification of cancer subtypes.
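As a rough illustration of two ideas mentioned in the abstract, the sketch below ranks genes by a Fisher ratio and splits a multi-class label vector into binary problems. It is not the paper's implementation. The abstract gives no formulas, so the sketch assumes a common multi-class form of the Fisher ratio (between-class scatter divided by within-class scatter) and a one-vs-rest label decomposition. The function names `fisher_ratio`, `rank_genes`, and `one_vs_rest_labels` are hypothetical, and the neighborhood rough set step is not shown.

```python
from statistics import fmean, pvariance


def fisher_ratio(values, labels):
    """Fisher ratio of one gene: between-class scatter / within-class scatter.

    Assumed definition (not taken from the paper): class-size-weighted
    squared distance of class means from the overall mean, divided by the
    class-size-weighted population variance within each class.
    """
    overall_mean = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # No within-class spread: perfectly separated, or a constant gene.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_genes(expression, labels, top_k):
    """Return the indices of the top_k genes by Fisher ratio.

    expression: list of samples, each a list of gene expression values.
    """
    n_genes = len(expression[0])
    scores = [
        fisher_ratio([sample[g] for sample in expression], labels)
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def one_vs_rest_labels(labels):
    """Split multi-class labels into one binary label vector per class."""
    return {
        cls: [1 if y == cls else 0 for y in labels]
        for cls in sorted(set(labels))
    }


# Toy data: 4 samples, 2 genes. Gene 0 separates the classes; gene 1 does not.
expression = [[1.0, 5.0], [1.2, 2.0], [3.0, 5.1], [3.1, 2.1]]
labels = ["A", "A", "B", "B"]
selected = rank_genes(expression, labels, top_k=1)
binary_tasks = one_vs_rest_labels(["A", "B", "C", "A"])
```

In this toy setup, a filter of this kind would keep gene 0, and each binary label vector could then be handed to a separate binary classifier, which is one plausible way to work around the binary-only limitation of a single FNT.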
