A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data

Background: Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using gene expression data only. It has been shown that integrating multi-omics data contributes to cancer subtype classification.

Results: A new hierarchical integration deep flexible neural forest framework, named HI-DFNForest, is proposed to integrate multi-omics data for cancer subtype classification. A stacked autoencoder (SAE) is used to learn high-level representations of each omics data type; complex representations are then learned by integrating all of the learned representations in an additional autoencoder layer. The final learned representations are used to classify patients into cancer subtypes with a deep flexible neural forest (DFNForest) model. Cancer subtype classification is verified on the BRCA, GBM and OV data sets from TCGA by integrating gene expression, miRNA expression and DNA methylation data. The results demonstrate that integrating multiple omics data improves the accuracy of cancer subtype classification compared with using gene expression data alone, and that the proposed framework achieves better performance than other conventional methods.

Conclusion: The new hierarchical integration deep flexible neural forest framework (HI-DFNForest) is an effective method for integrating multi-omics data to classify cancer subtypes.
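The hierarchical integration idea in the Results can be illustrated with a minimal NumPy sketch: one stacked autoencoder per omics type, followed by a single autoencoder layer over the concatenated representations. This is not the authors' implementation. The layer sizes, tanh activation, learning rate, function names and synthetic data are all assumptions made for illustration, and the DFNForest classifier is omitted because the abstract does not specify it.

```python
import numpy as np


def train_autoencoder(X, n_hidden, epochs=100, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on squared reconstruction error.
    Returns the encoder parameters (W, b). Hyperparameters are assumed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encode
        R = H @ W2 + b2                    # decode
        err = (R - X) / n                  # gradient of the averaged loss w.r.t. R
        dH = (err @ W2.T) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1


def encode(X, params):
    W, b = params
    return np.tanh(X @ W + b)


def stacked_encode(X, layer_sizes, seed=0):
    """Greedy layer-wise stacked autoencoder; returns the top-level codes."""
    H = X
    for i, size in enumerate(layer_sizes):
        H = encode(H, train_autoencoder(H, size, seed=seed + i))
    return H


def hierarchical_integration(omics, per_omics_layers, n_integrated):
    """Learn one representation per omics matrix, then fuse the
    concatenated representations with one more autoencoder layer."""
    reps = [stacked_encode(X, per_omics_layers, seed=10 * k)
            for k, X in enumerate(omics)]
    joint = np.concatenate(reps, axis=1)
    return encode(joint, train_autoencoder(joint, n_integrated, seed=99))


# Synthetic stand-ins for gene expression, miRNA expression and DNA
# methylation matrices (rows = the same 40 samples; values are random).
rng = np.random.default_rng(42)
omics = [rng.normal(size=(40, 50)),
         rng.normal(size=(40, 20)),
         rng.normal(size=(40, 30))]

Z = hierarchical_integration(omics, per_omics_layers=[16, 8], n_integrated=4)
print(Z.shape)
```

In a full pipeline, the integrated matrix `Z` would be passed to a classifier such as the DFNForest model named in the abstract; any off-the-shelf classifier could stand in for experimentation.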
