Deep Learning with Evolutionary and Genomic Profiles for Identifying Cancer Subtypes

Cancer subtype identification remains an unmet need in precision diagnosis. Recent studies indicate that evolutionary conservation carries interpretable signatures of functional significance in cancers. However, the importance of evolutionary conservation in distinguishing cancer subtypes remains unclear. Here, we identified evolutionarily conserved genes (i.e., core genes) and observed that they are mainly involved in pathways relevant to cell growth and metabolism. Using these core genes, we integrated their evolutionary and genomic profiles with deep learning to develop a feature-based strategy (FES) and an image-based strategy (IMS). Compared with FES using a random gene set and with the PAM50 classifier, FES based on the core gene set identifies breast cancer subtypes with higher accuracy. Moreover, IMS with data augmentation outperforms the other strategies. Comprehensive analysis of eight TCGA cancer datasets demonstrates that our evolutionary conservation-based models provide a valid and useful approach for identifying cancer subtypes and that the core gene set offers distinguishing clues to cancer subtypes.
