Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data

Understanding the complex biological mechanisms underlying cancer patient survival from genomic and clinical data is vital, not only for developing new treatments but also for improving survival prediction. However, highly nonlinear, high-dimension, low-sample-size (HDLSS) data pose computational challenges for conventional survival analysis. We propose a novel, biologically interpretable, pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data in a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable in that nodes in the neural network correspond to genes and biological pathways, and it captures the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution for training Cox-PASNet with HDLSS data. Cox-PASNet was extensively evaluated by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancers. In these experiments, Cox-PASNet outperformed the benchmark methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors among genes and biological pathways were identified. Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The network can identify nonlinear and hierarchical associations of genomic and clinical data with cancer patient survival. The open-source PyTorch implementation of Cox-PASNet for training, evaluation, and model interpretation is available at: https://github.com/DataX-JieHao/Cox-PASNet.
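
To make the two central ideas above concrete, the following is a minimal sketch in plain Python, not the released PyTorch code. It assumes that the gene-to-pathway connection can be expressed as a binary mask applied to a dense weight matrix, and that training minimizes the standard Cox negative log partial likelihood without tie correction. The function names `masked_linear` and `neg_log_partial_likelihood`, the toy mask, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def masked_linear(x, weights, mask):
    """Sparse gene-to-pathway layer (illustrative).

    x       : gene expression values for one patient
    weights : one row of gene weights per pathway node
    mask    : same shape as weights; 1 if the gene belongs to the pathway, else 0
    A weight only contributes where the mask is 1, so each pathway node
    sees only its member genes.
    """
    return [
        sum(xi * w * m for xi, w, m in zip(x, w_row, m_row))
        for w_row, m_row in zip(weights, mask)
    ]


def neg_log_partial_likelihood(risk, time, event):
    """Cox negative log partial likelihood, averaged over observed events.

    risk  : predicted log-risk score per patient (the network output)
    time  : observed survival or censoring time per patient
    event : 1 if death was observed, 0 if the patient was censored
    """
    total, n_events = 0.0, 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients enter only through risk sets
        # Risk set: patients still under observation at time[i].
        denom = sum(
            math.exp(risk[j]) for j in range(len(risk)) if time[j] >= time[i]
        )
        total += risk[i] - math.log(denom)
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


# Toy example: 4 genes, 2 pathways. Pathway 0 contains genes 0-1,
# pathway 1 contains genes 1-3 (hypothetical memberships).
mask = [[1, 1, 0, 0], [0, 1, 1, 1]]
weights = [[0.5] * 4, [0.5] * 4]
pathway_values = masked_linear([1.0, 2.0, 3.0, 4.0], weights, mask)

# Three patients with equal risk scores and no censoring.
loss = neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1])
```

In a trainable version, the mask would stay fixed while the weights are updated, so the sparsity pattern keeps reflecting pathway membership. A layer of this kind is what would let node activations be read as pathway-level quantities.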
