Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models

Translating the vast data generated by genomic platforms into accurate predictions of clinical outcomes is a fundamental challenge in genomic medicine. Many prediction methods struggle to learn from the high-dimensional profiles these platforms generate, and instead rely on experts to hand-select a small number of features for training prediction models. In this paper, we demonstrate how deep learning and Bayesian optimization methods, which have been remarkably successful in general high-dimensional prediction tasks, can be adapted to the problem of predicting cancer outcomes. We perform an extensive comparison of Bayesian-optimized deep survival models and other state-of-the-art machine learning methods for survival analysis, and describe a framework for interpreting deep survival models using a risk backpropagation technique. Finally, we illustrate that deep survival models can successfully transfer information across diseases to improve prognostic accuracy. We provide an open-source software implementation of this framework, called SurvivalNet, that enables automatic training, evaluation, and interpretation of deep survival models.
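As a rough illustration of what training a survival model on censored outcomes can involve, the sketch below computes the negative Cox log partial likelihood for a set of predicted risk scores. It assumes the model outputs one log-risk score per patient and is trained with a Cox-style objective; the abstract does not state which objective SurvivalNet uses, and the function name and arguments here are hypothetical, not part of the SurvivalNet API.

```python
import math


def cox_neg_log_partial_likelihood(risks, times, events):
    """Negative Cox log partial likelihood (Breslow-style handling of ties).

    risks  : predicted log-risk score per patient (e.g. a network output)
    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored

    Hypothetical helper for illustration only; not the SurvivalNet API.
    """
    total = 0.0
    for r_i, t_i, e_i in zip(risks, times, events):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        at_risk = [r_j for r_j, t_j in zip(risks, times) if t_j >= t_i]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(at_risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += r_i - log_denom
    return -total


# Toy cohort: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(concordant < discordant)
```

In this toy cohort, assigning higher risk to patients who experience the event earlier yields the lower loss, which is the behavior a gradient-based optimizer would exploit when fitting the network.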
