Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis

Here we use deep transfer learning to quantify histopathological patterns across 17,396 H&E-stained histopathology slide images from 28 cancer types and correlate these patterns with the underlying genomic and transcriptomic data. Pan-cancer computational histopathology (PC-CHiP) classifies the tissue of origin across organ sites and provides highly accurate, spatially resolved distinction between tumor and normal tissue within a given slide. The learned computational histopathological features correlate with a wide range of recurrent genetic aberrations, including whole genome duplications (WGDs), arm-level copy number gains and losses, focal amplifications and deletions, and driver gene mutations within a range of cancer types. WGDs can be predicted in 25/27 cancer types (mean AUC = 0.79), including those that were not part of model training. Similarly, we observe associations with 25% of mRNA transcript levels, which makes it possible to learn and localize histopathological patterns of molecularly defined cell types on each slide. Lastly, we find that computational histopathology provides prognostic information that augments histopathological subtyping and grading in the majority of cancers assessed, and that it pinpoints prognostically relevant areas such as necrosis or infiltrating lymphocytes on each tumor section. Taken together, these findings highlight the large potential of PC-CHiP to discover new molecular and prognostic associations, which can augment diagnostic workflows and lay out a rationale for integrating molecular and histopathological data.
Key points
- Pan-cancer computational histopathology analysis with deep learning extracts histopathological patterns and accurately discriminates 28 cancer and 14 normal tissue types
- Computational histopathology predicts whole genome duplications, focal amplifications and deletions, as well as driver gene mutations
- Widespread correlations with gene expression indicative of immune infiltration and proliferation
- Prognostic information augments conventional grading and histopathology subtyping in the majority of cancers
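
The abstract summarizes WGD prediction as a mean AUC across cancer types. As a minimal illustration of what that summary statistic measures, the sketch below computes the AUC per cancer type as the probability that a randomly chosen positive sample (e.g. WGD present) receives a higher score than a randomly chosen negative one, and then averages over cancer types. This is not the authors' pipeline: the function names, the cancer-type labels `TYPE_A` and `TYPE_B`, and all labels and scores are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_type: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of the per-cancer-type AUCs."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_type.values()]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: label 1 = alteration present, 0 = absent;
# scores stand in for predicted probabilities on held-out samples.
toy = {
    "TYPE_A": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    "TYPE_B": ([1, 0, 0, 1], [0.8, 0.3, 0.2, 0.7]),
}

for name, (labels, scores) in toy.items():
    print(name, roc_auc(labels, scores))
print("mean AUC", mean_auc(toy))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a mean of 0.79 indicates that positives are ranked above negatives in roughly four out of five pairs on average. The pairwise loop is quadratic in the number of samples; a rank-based implementation would be preferable for large cohorts.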
