Machine learning analysis of DNA methylation profiles distinguishes primary lung squamous cell carcinomas from head and neck metastases

A DNA methylation–based classifier facilitates the differentiation of primary lung and metastatic head and neck cancer with diagnostic accuracy.

Discriminating lung primary tumors and metastases

Pulmonary metastases of head and neck squamous cell carcinoma (HNSC) are currently difficult to distinguish from primary lung squamous cell carcinomas (LUSCs). Differentiating these tumor types has important clinical implications, because whether the lung tumor is primary or metastatic can affect the treatment options offered to a patient. Here, Jurmeister et al. developed a machine learning algorithm that exploits the differential DNA methylation observed in primary LUSC and metastasized HNSC tumors in the lung. Their method was able to discriminate between these two tumor types with high accuracy across multiple cohorts, suggesting its potential as a clinical diagnostic tool.

Head and neck squamous cell carcinoma (HNSC) patients are at risk of suffering from either pulmonary metastases or a second squamous cell carcinoma of the lung (LUSC). Differentiating pulmonary metastases from primary lung cancers is of high clinical importance, but not possible in most cases with current diagnostics. To address this, we performed DNA methylation profiling of primary tumors and trained three different machine learning methods to distinguish metastatic HNSC from primary LUSC. We developed an artificial neural network that correctly classified 96.4% of the cases in a validation cohort of 279 patients with HNSC and LUSC as well as normal lung controls, outperforming support vector machines (95.7%) and random forests (87.8%). By applying thresholds to the resulting probability scores and excluding samples with low confidence, prediction accuracies of more than 99% were achieved for 92.1% (neural network), 90% (support vector machine), and 43% (random forest) of these cases. As independent clinical validation of the approach, we analyzed a series of 51 patients with a history of HNSC and a second lung tumor, demonstrating correct classifications as judged by clinicopathological properties. In summary, our approach may facilitate the reliable diagnostic differentiation of pulmonary metastases of HNSC from primary LUSC to guide therapeutic decisions.
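
The thresholding step described in the abstract (keeping only samples whose probability score is high enough, then reporting accuracy on the retained cases) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_with_rejection`, the threshold of 0.9, the class ordering, and the five probability vectors are all hypothetical and chosen only to show the mechanics.

```python
import numpy as np

def classify_with_rejection(probs, labels, threshold):
    """Apply a confidence threshold to class probability scores.

    probs:     (n_samples, n_classes) array of predicted class probabilities
    labels:    (n_samples,) array of true class indices
    threshold: minimum top-class probability needed to keep a sample
               (hypothetical value; the abstract does not state the cutoffs)

    Returns (coverage, accuracy_on_retained, retained_mask).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    predicted = probs.argmax(axis=1)   # most probable class per sample
    confidence = probs.max(axis=1)     # probability of that class
    retained = confidence >= threshold # low-confidence samples are excluded

    coverage = float(retained.mean())  # fraction of samples that are classified
    if retained.any():
        accuracy = float((predicted[retained] == labels[retained]).mean())
    else:
        accuracy = float("nan")        # nothing retained, accuracy undefined
    return coverage, accuracy, retained

# Made-up scores for illustration; assumed class order: 0 = HNSC, 1 = LUSC, 2 = normal lung
example_probs = [
    [0.97, 0.02, 0.01],
    [0.55, 0.40, 0.05],
    [0.03, 0.95, 0.02],
    [0.10, 0.15, 0.75],
    [0.48, 0.50, 0.02],
]
example_labels = [0, 0, 1, 2, 0]

if __name__ == "__main__":
    for t in (0.0, 0.9):
        cov, acc, _ = classify_with_rejection(example_probs, example_labels, t)
        print(f"threshold={t:.1f}: coverage={cov:.2f}, accuracy on retained={acc:.2f}")
```

Raising the threshold trades coverage for accuracy: fewer samples receive a call, but the calls that remain are more reliable. This is the same trade-off behind the abstract's paired figures, such as more than 99% accuracy on 92.1% of cases for the neural network.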
