Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier

Lung cancer (LC) is the leading cause of cancer death. It usually presents at an advanced stage, whereas earlier detection would increase the benefit of treatment. Blood is particularly favored in clinical research because it can be sampled for relatively noninvasive analyses. Copy number variation (CNV) is a common genetic change in tumor genomes, and many studies have indicated that CNV profiles derived from plasma cell-free DNA (cfDNA) could serve as a biomarker for cancer diagnosis.
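To make the idea of a cfDNA-derived CNV profile concrete, the sketch below shows one common, generic way of turning binned read counts from low-coverage plasma sequencing into per-bin copy ratios. This is an illustration under stated assumptions, not the pipeline used in this study. It assumes the sample and a healthy reference have already been counted over the same genomic bins. The function names `log2_copy_ratios` and `call_bins`, the pseudocount, and the ±0.3 log2 thresholds are hypothetical choices. Real pipelines typically also apply GC-content and mappability correction and segmentation, all of which are omitted here.

```python
import math


def log2_copy_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Per-bin log2 ratio of a sample's read depth to a reference profile.

    Both inputs are read counts over the same genomic bins. Each profile is
    scaled by its total read count so that differences in overall sequencing
    depth cancel out (a simplifying assumption; no GC correction is applied).
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("sample and reference must cover the same bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("profiles must contain at least one read")
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        # The pseudocount keeps empty bins from producing log(0).
        s_frac = (s + pseudocount) / sample_total
        r_frac = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_frac / r_frac))
    return ratios


def call_bins(ratios, gain=0.3, loss=-0.3):
    """Label each bin 'gain', 'loss' or 'neutral' using fixed log2 thresholds.

    The default thresholds are illustrative only.
    """
    labels = []
    for value in ratios:
        if value >= gain:
            labels.append("gain")
        elif value <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Usage: four bins, with the sample showing extra reads in bin 3
# and a deficit in bin 4 relative to the reference.
ratios = log2_copy_ratios([100, 100, 200, 50], [100, 100, 100, 100])
print(call_bins(ratios))
```

Normalizing by total reads before taking the ratio is what lets a gain in one region appear as a relative rise even though the sample was sequenced to a different depth. Per-bin values like these, or summaries of them, are the kind of features a downstream classifier such as XGBoost could be trained on.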