A Machine Learning Approach for Tracing Tumor Original Sites With Gene Expression Profiles

Some carcinomas present with one or more metastatic sites whose primary origin is unknown. Identifying primary and metastatic tumor tissues is crucial for physicians to develop precise treatment plans. When the primary site is unknown, designing a specific plan is challenging; such patients usually receive broad-spectrum chemotherapy and still have a poor prognosis. Machine learning has been widely used in clinical practice and has already shown significant advantages there. In this study, we classify and predict the origins of a large number of tumor samples with uncertain origins using the random forest and naive Bayes algorithms. We evaluate the approach with precision, recall, and other measures. The results show a prediction accuracy of 90.4% on 7,713 samples and 80% on 20 metastatic tumor samples. In addition, 10-fold cross-validation yields a classification accuracy of 91%.
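The evaluation protocol described above (a classifier scored with precision, recall, and 10-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the random forest is not reproduced, a simple Gaussian naive Bayes stands in as one plausible reading of the "naive Bayes" component, the expression profiles are synthetic, and every function name and tissue label below is hypothetical.

```python
import math
import random
from collections import defaultdict


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


def cross_validate(X, y, k=10, seed=0):
    """Predict every sample once, using a model trained on the other folds."""
    y_pred = [None] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            y_pred[i] = predict_gaussian_nb(model, X[i])
    return y_pred


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_synthetic_profiles(n_per_class=30, n_genes=3, seed=0):
    """Toy 'expression profiles': two well-separated hypothetical classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("lung", 0.0), ("colon", 10.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_profiles()
    y_pred = cross_validate(X, y, k=10)
    accuracy = sum(t == p for t, p in zip(y, y_pred)) / len(y)
    print("10-fold accuracy:", accuracy)
    for label in sorted(set(y)):
        print(label, precision_recall(y, y_pred, label))
```

Because each sample is predicted only by a model that never saw it during training, the pooled predictions give a cross-validated accuracy together with per-class precision and recall. Real expression data would have far more genes and classes than this toy set, so feature selection and class imbalance would need separate handling.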
