Tumor Origin Detection with Tissue-Specific miRNA and DNA Methylation Markers

Motivation: Cancer of unknown primary origin constitutes 3-5% of all human malignancies. Patients with these carcinomas present with metastases without an established primary site, which may not be found even by thorough histological search methods. These patients generally have a poor prognosis and rarely receive effective treatment, because most cancers respond well to chemotherapy or hormone drugs that are specific to the primary site. Many studies have proposed classifiers based on miRNAs or mRNAs to predict tumor origin, but few studies focus on high-dimensional DNA methylation profiles.

Results: We introduce three classifiers that combine a novel feature selection algorithm with random forest to identify highly tissue-specific epigenetic biomarkers, such as microRNAs and CpG sites, which can help predict the origin site of tumors. The algorithm, which incorporates differential analysis and dimensionality reduction, was applied to miRNA expression and DNA methylation profiles from 14 histological tissues and over 5000 samples to assign a given primary tumor to its tissue of origin. In predicting the origin of tumors, the classifiers achieve an overall accuracy of 87.78% (72.55%-97.54%) on the miRNA datasets and an accuracy of 96.43% (MRMD: 87.85%-99.76%) or 97.06% (PCA: 92.44%-100%) on the DNA methylation datasets. These results suggest that the selected biomarkers can efficiently predict the origin of tumors and allow clinicians to avoid adjuvant systemic therapy or to choose less aggressive therapeutic options. We also developed a user-friendly webserver that lets users predict the origin site of tumors by uploading the miRNA expression or DNA methylation profiles of those cancers.

Availability: The webserver, data, and code are accessible free of charge at http://server.malab.cn/MMCOP/

Contact: zouquan@nclab.net

Supplementary information: Supplementary data are available at Bioinformatics online.
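The differential-analysis step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which statistic is used, so the one-vs-rest Welch t-statistic below is an assumption, as are the function names and the toy values. The sketch covers only the ranking of tissue-specific features; the dimensionality reduction (MRMD or PCA) and the random forest classifier are not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def tissue_specific_features(profiles, labels, top_k=1):
    """Rank features per tissue by |t| in a one-vs-rest comparison.

    profiles: list of samples, each a list of feature values
              (e.g. miRNA expression or CpG methylation levels)
    labels:   tissue label of each sample
    Returns {tissue: [feature indices, most tissue-specific first]}.
    """
    n_features = len(profiles[0])
    selected = {}
    for tissue in sorted(set(labels)):
        scores = []
        for j in range(n_features):
            inside = [p[j] for p, t in zip(profiles, labels) if t == tissue]
            outside = [p[j] for p, t in zip(profiles, labels) if t != tissue]
            scores.append((abs(welch_t(inside, outside)), j))
        scores.sort(reverse=True)  # largest |t| first
        selected[tissue] = [j for _, j in scores[:top_k]]
    return selected


# Toy data (hypothetical values): 3 features, 3 tissues, 3 samples each.
profiles = [
    [9.1, 2.0, 2.1], [8.8, 2.2, 1.9], [9.3, 1.9, 2.0],  # lung
    [2.1, 9.0, 2.2], [1.9, 8.7, 2.0], [2.2, 9.2, 1.8],  # liver
    [2.0, 2.1, 9.2], [1.8, 2.0, 8.9], [2.1, 1.8, 9.0],  # kidney
]
labels = ["lung"] * 3 + ["liver"] * 3 + ["kidney"] * 3

markers = tissue_specific_features(profiles, labels, top_k=1)
print(markers)
```

In this toy setting each tissue has exactly one feature that separates it from the others, so the ranking picks one distinct marker per tissue. On real profiles the selected features would then be passed to a classifier.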
