Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis.

Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, creating a need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five cancer types: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on their mutated genes, patients suffering from one of these cancer types were each encoded into a large number of numerical values using the enrichment theory of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method, to identify functional alteration features that could be used to build a support vector machine (SVM)-based classifier for distinguishing the five cancer types. Our results showed that the optimal classifier, built on the 344 selected features, achieved the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method yielded an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments.
This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
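The selection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `multiclass_mcc` and `incremental_feature_selection` are hypothetical, the feature ranking is assumed to come from a separate MCFS run, and the `evaluate` callback stands in for whatever cross-validated SVM evaluation is used. The multi-class Matthews correlation coefficient follows the standard K-category generalization computed from the confusion matrix.

```python
from math import sqrt
from typing import Callable, Hashable, List, Sequence, Tuple


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """K-category Matthews correlation coefficient from true and predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)

    # confusion[i][j] = number of samples of true class i predicted as class j
    confusion = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        confusion[index[t]][index[p]] += 1

    n = len(y_true)
    correct = sum(confusion[i][i] for i in range(k))
    true_totals = [sum(row) for row in confusion]
    pred_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    denominator = sqrt(
        (n * n - sum(p * p for p in pred_totals))
        * (n * n - sum(t * t for t in true_totals))
    )
    # By convention, return 0 when the coefficient is undefined
    # (for example, when every prediction falls into a single class).
    return numerator / denominator if denominator else 0.0


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Score nested prefixes of a ranked feature list and keep the best one.

    `ranked_features` is assumed to be sorted from most to least important
    (for example, by an MCFS ranking). `evaluate` is a hypothetical callback
    that trains and validates a classifier on the given features and returns
    a score such as the MCC.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for size in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:size])
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

In practice, `evaluate` would wrap a cross-validated SVM and return `multiclass_mcc` on the held-out predictions; the prefix with the highest score plays the role of the optimal feature set reported in the abstract.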
