Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks

Emerging evidence indicates the existence of a new class of cancer genes that act as “signal linkers,” coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, this new class of cancer-related genes, Class II, escapes detection because its members are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes, regardless of expression or mutational status, in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10⁸ Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of the ~19,000 genes that are not differentially expressed are Class II cancer gene candidates. Some of the Class II genes we found, such as SLC19A1 and ATAD3B, have recently been reported to be associated with cancer outcomes. To our knowledge, this is the first study to use both machine learning and network biology approaches to uncover Class II cancer genes that coordinate functionality in cancer networks. It will advance our understanding of how genes modulated in a tissue-specific network contribute to tumorigenesis and therapy development.
