A Guided Network Propagation Approach to Identify Disease Genes that Combines Prior and New Information

A major challenge in biomedical data science is to identify the causal genes underlying complex genetic diseases. Despite the massive influx of genome sequencing data, identifying disease-relevant genes remains difficult, as individuals with the same disease may share very few, if any, genetic variants. Protein-protein interaction networks provide a means to tackle this heterogeneity, because genes causing the same disease tend to lie close to one another within networks. Previous network propagation approaches have spread signal across the network from either known disease genes or genes newly and putatively implicated in the disease (e.g., found to be mutated in exome studies or linked via genome-wide association studies). Here we introduce a general framework that considers both sources of data within a network context. Specifically, we use prior knowledge of disease-associated genes to guide random walks initiated from genes that are newly identified as potentially disease-relevant. In large-scale testing across 24 cancer types, we demonstrate that our approach for integrating prior and new information not only identifies cancer driver genes better than using either source of information alone but also outperforms other state-of-the-art network-based approaches. To demonstrate the versatility of our approach, we also apply it to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes.
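To make the idea of guiding a random walk concrete, the following minimal sketch shows one plausible reading of the description above; it is not the authors' implementation. The abstract does not give the formulas, so everything here is an assumption for illustration: the function names `rwr` and `guided_propagation`, the restart probability `alpha`, the small constant `eps`, the choice to smooth the prior with a random walk with restart, and the rule of weighting each edge by the smoothed prior score of its target node. The network is assumed to be undirected, with every node having at least one neighbor.

```python
import numpy as np


def row_normalize(W):
    """Scale each row to sum to 1 (assumes no all-zero rows)."""
    return W / W.sum(axis=1, keepdims=True)


def rwr(P, restart, alpha=0.5, tol=1e-10, max_iter=10_000):
    """Random walk with restart on a row-stochastic matrix P.

    `restart` is a probability vector; at each step the walker jumps back
    to it with probability `alpha`, otherwise follows an edge according to P.
    """
    p = restart.copy()
    for _ in range(max_iter):
        nxt = alpha * restart + (1.0 - alpha) * (p @ P)
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


def guided_propagation(A, prior, new, alpha=0.5, eps=1e-6):
    """Score genes by walks started at `new` genes and biased toward `prior`.

    A     : symmetric adjacency matrix of the interaction network
    prior : nonnegative weights for known disease genes (assumed input)
    new   : nonnegative weights for newly implicated genes (assumed input)
    """
    A = np.asarray(A, dtype=float)
    prior = np.asarray(prior, dtype=float)
    new = np.asarray(new, dtype=float)

    # Step 1 (assumption): spread prior knowledge so every node gets a score.
    q = rwr(row_normalize(A), prior / prior.sum(), alpha) + eps

    # Step 2 (assumption): weight edge i -> j by the prior score of j, so the
    # walker prefers neighbors that are closer to known disease genes.
    guided = row_normalize(A * q[np.newaxis, :])

    # Step 3: walk from the newly implicated genes on the guided network.
    return rwr(guided, new / new.sum(), alpha)


# Toy example: path network 0 - 1 - 2 - 3 - 4.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
prior = np.array([0, 0, 0, 0, 1])  # gene 4 is a known disease gene
new = np.array([0, 0, 1, 0, 0])    # gene 2 is newly implicated
scores = guided_propagation(A, prior, new)
print(np.round(scores, 3))
```

In this toy network, an unguided walk from gene 2 would treat genes 1 and 3 symmetrically. With the prior placed on gene 4, the guided walk assigns more weight to the side of the network containing the known disease gene, which is the behavior the framework relies on when combining the two sources of information.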
