Interpretation of Genomic Variants Using a Unified Biological Network Approach

The decreasing cost of sequencing is leading to a growing repertoire of personal genomes. However, we are lagging behind in understanding the functional consequences of the millions of variants obtained from sequencing. Global system-wide effects of variants in coding genes are particularly poorly understood. It is known that while variants in some genes can lead to diseases, complete disruption of other genes, called ‘loss-of-function tolerant’, is possible with no obvious effect. Here, we build a systems-based classifier to quantitatively estimate the global perturbation caused by deleterious mutations in each gene. We first survey the degree to which gene centrality in various individual networks and a unified ‘Multinet’ correlates with the tolerance to loss-of-function mutations and evolutionary conservation. We find that functionally significant and highly conserved genes tend to be more central in physical protein-protein and regulatory networks. However, this is not the case for metabolic pathways, where the highly central genes have more duplicated copies and are more tolerant to loss-of-function mutations. Integration of three-dimensional protein structures reveals that the correlation with centrality in the protein-protein interaction network is also seen in terms of the number of interaction interfaces used. Finally, combining all the network and evolutionary properties allows us to build a classifier distinguishing functionally essential and loss-of-function tolerant genes with higher accuracy (AUC = 0.91) than any individual property. Application of the classifier to the whole genome shows its strong potential for interpretation of variants involved in Mendelian diseases and in complex disorders probed by genome-wide association studies.
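As a rough illustration of the two ideas the abstract combines, gene centrality in a merged network and classifier evaluation by AUC, the following toy sketch may help. It is not the authors' method: the abstract does not specify how the Multinet is assembled or which classifier is used. The edge lists, the gene labels, the choice of degree centrality as the only feature, and the function names `merge_networks`, `degree_centrality`, and `auc` are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_networks(*edge_lists):
    """Union several undirected edge lists into one adjacency map.

    A simplified stand-in for a unified network; the real Multinet
    construction is not described in the abstract.
    """
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:  # ignore self-loops
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {gene: 0.0 for gene in adjacency}
    return {gene: len(nbrs) / (n - 1) for gene, nbrs in adjacency.items()}


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy networks over five placeholder genes.
ppi = [("A", "B"), ("A", "C"), ("A", "D")]
regulatory = [("A", "E"), ("B", "C"), ("D", "E")]

multinet = merge_networks(ppi, regulatory)
centrality = degree_centrality(multinet)

# Hypothetical labels: 1 = functionally essential, 0 = loss-of-function tolerant.
genes = sorted(centrality)
labels = [1 if g in {"A", "B"} else 0 for g in genes]
scores = [centrality[g] for g in genes]

print(f"AUC using centrality alone: {auc(scores, labels):.2f}")
```

Here a single feature is used directly as the score. A classifier of the kind the abstract describes would instead combine several network and evolutionary features into one score before computing the AUC.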
