Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions

Translating a set of disease regions into insight about pathogenic mechanisms requires identifying not only the key disease genes within them, but also the biological relationships among those genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations reported in recent genome-wide studies. We then tested GRAIL by assessing its ability to separate true disease regions from many false-positive disease regions in two practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten were convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (fewer than one-third of which contribute to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes across multiple disease regions, genes that likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).
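The general idea of scoring text-based relatedness between genes in different regions can be illustrated with a toy sketch. This is not GRAIL's implementation: the abstract does not specify the similarity measure or the statistical scoring, so the TF-IDF weighting, cosine similarity, and "maximum similarity to any gene in another region" score used here are assumptions chosen for simplicity. The gene names, region names, and text snippets are hypothetical placeholders, not real data.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each gene's text."""
    tokenized = {gene: text.lower().split() for gene, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        gene: {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(tokens).items()}
        for gene, tokens in tokenized.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_gene_per_region(regions, docs):
    """For each region, pick the gene most similar to any gene in another region.

    Assumed scoring rule (not from the source): a gene's score is its maximum
    cosine similarity to genes located in the other regions.
    """
    vecs = tfidf_vectors(docs)
    picks = {}
    for region, genes in regions.items():
        others = [g for r, gs in regions.items() if r != region for g in gs]
        scores = {
            g: max((cosine(vecs[g], vecs[o]) for o in others), default=0.0)
            for g in genes
        }
        best = max(scores, key=scores.get)
        picks[region] = (best, scores[best])
    return picks

# Hypothetical toy input: two regions, each with two candidate genes.
regions = {"region1": ["GENE_A", "GENE_B"], "region2": ["GENE_C", "GENE_D"]}
docs = {
    "GENE_A": "synapse neuron adhesion receptor",
    "GENE_B": "muscle fibre contraction",
    "GENE_C": "neuron synapse signalling",
    "GENE_D": "lipid cholesterol transport",
}
picks = best_gene_per_region(regions, docs)
for region, (gene, score) in picks.items():
    print(region, gene, round(score, 3))
```

In this toy input, the genes whose text shares vocabulary across regions are the ones selected, which mirrors the intuition of finding related genes from different disease regions. A real analysis would also need a significance assessment, which this sketch omits.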
