Syndrome to gene (S2G): in‐silico identification of candidate genes for human diseases

The identification of genomic loci associated with human genetic syndromes has been greatly facilitated by high-density SNP arrays. However, selecting the best candidate genes from within such loci remains a tedious, labor‐intensive bottleneck. Syndrome to Gene (S2G) is based on novel algorithms that allow an efficient search for candidate genes in a genomic locus, using known genes whose defects cause phenotypically similar syndromes. S2G (http://fohs.bgu.ac.il/s2g/index.html) includes two components. The first is a phenotype search engine based on Online Mendelian Inheritance in Man (OMIM) that alleviates many of the problems in the existing OMIM search engine (negation phrases, overlapping terms, etc.). The second is a gene prioritizing engine that uses a novel algorithm to integrate information from 18 databases. When the detailed phenotype of a syndrome is entered into the web‐based software, S2G offers a complete, improved search of the OMIM database for similar syndromes. The software then prioritizes a list of genes from within a genomic locus, based on their association with genes whose defects are known to underlie similar clinical syndromes. We demonstrate that in all 30 cases of novel disease genes identified in the past year, the disease gene was within the top 20% of candidate genes predicted by S2G, and in most cases within the top 10%. Thus, S2G provides clinicians with an efficient tool for diagnosis and researchers with a candidate gene prediction tool based on phenotypic data and a wide range of gene data resources. S2G can also serve in studies of polygenic diseases, and in finding interacting molecules for any gene of choice. Hum Mutat 30:1–8, 2010. © 2010 Wiley‐Liss, Inc.
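The prioritization step described in the abstract can be illustrated with a toy sketch. This is not the S2G algorithm, which the abstract does not specify. It assumes a simplified scoring rule (count how many reference genes each candidate is associated with), and the gene names, the `associations` mapping, and the functions `prioritize` and `top_fraction` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prioritize(
    candidates: List[str],
    reference_genes: Set[str],
    associations: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Rank candidate genes in a locus by how many reference genes they link to.

    ASSUMPTION: a plain overlap count stands in for the real scoring,
    which integrates many databases and is not described in the abstract.
    """
    scored = [
        (gene, len(associations.get(gene, set()) & reference_genes))
        for gene in candidates
    ]
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_fraction(ranked: List[Tuple[str, int]], gene: str) -> float:
    """Return the rank of `gene` as a fraction of the candidate list (0.2 = top 20%)."""
    position = [g for g, _ in ranked].index(gene) + 1
    return position / len(ranked)


# Hypothetical locus with five candidate genes.
candidates = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
# Hypothetical genes known to underlie phenotypically similar syndromes.
reference_genes = {"REF1", "REF2", "REF3"}
# Hypothetical gene-to-gene associations (e.g. shared pathway or interaction).
associations = {
    "GENE_A": {"REF1"},
    "GENE_C": {"REF1", "REF2", "REF3"},
    "GENE_D": {"REF2", "OTHER"},
}

ranked = prioritize(candidates, reference_genes, associations)
fraction = top_fraction(ranked, "GENE_C")
```

In this toy data `GENE_C` is linked to all three reference genes and ranks first of five, which is the kind of figure the abstract reports as a gene falling "within the top 20%" of candidates.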
