ToppGene Suite for gene list enrichment analysis and candidate gene prioritization

ToppGene Suite (http://toppgene.cchmc.org; this web site is free and open to all users and does not require a login) is a one-stop portal for (i) gene list functional enrichment, (ii) candidate gene prioritization using either functional annotations or network analysis and (iii) identification and prioritization of novel disease candidate genes in the interactome. Functional annotation-based disease candidate gene prioritization uses a fuzzy-based similarity measure to compute the similarity between any two genes based on semantic annotations. The similarity scores from individual features are combined into an overall score using statistical meta-analysis. A P-value for each annotation of a test gene is derived by random sampling of the whole genome. Protein–protein interaction network (PPIN)-based disease candidate gene prioritization uses social and Web network analysis algorithms (extended versions of the PageRank and HITS algorithms, and the K-Step Markov method). We demonstrate the utility of ToppGene Suite using 20 recently reported GWAS-based gene–disease associations (including novel disease genes) representing five diseases. ToppGene ranked 19 of 20 (95%) candidate genes within the top 20%, while the PPIN-based ToppNet ranked 12 of 16 (75%) candidate genes among the top 20%.
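
To make the network-based idea concrete, the following is a minimal sketch of a PageRank-style ranking with a restart prior on known disease genes ("seeds"). It is not ToppNet's implementation: the abstract says only that extended versions of PageRank, HITS and the K-Step Markov method are used, so the function name `pagerank_with_priors`, the restart weight `beta`, the fixed iteration count and the toy network are all assumptions made for illustration.

```python
def pagerank_with_priors(edges, seeds, beta=0.3, iters=100):
    """Score nodes of an undirected network by proximity to seed nodes.

    Illustrative only: beta (restart weight) and iters are assumed values,
    and every seed is assumed to appear in at least one edge.
    """
    # Build adjacency sets for an undirected interaction network.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # Prior: uniform mass on the seed genes, zero elsewhere.
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(prior)

    for _ in range(iters):
        nxt = {}
        for n in nodes:
            # Each neighbor splits its current score evenly over its edges.
            inflow = sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            # Mix the propagated score with the restart prior.
            nxt[n] = (1 - beta) * inflow + beta * prior[n]
        score = nxt
    return score


# Hypothetical toy network: S1 is a known disease gene; A, C and D are candidates.
toy_edges = [("S1", "A"), ("S1", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
scores = pagerank_with_priors(toy_edges, seeds={"S1"})
ranked = sorted(["A", "C", "D"], key=scores.get, reverse=True)
print(ranked)
```

In this toy network, candidates closer to the seed receive higher scores, which is the intuition behind ranking candidate genes by their network proximity to known disease genes.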
