ToppCluster: a multiple gene list feature analyzer for comparative enrichment clustering and network-based dissection of biological systems

ToppCluster is a web server application that leverages a powerful enrichment analysis and underlying data environment for comparative analyses of multiple gene lists. It generates heatmaps or connectivity networks that reveal functional features shared by or specific to multiple gene lists. ToppCluster uses hypergeometric tests to obtain list-specific feature enrichment P-values for what are currently 17 categories of annotations of human-ortholog genes, and provides user-selectable cutoffs and multiple testing correction methods to control false discovery. Each named gene list forms a column of the resulting matrix, whose rows are overrepresented features and whose cells hold the per-list P-value and the corresponding genes for each feature. ToppCluster offers users a choice of tabular outputs, hierarchical clustering and heatmap generation, or the ability to interactively select features from the functional enrichment matrix to be transformed into XGMML or GEXF network format documents for use in the Cytoscape or Gephi applications, respectively. Here, as an example, we demonstrate the ability of ToppCluster to identify list-specific phenotypic and regulatory element features (both cis-elements and 3′UTR microRNA binding sites) among tissue-specific gene lists. ToppCluster’s functionalities enable the identification of specialized biological functions and regulatory networks, and the systems biology-based dissection of biological states. ToppCluster can be accessed freely at http://toppcluster.cchmc.org.
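The enrichment matrix described above can be illustrated with a minimal Python sketch. It is not ToppCluster's implementation: the function names, the toy gene sets, the background size, and the choice of Benjamini-Hochberg as the correction method are all assumptions made for illustration. The sketch computes a one-sided hypergeometric P-value for each feature and gene list, and stores the P-value together with the overlapping genes in a features-by-lists matrix.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N background genes,
    K of them annotated with the feature, n genes in the list."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order.
    (One possible correction method; chosen here as an assumption.)"""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment_matrix(gene_lists, features, background_size):
    """Rows are features, columns are gene lists.
    Each cell is (P-value, sorted overlapping genes)."""
    matrix = {}
    for feature_name, feature_genes in features.items():
        row = {}
        for list_name, genes in gene_lists.items():
            overlap = feature_genes & genes
            p = hypergeom_sf(len(overlap), background_size,
                             len(feature_genes), len(genes))
            row[list_name] = (p, sorted(overlap))
        matrix[feature_name] = row
    return matrix


if __name__ == "__main__":
    # Hypothetical toy input: two named gene lists and one annotation feature.
    lists = {"kidney": {"A", "B", "C"}, "liver": {"C", "D"}}
    feats = {"feature_1": {"A", "B"}}
    result = enrichment_matrix(lists, feats, background_size=10)
    for feature, row in result.items():
        for list_name, (p, genes) in row.items():
            print(feature, list_name, round(p, 4), genes)
```

In this toy setup the feature overlaps the "kidney" list in two genes and the "liver" list in none, so only the "kidney" cell receives a small P-value. A real analysis would apply the correction across all features tested within a category and then filter cells by the user-selected cutoff.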
