FUNC: a package for detecting significant associations between gene sets and ontological annotations

Background: Genome-wide expression, sequence and association studies typically yield large sets of candidate genes, which must then be further analysed and interpreted. Information about these genes is increasingly being captured and organized in ontologies, such as the Gene Ontology. Relationships between the gene sets identified by experimental methods and biological knowledge can be made explicit and used in the interpretation of results. However, it is often difficult to assess the statistical significance of such analyses because many inter-dependent categories are tested simultaneously.

Results: We developed the program package FUNC, which includes and expands on currently available methods for identifying significant associations between gene sets and ontological annotations. It implements several tests that are particularly well suited to genome-wide sequence comparisons, estimates of the family-wise error rate and the false discovery rate, a sensitive estimator of the global significance of the results, and an algorithm to reduce the complexity of the results.

Conclusion: FUNC is a versatile and useful tool for the analysis of genome-wide data. It is freely available under the GPL and is also accessible via a web service.
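To make the kind of analysis described above concrete, the following is a minimal sketch of one common approach: a one-sided hypergeometric test for over-representation of a category in a gene set, followed by a Benjamini-Hochberg adjustment across categories. It is an illustration under stated assumptions, not FUNC's implementation; the function names and parameters are hypothetical, and the Benjamini-Hochberg step is a standard stand-in that does not account for the dependence between categories that the abstract highlights.

```python
from math import comb  # requires Python 3.8+


def hypergeom_upper_tail(population, annotated, drawn, hits):
    """Return P(X >= hits) for a hypergeometric variable X.

    population: total number of genes considered
    annotated:  genes in the population annotated to the category
    drawn:      size of the candidate gene set
    hits:       candidate genes annotated to the category (non-negative)
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one would compute `hypergeom_upper_tail` once per category and pass the resulting list of p-values to `benjamini_hochberg`. Because ontology categories are nested and therefore inter-dependent, a resampling-based correction may be more appropriate than this simple adjustment.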
