GOAL: A software tool for assessing biological significance of gene groups

Background

Modern high-throughput experimental techniques such as DNA microarrays often produce large lists of genes. Computational biology tools such as clustering are then used to group genes by the similarity of their expression profiles, and genes within each group are likely to be functionally related. The functional relevance of the genes in each group is usually characterized using biological knowledge available in public databases, such as Gene Ontology (GO), KEGG pathways, associations between a transcription factor (TF) and its target genes, and/or gene networks.

Results

We developed GOAL (Gene Ontology AnaLyzer), a software tool specifically designed for the functional evaluation of gene groups. GOAL supports efficient and statistically rigorous functional interpretation of gene groups by integrating available GO annotations, TF-gene association data, and KEGG pathway associations. To enable a more specific functional characterization of a gene group, we implement three GO-tree search strategies rather than the single strategy used by most existing GO analysis tools. GOAL is also flexible in deployment: it can be used as a standalone tool, as a plug-in to other computational biology tools, or as a web server application.

Conclusion

We developed GOAL, a software tool for the functional characterization of gene groups. GOAL offers three GO-tree search strategies and combines strengths in functional integration, portability, and visualization with flexible deployment. GOAL can also be used to evaluate and compare gene groups produced by computational biology tools such as clustering algorithms.
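The abstract does not say which statistical test GOAL applies when it evaluates a gene group. Over-representation of a GO term (or of a KEGG pathway or a TF's target set) in a group is commonly scored with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below assumes that test and adds a Bonferroni correction for the many terms typically tested. The function names `enrichment_p_value` and `bonferroni` and the counts in the example are hypothetical illustrations, not GOAL's interface or results.

```python
from math import comb
from typing import List

# Hypothetical helpers, not GOAL's API. Assumes a one-sided hypergeometric
# test for over-representation; GOAL's actual statistics may differ.


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes on the array)
    K: background genes annotated to the term
    n: genes in the group under evaluation (e.g. one cluster)
    k: genes in the group annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability of every overlap at least as large as the observed one.
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible overlaps contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical counts: 10 of 50 clustered genes carry a term that
    # annotates 100 of 6000 background genes.
    p = enrichment_p_value(k=10, n=50, K=100, N=6000)
    print(f"raw p = {p:.3g}")
```

Working with exact integer binomial coefficients avoids the underflow that a naive floating-point product of probabilities can hit for large backgrounds; a production tool would more likely call a statistics library and might prefer a less conservative correction such as false discovery rate control.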
