GREAT: Gene Regulation EvAluation Tool

Our understanding of biological systems depends heavily on the study of the mechanisms that regulate gene expression. In this paper we present GREAT, a tool for evaluating scientific papers that potentially describe Saccharomyces cerevisiae gene regulation events, after transcription factors have been identified in their abstracts using text-mining techniques. GREAT estimates the probability that a given gene-transcription factor pair corresponds to a gene regulation event, based on data retrieved from public biological databases.
