BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature

With the rapid decrease in the cost of genome sequencing, classifying gene function is becoming a primary problem. Such classification has traditionally been performed by human curators who read the biological literature to extract evidence. BeeSpace Navigator is prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curation process for extracting functions, with a simple interface intended for all biologists. Because extraction is performed on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in the biological literature, producing automatically computed results that resemble a model organism database entry. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. BeeSpace Navigator version 4 is free and open to all at www.beespace.illinois.edu, with no login requirement. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php.
