BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects

Text mining is a promising way to extract information automatically from the vast biological literature. To maximize its potential, the knowledge encoded in text should be translated into a semantic representation, such as entities and relations, that machines can analyze. However, large-scale practical systems for this purpose are rare. We present the BeeSpace question/answering (BSQA) system, which performs integrated text mining for insect biology, covering diverse aspects from molecular interactions of genes to insect behavior. BSQA recognizes a number of entities and relations in Medline documents about the model insect Drosophila melanogaster. For any text query, BSQA exploits the entity annotation of the retrieved documents to identify important concepts in different categories. By utilizing the extracted relations, BSQA can also answer many biologically motivated questions, from simple ones, such as which anatomical part a gene is expressed in, to more complex ones involving multiple types of relations. BSQA is freely available at http://www.beespace.uiuc.edu/QuestionAnswer.
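
To make the idea of answering questions from extracted relations concrete, the following is a minimal sketch and not BSQA's actual implementation. It assumes relations are stored as (subject, relation, object) triples; the gene names, relation labels, and function names are hypothetical placeholders chosen for illustration only. A simple question is a single lookup, while a more complex question chains two relation types.

```python
from collections import defaultdict

# Hypothetical relation triples (subject, relation, object).
# Placeholder data for illustration; not actual BSQA output.
TRIPLES = [
    ("geneA", "expressed_in", "mushroom body"),
    ("geneA", "regulates", "geneB"),
    ("geneB", "expressed_in", "antenna"),
    ("geneB", "affects", "foraging behavior"),
    ("geneC", "expressed_in", "mushroom body"),
]


def build_index(triples):
    """Index objects by (subject, relation) for direct lookup."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(subj, rel)].add(obj)
    return index


def objects_of(index, subject, relation):
    """Simple question: which objects are linked to `subject` by `relation`?"""
    return sorted(index.get((subject, relation), set()))


def two_hop(index, subject, first_rel, second_rel):
    """Complex question: follow `first_rel`, then `second_rel` from each hit."""
    results = set()
    for intermediate in index.get((subject, first_rel), set()):
        results |= index.get((intermediate, second_rel), set())
    return sorted(results)


INDEX = build_index(TRIPLES)

# "Which anatomical part is geneA expressed in?"
print(objects_of(INDEX, "geneA", "expressed_in"))

# "Which behaviors are affected by genes that geneA regulates?"
print(two_hop(INDEX, "geneA", "regulates", "affects"))
```

Indexing by (subject, relation) keeps each hop a dictionary lookup, so multi-relation questions reduce to composing lookups; a real system would also need entity normalization and evidence tracking, which this sketch omits.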
