BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

Background. Advances in computational and biological methods over the last two decades have dramatically changed the scale of biomedical research, bringing unprecedented growth in both the production of biomedical data and the volume of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts. Results. We developed a novel framework, named “BioEve”, that seamlessly integrates faceted search (information retrieval) with an information extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided, step-by-step refinement of search queries by suggesting concepts and entities (such as genes, drugs, and diseases) that let users quickly filter results and change search direction, thereby supporting an enriched paradigm in which users discover related concepts and keywords while seeking information. Conclusions. The BioEve Search framework makes it easier to provide scalable interactive search over large collections of textual articles and to discover knowledge hidden in thousands of biomedical literature articles.
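
The guided refinement described in the Results can be illustrated with a small sketch. It is not BioEve's implementation: the `Doc` class, the `search` and `facet_counts` functions, and the three sample documents are hypothetical, and the entity annotations are written by hand where a real system would obtain them from an information extraction step. The sketch only shows the general faceted-search loop: run a keyword query, count the entities found in the hits, offer those counts as suggestions, and narrow the results when one is selected.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A hypothetical article record with hand-written entity annotations."""
    doc_id: str
    text: str
    facets: dict = field(default_factory=dict)  # facet name -> set of entity values


def search(docs, keyword, selected=None):
    """Return docs whose text contains `keyword` and that carry every selected facet value."""
    selected = selected or {}
    hits = []
    for d in docs:
        if keyword.lower() not in d.text.lower():
            continue
        if all(value in d.facets.get(name, set()) for name, value in selected.items()):
            hits.append(d)
    return hits


def facet_counts(hits):
    """Count how often each entity occurs in the hits; these counts are the refinement suggestions."""
    counts = {}
    for d in hits:
        for name, values in d.facets.items():
            counts.setdefault(name, Counter()).update(values)
    return counts


# Illustrative toy data, not taken from the paper.
DOCS = [
    Doc("d1", "BRCA1 mutations raise breast cancer risk; tamoxifen is studied for prevention.",
        {"gene": {"BRCA1"}, "disease": {"breast cancer"}, "drug": {"tamoxifen"}}),
    Doc("d2", "TP53 and BRCA1 are altered in ovarian cancer.",
        {"gene": {"TP53", "BRCA1"}, "disease": {"ovarian cancer"}}),
    Doc("d3", "Metformin use in type 2 diabetes.",
        {"drug": {"metformin"}, "disease": {"type 2 diabetes"}}),
]

# Step 1: a plain keyword query.
hits = search(DOCS, "cancer")
# Step 2: entity counts over the hits, shown to the user as suggested filters.
suggestions = facet_counts(hits)
# Step 3: the user picks a suggested gene, which narrows the result set.
refined = search(DOCS, "cancer", {"gene": "TP53"})

print([d.doc_id for d in hits])
print(suggestions["gene"].most_common())
print([d.doc_id for d in refined])
```

A production system would typically delegate both the text matching and the facet counting to a search engine index instead of scanning documents in memory; the linear scan here is only for readability.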
