Expanded information retrieval using full-text searching

The value of full text for expanding information retrieval was examined. Two full-text databases were used: Textpresso for Neuroscience and ScienceDirect. Queries representing different categories were run against different text fields (titles, abstracts, full text and, where possible, keywords). Relative to searching the commonly used abstracts field, searching the full-text field increases retrievals by one or more orders of magnitude, depending on the category. For phenomena-type categories (e.g. blood flow, thermodynamic equilibrium), retrievals increase by about an order of magnitude. For infrastructure-type categories (e.g. equipment types, sponsors, suppliers, databases), retrievals increase by well over an order of magnitude, and sometimes by several orders of magnitude. Combining terms with proximity specification (requiring the terms to appear within a set distance of each other) is a powerful way to retrieve relevant records from full text, and can be useful for applications such as literature-related discovery.
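Proximity searching is the capability singled out above. The Python sketch below illustrates the idea on a few hypothetical toy records (not data from this study): a query pairs two terms, and a field matches only if the terms appear with at most `max_gap` intervening words, in either order. The function names, the record layout and this unordered gap semantics are all assumptions for illustration. Textpresso, ScienceDirect and other engines each define their own proximity operators and syntax.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_starts(tokens: list[str], phrase: str) -> list[int]:
    """Start indices where a (possibly multi-word) phrase occurs as consecutive tokens."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def within_proximity(text: str, term_a: str, term_b: str, max_gap: int) -> bool:
    """True if term_a and term_b occur with at most max_gap intervening words,
    in either order (an assumed, unordered proximity operator)."""
    tokens = tokenize(text)
    len_a, len_b = len(tokenize(term_a)), len(tokenize(term_b))
    for a in phrase_starts(tokens, term_a):
        for b in phrase_starts(tokens, term_b):
            # Words strictly between the end of the earlier term and the start of the later one.
            gap = b - (a + len_a) if a <= b else a - (b + len_b)
            if 0 <= gap <= max_gap:
                return True
    return False

def count_hits(records: list[dict], field: str, term_a: str, term_b: str, max_gap: int) -> int:
    """Number of records whose given field satisfies the proximity query.
    Records lacking the field (e.g. no keywords) count as non-matches."""
    return sum(within_proximity(r.get(field, ""), term_a, term_b, max_gap) for r in records)

# Hypothetical toy records, not data from the study.
records = [
    {"title": "Cerebral blood flow in aging",
     "abstract": "Regional blood flow declined with age.",
     "full_text": "Regional blood flow declined with age. Perfusion was measured "
                  "with a laser Doppler flowmeter in all subjects."},
    {"title": "Thermal regulation in rodents",
     "abstract": "Core temperature was stable across conditions.",
     "full_text": "Core temperature was stable. Skin blood flow was measured "
                  "using a laser Doppler probe."},
]

# Compare retrievals for the same proximity query across text fields.
for field in ("title", "abstract", "full_text"):
    print(field, count_hits(records, field, "blood flow", "laser Doppler", max_gap=5))
# title 0
# abstract 0
# full_text 1
```

With `max_gap=5`, only the second toy record matches, and only in its full text, since the instrument is named in the methods rather than in the title or abstract. Widening the window to 10 admits the first record as well, which shows how the proximity setting trades precision against recall.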
