Comparison of full-text searching to metadata searching for genes in two biomedical literature cohorts

Researchers have traditionally used bibliographic databases to search out information. Today, the full-text of resources is increasingly available for searching, and more researchers are performing full-text searches. This study compares differences in the number of articles discovered between metadata and full-text searches of the same literature cohort when searching for gene names in two biomedical literature domains. Three reviewers additionally ranked 100 articles in each domain. Significantly more articles were discovered via full-text searching; however, the precision of full-text searching also is significantly lower than that of metadata searching. Certain features of articles correlated with higher relevance ratings. A significant feature measured was the number of matches of the search term in the full-text of the article, with a larger number of matches having a statistically significant higher usefulness (i.e., relevance) rating. By using the number of hits of the search term in the full-text to rank the importance of the article, performance of full-text searching was improved so that both recall and precision were as good as or better than that for metadata searching. This suggests that full-text searching alone may be sufficient, and that metadata searching as a surrogate is not necessary. © 2007 Wiley Periodicals, Inc.

[1]  R M Pitkin,et al.  Accuracy of data in abstracts of published research articles. , 1999, JAMA.

[2]  Timothy B. Patrick,et al.  Comparing Frequency of Content-Bearing Words in Abstracts and Texts in Articles from Four Medical Journals: An Exploratory Study , 2001, MedInfo.

[3]  B. Hemminger,et al.  Information seeking behavior of academic scientists , 2007 .

[4]  Joyce A. Mitchell,et al.  The Medline/full-text research project. , 1991 .

[5]  William R Hersh,et al.  Enhancing access to the Bibliome: the TREC 2004 Genomics Track , 2006, Journal of biomedical discovery and collaboration.

[6]  C. D. Rosa,et al.  Perceptions of libraries and information resources , 2005 .

[7]  Jung-Hsien Chiang,et al.  MeKE: Discovering the Functions of Gene Products from Biomedical Literature Via Sentence Alignment , 2003, Bioinform..

[8]  Joel D. Martin,et al.  Getting to the (c)ore of knowledge: mining biomedical literature , 2002, Int. J. Medical Informatics.

[9]  Hans-Michael Müller,et al.  Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature , 2004, PLoS biology.

[10]  M. E. Maron,et al.  An evaluation of retrieval effectiveness for a full-text document-retrieval system , 1985, CACM.

[11]  Don R. Swanson,et al.  Two medical literatures that are logically but not bibliographically connected , 1987, J. Am. Soc. Inf. Sci..

[12]  D. Swanson Medical literature as a potential source of new knowledge. , 1990, Bulletin of the Medical Library Association.

[13]  Alexander A. Morgan,et al.  Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup , 2003, ISMB.

[14]  Joel D. Martin,et al.  PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine , 2003, BMC Bioinformatics.

[15]  Gerard Salton,et al.  Automatic text analysis. , 1970 .

[16]  Gerard Salton,et al.  Automatic Information Organization And Retrieval , 1968 .

[17]  L Hunter,et al.  MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. , 1999, BioTechniques.

[18]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[19]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[20]  Hagit Shatkay,et al.  Mining the Biomedical Literature in the Genomic Era: An Overview , 2003, J. Comput. Biol..

[21]  David Hawking,et al.  Does topic metadata help with Web search , 2007 .

[22]  C. J. van Rijsbergen,et al.  Report on the need for and provision of an 'ideal' information retrieval test collection , 1975 .

[23]  Don R. Swanson,et al.  Searching Natural Language Text by Computer , 1960 .

[24]  Limsoon Wong,et al.  Accomplishments and challenges in literature data mining for biology , 2002, Bioinform..

[25]  Amanda Spink,et al.  Real life information retrieval: a study of user queries on the Web , 1998, SIGF.

[26]  Carol Tenopir Retrieval performance in a full text journal article database , 1984 .