Exploring Abbreviation Expansion for Genomic Information Retrieval

Abbreviations are a common source of synonymy in biomedical journal papers. Information retrieval systems that index paragraphs rather than full-text articles are more susceptible to this kind of term variation, since an abbreviation is typically defined only once, at the beginning of the text. One solution to this problem is to expand the user query automatically with all possible abbreviation instances for each query term. In this paper, we compare the effectiveness of two abbreviation expansion techniques on the TREC 2006 Genomics Track queries and collection. Our results show that for highly ambiguous abbreviations the query collocation effect is not strong enough to deter the retrieval of erroneous passages. We conclude that full-text abbreviation resolution prior to passage indexing is the most appropriate approach to this problem.
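
The query-side expansion described above can be illustrated with a minimal sketch. It is not either of the two techniques compared in the paper, whose details are not given here. The function name `expand_query`, the dictionary `SAMPLE_VARIANTS`, and the sample query are all hypothetical, and the dictionary contents are chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical abbreviation dictionary (an assumption, not data from the paper):
# each short form maps to every long form it may stand for.
SAMPLE_VARIANTS: Dict[str, List[str]] = {
    "APC": ["adenomatous polyposis coli", "antigen presenting cell"],
}


def expand_query(query: str, variants: Dict[str, List[str]]) -> List[str]:
    """Return the original query followed by every variant of each known term.

    A term is matched as a whole word or phrase, ignoring case.
    """
    expanded = [query]
    for term, alternatives in variants.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            for alt in alternatives:
                # Skip variants that were already added.
                if alt not in expanded:
                    expanded.append(alt)
    return expanded


if __name__ == "__main__":
    for clause in expand_query("role of APC in colon cancer", SAMPLE_VARIANTS):
        print(clause)
```

In this sketch the ambiguous short form pulls in both long forms, including the one unrelated to the query topic. The remaining query words are then the only signal that can steer retrieval away from passages about the wrong sense.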
