Expanding Queries Using Multiple Resources

We describe our participation in the TREC 2006 Genomics track, in which our main focus was on query expansion. We hypothesized that query expansion techniques would help us both to identify and retrieve synonymous terms and to cope with ambiguity. To this end, we developed several collection-specific as well as online expansion strategies. Our proposed methods yield a noticeable improvement in retrieval performance over the baseline. To counter the negative effects of query expansion on recall, we introduce conjunctive Boolean constraints on the query terms and the added expansion terms. When these additional constraints are imposed, results improve even further. The improvements are noticeable at the document, passage, and aspect levels.
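As an illustration of the conjunctive constraints mentioned above, the following minimal sketch shows one plausible reading: each original query term is grouped with its expansion terms in a disjunction, and the groups are joined conjunctively, so that every original concept must still be matched in some form. The function name `expand_query`, the synonym table, and the `AND`/`OR` query syntax are assumptions made for this example; they are not taken from the system described here.

```python
from typing import Dict, List


def expand_query(terms: List[str], synonyms: Dict[str, List[str]]) -> str:
    """Build a conjunction of disjunctions from query terms and their expansions.

    Each original term is OR-ed with its expansion terms; the resulting
    groups are AND-ed, so every original concept must match in some form.
    """
    groups = []
    for term in terms:
        # Original term first, then its expansions, without duplicates.
        variants = list(dict.fromkeys([term] + synonyms.get(term, [])))
        # Quote multi-word variants so they are treated as phrases.
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        if len(quoted) == 1:
            groups.append(quoted[0])
        else:
            groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "prnp": ["prion protein", "prp"],
    "bse": ["mad cow disease"],
}

if __name__ == "__main__":
    print(expand_query(["prnp", "bse"], SYNONYMS))
```

Under this reading, a document that matches only expansion terms for one concept and nothing for the other is excluded, which is how the conjunction limits the drift that unconstrained expansion can introduce.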
