IIT TREC 2006: Genomics Track

For the TREC 2006 Genomics Track, we report on the effectiveness of composite information retrieval functions, based on a dimensional data model, for improving the precision of document, passage, and aspect search over genomics literature. We designed an approach, and developed a corresponding search engine, based on a novel dimensional data model capable of document-, paragraph-, sentence-, and passage-level retrieval of genomics literature. By constructing a data-warehouse-style index that can aggregate term statistics at multiple levels of document granularity, and by incorporating key biological entities through shallow parsing of individual sentences, composite retrieval models that combine multiple levels of contextual evidence can be developed efficiently to improve retrieval performance. The Genomics Track for 2006 measured document, passage, and aspect retrieval using 27 topics created by active biological researchers. Each topic fit one of four question-oriented topic templates: the role of a gene in a disease, the effect of a gene on a biological process, how genes interact in organ function, and how mutations in genes influence biological processes. Documents for this task came from a corpus of 162,048 full-text biomedical articles. Each form of retrieval was measured with a variant of mean average precision (MAP). We submitted automatically generated results from three composite models to the TREC forum. All three models delivered results that significantly exceeded the median results reported for the 2006 TREC Genomics Track. Our best-performing TREC model had a MAP of 0.426 for document retrieval (53% above the median), 0.055 for passage retrieval (129% above the median), and 0.262 for aspect retrieval (125% above the median).
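For readers unfamiliar with the metric, the following is a minimal sketch of standard document-level mean average precision. It is an illustration only: the function names, topic identifiers, and document identifiers are hypothetical, and the passage and aspect variants used by the track are not reproduced here.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one topic.

    `ranked` is the system's ranked list of document IDs (assumed to
    contain no duplicates); `relevant` is the set of IDs judged relevant.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: two topics with made-up document IDs.
example_runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
example_qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}}
example_map = mean_average_precision(example_runs, example_qrels)
```

In this toy data, topic `t1` retrieves its two relevant documents at ranks 1 and 3, and topic `t2` retrieves its single relevant document at rank 2; the per-topic averages are then averaged across topics.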

Ophir Frieder | Nazli Goharian | Jay Urbain
