Passage retrieval based on language models

Previous research has shown that passage-level evidence can improve document retrieval when documents are long or span different subject areas. Recent developments in the language modeling approach to IR have provided a new, effective alternative to traditional retrieval models. These two streams of research motivate us to examine the use of passages in a language model framework. This paper reports on experiments using passages in a simple language model and a relevance model, and compares the results with document-based retrieval. Results from the INQUERY search engine, which is not based on a language modeling approach, are also given for comparison. Test data include two heterogeneous document collections and one homogeneous collection. Our experiments show that passage retrieval is feasible in the language modeling context and, more importantly, that it can provide more reliable performance than retrieval based on full documents.
