Utilizing Passage-Based Language Models for Document Retrieval

We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document (the document containing the passage) to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model, both for basic document retrieval and for constructing and utilizing passage-based relevance models. The passage-based relevance models, in turn, outperform a document-based relevance model. We also show that the homogeneity measures are an effective means of integrating document-query and passage-query similarity information for document retrieval.
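The idea of a homogeneity-controlled passage language model can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the linear interpolation form, the entropy-based homogeneity estimate, the fixed-length non-overlapping passages, the max-over-passages scoring rule, and all function names (`mle`, `entropy_homogeneity`, `passage_lm`, `score_document`) are hypothetical.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def entropy_homogeneity(doc_tokens):
    """Assumed homogeneity estimate in [0, 1]: 1 minus the normalized
    entropy of the document's unigram distribution. A document that
    repeats few terms scores near 1; a uniform spread scores near 0."""
    probs = list(mle(doc_tokens).values())
    if len(probs) <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))


def passage_lm(passage_tokens, doc_tokens, lam):
    """Assumed interpolation: (1 - lam) * p(w | passage) + lam * p(w | doc).
    lam is the weight given to the ambient document."""
    p_g = mle(passage_tokens)
    p_d = mle(doc_tokens)
    return lambda w: (1.0 - lam) * p_g.get(w, 0.0) + lam * p_d.get(w, 0.0)


def score_document(query, doc_tokens, passage_len=4, eps=1e-9):
    """Score a document by its best passage's query log-likelihood.
    eps is a crude stand-in for corpus-level smoothing."""
    lam = entropy_homogeneity(doc_tokens)
    best = float("-inf")
    for start in range(0, len(doc_tokens), passage_len):
        passage = doc_tokens[start:start + passage_len]
        p = passage_lm(passage, doc_tokens, lam)
        log_lik = sum(math.log(p(w) + eps) for w in query)
        best = max(best, log_lik)
    return best
```

In this sketch, a highly homogeneous document pushes `lam` toward 1, so each passage model leans on the whole document. A heterogeneous document pushes `lam` toward 0, so passages are scored mostly on their own terms.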
