Utilizing inter-passage and inter-document similarities for re-ranking search results

We present a novel language-model-based approach to re-ranking an initially retrieved document list so as to improve precision at the top ranks. The model integrates whole-document information with information induced from passages, combining inter-passage, inter-document, and query-based similarities. Empirical evaluation demonstrates the effectiveness of the approach.
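As a rough illustration of how query-based, inter-document, and inter-passage similarities might be combined when re-ranking, here is a minimal sketch. It is not the model proposed in this work. The scoring formula, the Dirichlet-smoothed unigram language models, the fixed-length passages, and every name and parameter (`rerank`, `MU`, `passage_len`, `lam`) are assumptions made for the example.

```python
import math
from collections import Counter

MU = 10.0  # Dirichlet smoothing weight (assumed value, chosen for tiny examples)


def language_model(tokens, coll_counts, coll_total):
    """Dirichlet-smoothed unigram model over the vocabulary of the retrieved list."""
    counts = Counter(tokens)
    denom = len(tokens) + MU
    return {w: (counts[w] + MU * cf / coll_total) / denom
            for w, cf in coll_counts.items()}


def similarity(tokens, lm):
    """Length-normalized likelihood that `lm` assigns to `tokens` (geometric mean).

    Tokens outside the vocabulary are skipped.
    """
    known = [w for w in tokens if w in lm]
    if not known:
        return 0.0
    return math.exp(sum(math.log(lm[w]) for w in known) / len(known))


def split_passages(tokens, size):
    """Non-overlapping fixed-length windows (one simple passage definition)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def rerank(query, docs, passage_len=4, lam=0.5):
    """Return document indices ordered by a hypothetical combined score.

    Assumes every document in `docs` contains at least one token.
    """
    q = query.lower().split()
    doc_toks = [d.lower().split() for d in docs]
    n = len(docs)
    coll = Counter(w for toks in doc_toks for w in toks)
    total = sum(coll.values())

    doc_lms = [language_model(t, coll, total) for t in doc_toks]
    psgs = [split_passages(t, passage_len) for t in doc_toks]
    psg_lms = [[language_model(p, coll, total) for p in ps] for ps in psgs]

    # Query-based evidence: whole-document score mixed with the best passage score.
    base = [0.5 * similarity(q, doc_lms[i])
            + 0.5 * max(similarity(q, lm) for lm in psg_lms[i])
            for i in range(n)]

    final = []
    for i in range(n):
        support = 0.0
        for j in range(n):
            if j == i:
                continue
            # Inter-document similarity: how well document i's model explains document j.
            doc_sim = similarity(doc_toks[j], doc_lms[i])
            # Inter-passage similarity: best-matching pair of passages from j and i.
            psg_sim = max(similarity(pj, lm_i)
                          for pj in psgs[j] for lm_i in psg_lms[i])
            # Neighbors that match the query well lend more support.
            support += base[j] * (0.5 * doc_sim + 0.5 * psg_sim)
        support /= max(n - 1, 1)
        final.append(lam * base[i] + (1 - lam) * support)

    return sorted(range(n), key=lambda i: final[i], reverse=True)
```

Calling `rerank(query, docs)` on a list of document strings returns a permutation of their indices. Under this sketch, a document rises when it matches the query itself and when it resembles, at the document or passage level, other documents that match the query.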
