Relevance Ranking Based on Query-Aware Context Analysis

Word mismatch between queries and documents is a long-standing challenge in information retrieval. Recent advances in distributed word representations address this problem by enabling semantic matching. However, most existing models rank documents based on semantic matches between query and document terms without explicitly modeling how those matches relate to relevance. To address this, we propose an unsupervised semantic matching model that simulates a user making relevance decisions. The primary goal of the proposed model is to combine exact and semantic matching between query and document terms, a combination that has been shown to be effective in information retrieval. Because semantic matching between queries and entire documents is computationally expensive, we restrict semantic matching to the local contexts of query terms within documents. Matching against these smaller, query-related contexts is motivated by how human assessors are observed to make relevance judgments. The most relevant part of each document is then identified and used to rank documents with respect to the query. Experimental results with several representative retrieval models on standard datasets show that the proposed semantic matching model significantly outperforms competitive baselines on all evaluation measures.
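
The abstract does not give the model's actual scoring function, so the following is only a minimal Python sketch of the general idea it describes: score a document by interpolating exact term matches with embedding-based semantic matches computed over local windows around query-term occurrences. The function name `score_document`, the window size, and the interpolation weight `alpha` are illustrative assumptions, not details from the paper; pre-trained unit-normalized word embeddings (e.g., GloVe vectors) are assumed.

```python
import numpy as np

def score_document(query_terms, doc_terms, embeddings, window=5, alpha=0.5):
    """Combine exact matching with semantic matching restricted to the
    local contexts of query-term occurrences in the document.

    embeddings: dict mapping a term to a unit-normalized numpy vector.
    alpha: illustrative weight balancing exact vs. semantic evidence.
    """
    # Exact matching: term-frequency evidence for the query terms.
    exact = sum(doc_terms.count(q) for q in query_terms)

    # Local contexts: windows of terms around each occurrence of a query term.
    context = []
    for i, term in enumerate(doc_terms):
        if term in query_terms:
            context.extend(doc_terms[max(0, i - window): i + window + 1])

    # Semantic matching: for each query term, take its best cosine
    # similarity to any term in the collected local context.
    semantic = 0.0
    for q in query_terms:
        if q not in embeddings:
            continue
        sims = [embeddings[q] @ embeddings[c] for c in context if c in embeddings]
        if sims:
            semantic += max(sims)

    return alpha * exact + (1 - alpha) * semantic


# Toy usage with made-up 2-d vectors, normalized to unit length.
vecs = {w: v / np.linalg.norm(v) for w, v in {
    "car": np.array([1.0, 0.2]),
    "automobile": np.array([0.9, 0.3]),
    "engine": np.array([0.4, 0.9]),
}.items()}
print(score_document(["car", "engine"],
                     ["the", "automobile", "engine", "stalled"], vecs))
```

In this sketch, semantic evidence is gathered only where a query term actually occurs, which mirrors the paper's use of query-related local contexts instead of matching against the entire document.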
