On the Equilibrium of Query Reformulation and Document Retrieval

In this paper, we jointly study query reformulation and document relevance estimation, two essential aspects of information retrieval (IR). We model their interaction as a two-player strategic game: one player, the query formulator, acts to produce an optimal query and seeks to maximize its utility with respect to the document relevance estimates produced by the other player, the retrieval modeler; simultaneously, the retrieval modeler acts to produce document relevance scores and optimizes its likelihood on the training data with respect to the reformulated query produced by the query formulator. An equilibrium is reached when each player's strategy is a best response to the other's. We derive our equilibrium theory of IR using normal-form representations: when a standard relevance feedback algorithm is coupled with a retrieval model, the two share the same objective function and thus form a partnership game; by contrast, pseudo relevance feedback pursues an objective different from that of the retrieval model, so their interaction leads to a general-sum game (though an implicitly collaborative one). Our game-theoretic analyses not only yield useful insights into these two major aspects of IR, but also offer new practical algorithms for reaching the equilibrium state of retrieval, which have been shown to bring consistent performance improvements in both text retrieval and item recommendation.
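To make the alternating best-response dynamic concrete, the sketch below couples a Rocchio-style query reformulator (the query formulator's move) with a simple term-frequency retrieval scorer (the retrieval modeler's move) and iterates them to a fixed point. This is a minimal illustrative sketch, not the paper's actual algorithm: all names and parameters here (`rocchio_reformulate`, `tf_score`, `alpha`, `beta`, `n_feedback`, `n_rounds`) are assumptions introduced for exposition.

```python
# Minimal sketch of alternating best responses between a query
# formulator and a retrieval modeler. All names and parameters
# (rocchio_reformulate, tf_score, alpha, beta, n_feedback, n_rounds)
# are illustrative assumptions, not the paper's actual algorithm.
from collections import Counter

def tf_score(query, doc):
    """Retrieval modeler's action: score a document by term overlap,
    weighted by the (possibly reformulated) query's term weights."""
    doc_tf = Counter(doc)
    return sum(w * doc_tf[t] for t, w in query.items())

def rocchio_reformulate(query, feedback_docs, alpha=1.0, beta=0.75):
    """Query formulator's best response: move the query vector toward
    the centroid of the top-ranked (pseudo-)relevant documents."""
    new_query = Counter({t: alpha * w for t, w in query.items()})
    for doc in feedback_docs:
        for t, f in Counter(doc).items():
            new_query[t] += beta * f / (len(feedback_docs) * len(doc))
    return dict(new_query)

def best_response_retrieval(raw_query, corpus, n_feedback=3, n_rounds=2):
    """Alternate the two players' moves: rank with the current query,
    then reformulate against the top-ranked documents, and repeat.
    A fixed point of this loop corresponds to an equilibrium in which
    each side is a best response to the other."""
    query = {t: float(w) for t, w in Counter(raw_query.split()).items()}
    for _ in range(n_rounds):
        ranked = sorted(corpus, key=lambda d: tf_score(query, d), reverse=True)
        query = rocchio_reformulate(query, ranked[:n_feedback])
    return sorted(corpus, key=lambda d: tf_score(query, d), reverse=True)

# Usage: a toy corpus of tokenized documents.
corpus = [
    "game theory and nash equilibrium".split(),
    "query expansion with relevance feedback".split(),
    "probabilistic retrieval models bm25".split(),
    "document ranking with language models".split(),
]
print(best_response_retrieval("relevance feedback retrieval", corpus)[0])
```

In this toy setting, the loop's fixed point plays the role of the equilibrium: because the reformulated query is built from the retrieval model's own top-ranked documents, the pair corresponds to the pseudo-relevance-feedback case, where the two players optimize related but distinct objectives (a general-sum, implicitly collaborative game).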
