Information Retrieval Meets Game Theory: The Ranking Competition Between Documents' Authors

In competitive search settings such as the Web, document authors (publishers) engage in an ongoing ranking competition for certain queries. Their goal is to have their documents ranked highly, and their means is document manipulation applied in response to rankings. Existing retrieval models and their theoretical underpinnings (e.g., the probability ranking principle) do not account for the post-ranking corpus dynamics driven by this strategic behavior of publishers. Yet these dynamics have a major effect on retrieval effectiveness, since they affect content availability in the corpus. Furthermore, while manipulation strategies observed on the Web have been reported in past literature, they were not analyzed as ongoing, changing, post-ranking response strategies, nor were they connected to the foundations of classical ad hoc retrieval models (e.g., content-based document-query surface-level similarities and document relevance priors). We present a novel theoretical and empirical analysis of the strategic behavior of publishers using these foundations. Empirical analysis of controlled ranking competitions that we organized reveals a key strategy of publishers: making their documents (gradually) become similar to documents ranked the highest in previous rankings. Our theoretical analysis of the ranking competition as a repeated game, and of its minmax regret equilibrium, yields a result that supports the merits of this publishing strategy. We further show that whether a document will be promoted to the highest rank in our competitions can be predicted with high accuracy and without explicit knowledge of the ranking function. The prediction uses very few features, which quantify changes of documents, specifically with respect to those previously ranked the highest.
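
To make the notion of a feature that quantifies how a document changes with respect to the previously top-ranked document more concrete, the following is a minimal illustrative sketch. It is not the feature set or similarity measure used in the paper, which the abstract does not specify. It assumes a plain bag-of-words term-frequency representation and cosine similarity, and the names `term_vector`, `cosine`, and `similarity_shift` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_shift(old_doc, new_doc, previous_winner):
    """Change in similarity to the previous top-ranked document.

    A positive value means the revised document moved toward the
    previous winner; a negative value means it moved away.
    """
    winner_vec = term_vector(previous_winner)
    before = cosine(term_vector(old_doc), winner_vec)
    after = cosine(term_vector(new_doc), winner_vec)
    return after - before


# Hypothetical example texts, not data from the competitions.
winner = "cheap flights to paris book cheap flights online"
old = "travel tips for europe"
new = "cheap flights to paris and travel tips for europe"
shift = similarity_shift(old, new, winner)
```

Here `shift` is positive because the revised text borrows terms from the previous winner. A predictor of the kind described above could take a handful of such change features as input, but which features and which learner to use are choices this sketch leaves open.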
