The Probability Ranking Principle is Not Optimal in Adversarial Retrieval Settings

The probability ranking principle (PRP), which ranks documents in response to a query by their relevance probabilities, is the theoretical foundation of most ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects, specifically changes that authors make to documents in response to a given ranking. Yet in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that the PRP can indeed be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game-theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single-topic and multi-topic) and is based on different assumptions about the writing qualities of the documents' authors. We show that in some cases, introducing randomization into the document ranking function yields overall user utility that exceeds that of applying the PRP.
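To make the contrast concrete, the following is a minimal sketch of a deterministic PRP ranking next to one possible randomized ranking. It is illustrative only: the abstract does not specify the randomization scheme analyzed in the paper, so the proportional sampling used here, along with the names `prp_rank` and `randomized_rank` and the example probabilities, are assumptions made for this sketch.

```python
import random


def prp_rank(relevance):
    """Rank document ids by decreasing relevance probability (the PRP)."""
    return sorted(relevance, key=lambda doc: relevance[doc], reverse=True)


def randomized_rank(relevance, rng):
    """Build a ranking by sampling without replacement.

    Assumption for illustration: at each position, a document is drawn with
    probability proportional to its relevance probability. This is NOT
    claimed to be the scheme analyzed in the paper.
    """
    remaining = dict(relevance)
    ranking = []
    while remaining:
        docs = list(remaining)
        weights = [remaining[doc] for doc in docs]
        if sum(weights) <= 0:
            # All remaining documents have zero probability: order them uniformly.
            rng.shuffle(docs)
            ranking.extend(docs)
            break
        pick = rng.choices(docs, weights=weights, k=1)[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking


if __name__ == "__main__":
    # Hypothetical relevance probabilities for three documents.
    probs = {"a": 0.2, "b": 0.9, "c": 0.5}
    print("PRP ranking:       ", prp_rank(probs))
    print("Randomized ranking:", randomized_rank(probs, random.Random(0)))
```

Under the PRP the ranking is a fixed function of the relevance probabilities, so an author can always see how far a document is from the top position. A randomized ranking function changes the payoff an author obtains from modifying a document, which is the kind of effect the game-theoretic analysis examines.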
