Studying Ranking-Incentivized Web Dynamics

The ranking incentives of many Web page authors play an important role in Web dynamics. That is, authors who want their pages to be highly ranked for queries of interest often respond to the rankings for these queries by manipulating their pages, with the goal of improving the pages' future rankings. Various theoretical aspects of these dynamics have recently been studied using game theory. However, empirical analysis of the dynamics is highly constrained by the lack of publicly available datasets. We present an initial such dataset, based on TREC's ClueWeb09 dataset. Specifically, we used the Internet Archive's Wayback Machine to build a document collection containing past snapshots of ClueWeb documents that were highly ranked by an initial search performed for ClueWeb queries. Temporal analysis of document changes in this dataset reveals that findings recently presented for small-scale controlled ranking competitions between documents' authors also hold for Web data. Specifically, documents' authors tend to mimic the content of documents that were highly ranked in the past, and this practice can result in improved ranking.
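To make the notion of "mimicking" concrete, the following minimal sketch checks whether a document moved closer to a previously top-ranked document between two snapshots. It is an illustration under stated assumptions, not the analysis used to build or study the dataset: the bag-of-words representation, the cosine measure, and the function names (`term_vector`, `cosine`, `similarity_shift`) are all hypothetical choices made here.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts (an assumed, simplistic representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_shift(doc_before, doc_after, past_winner):
    """Change in similarity to a past top-ranked document across two snapshots.

    A positive value means the later snapshot is closer to the past winner,
    which is one possible (hypothetical) signal of content mimicking.
    """
    winner = term_vector(past_winner)
    before = cosine(term_vector(doc_before), winner)
    after = cosine(term_vector(doc_after), winner)
    return after - before


if __name__ == "__main__":
    shift = similarity_shift(
        "cheap flights to paris",
        "cheap flights and hotel deals in paris",
        "best hotel deals in paris",
    )
    print(f"similarity shift toward past winner: {shift:+.3f}")
```

A positive shift only indicates that the later snapshot shares more vocabulary with the earlier top-ranked document; whether that change is a deliberate response to the ranking is a separate question that this sketch does not address.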
