Proceedings of the 4th international workshop on Adversarial information retrieval on the web

Before the advent of the World Wide Web, information retrieval algorithms were developed for relatively small and coherent document collections such as newspaper articles or book catalogs in a library. In comparison to these collections, the Web is massive, much less coherent, changes more rapidly, and is spread over geographically distributed computers. Scaling information retrieval algorithms to the World Wide Web is a challenging task. Success to date is demonstrated by the ubiquitous use of search engines to access Internet content.

From the point of view of a search engine, the Web is a mix of two types of content: the "closed Web" and the "open Web". The closed Web comprises a few high-quality controlled collections which a search engine can fully trust. The "open Web," on the other hand, includes the vast majority of Web pages, which lack an authority asserting their quality. The openness of the Web has been the key to its rapid growth and success. However, this openness is also a major source of new challenges for information retrieval methods.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections in which a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming" or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at obtaining an undeservedly high ranking for some items in the collection. There is a strong economic incentive to rank highly in search engines, since a good ranking is strongly correlated with more traffic, which often translates into more revenue.