Using Propagation of Distrust to Find Untrustworthy Web Neighborhoods

Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for web users, because they tend to conflate trusting the search engine with trusting the results of a search. In this paper, we propose "backwards propagation of distrust" as an approach to identifying untrustworthy spamming sites. Our approach is inspired by the social behavior associated with distrust: in society, recognizing an entity (a person, institution, idea, etc.) as untrustworthy is a reason to question the trustworthiness of those who recommended that entity. People who are found to strongly support untrustworthy entities become untrustworthy themselves; distrust, in other words, is propagated backwards. Our algorithm simulates this social behavior on the web graph with considerable success. Moreover, by respecting the user's perception of trust over the web graph, our algorithm resolves the moral question of who should be weeding out web spammers in favor of the user, rather than the search engine or some higher authority. Our approach can lead to browser-level or personalized server-side web spam filters that work in synergy with powerful search engines to deliver personalized, trusted web results.
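The abstract does not spell out the algorithm, but the core idea of backwards propagation can be sketched as a reverse traversal of the web graph from a seed set of known spam sites. The sketch below is illustrative only: the function name `propagate_distrust`, the `support_threshold` parameter, and the rule that a site "strongly supports" spam when at least that fraction of its out-links point to distrusted sites are all assumptions, not the paper's actual formulation.

```python
from collections import deque

def propagate_distrust(out_links, seed_distrusted, support_threshold=0.5):
    """Illustrative sketch of backwards propagation of distrust.

    out_links: dict mapping each site to the set of sites it links to.
    seed_distrusted: sites initially judged untrustworthy (e.g. known spammers).
    A site whose out-links point to distrusted sites in at least
    support_threshold proportion becomes distrusted itself, and the
    process repeats backwards along incoming links.
    """
    # Build the reverse graph: who links *to* (i.e. recommends) each site.
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)

    distrusted = set(seed_distrusted)
    frontier = deque(distrusted)
    while frontier:
        bad = frontier.popleft()
        # Examine every site that recommends a distrusted site.
        for supporter in in_links.get(bad, ()):
            if supporter in distrusted:
                continue
            links = out_links.get(supporter, set())
            # "Strong support": a large enough fraction of out-links
            # point at already-distrusted sites.
            if links and len(links & distrusted) / len(links) >= support_threshold:
                distrusted.add(supporter)
                frontier.append(supporter)
    return distrusted
```

For example, a hub site linking only to two known spam pages is swept into the distrusted set, while a site with a single link to a spam page among mostly legitimate out-links is not. The threshold is the knob that encodes the user's own tolerance, consistent with the paper's point that the user, not a central authority, should make this decision.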
