Exploiting bidirectional links: making spamming detection easier

Previous link-based anti-spam algorithms suffer from either weak page value metrics or vague seed selection. In this paper, we propose two page value metrics, AVRank and HVRank. Both values can be assessed for every web page using information from bidirectional links. Bidirectional links also make it easier to enlarge the propagation coverage of seed sets. We further discuss the effectiveness of combining the two metrics, for example by taking their quadratic mean. Our experimental results show that, with these two metrics, our method filters out spam sites and identifies reputable ones more effectively than previous algorithms such as TrustRank.
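
As a rough illustration of two ideas named in the abstract, the sketch below finds bidirectional links in a directed link graph and combines two per-site scores by their quadratic mean. It does not compute AVRank or HVRank, whose definitions are not given here. The function names, the edge-list representation, and the example scores are assumptions made only for illustration.

```python
import math


def bidirectional_links(edges):
    """Return the unordered pairs {u, v} where u links to v and v links to u.

    `edges` is an iterable of (source, target) pairs; this edge-list
    representation is an assumption, not something the abstract specifies.
    """
    edge_set = set(edges)
    return {
        frozenset((u, v))
        for (u, v) in edge_set
        if u != v and (v, u) in edge_set  # skip self-links, require the reverse edge
    }


def quadratic_mean(a, b):
    """Quadratic mean (root mean square) of two scores."""
    return math.sqrt((a * a + b * b) / 2.0)


# Hypothetical link graph: a and b link to each other, a also links to c.
edges = [("a", "b"), ("b", "a"), ("a", "c")]
pairs = bidirectional_links(edges)

# Hypothetical per-site scores standing in for the two metrics.
combined = quadratic_mean(0.6, 0.8)
```

Unlike the arithmetic mean, the quadratic mean weights the larger of the two scores more heavily, which is one possible reason to consider it when combining two metrics.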
