Link Variable TrustRank for Fighting Web Spam

A high position in search engine query results can bring great benefits to a website. However, some websites use various techniques to cheat search engines and raise their ranking, which degrades the quality of the answers returned to users. TrustRank is a recent algorithm for combating web spam. It is based on the idea that good sites seldom point to spam sites. However, we find that many spam sites obtain lots of inlinks from good sites through improper tricks. We therefore propose to take the variance of the link structure into consideration and to judge the ranking scores of websites in combination with it. Experiments show that this method can filter out web spam effectively.
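For reference, the baseline TrustRank computation that this work builds on can be sketched as a biased PageRank. Trust starts uniformly on a hand-picked seed set of good pages and is propagated along out-links, t = α·T·t + (1 − α)·d, where d is the seed distribution. This is a minimal sketch of plain TrustRank, not the Link Variable method proposed here. The function name `trustrank`, the damping factor α = 0.85, the fixed count of 20 iterations, the handling of pages without out-links and the toy graph are all assumptions chosen for illustration. The toy graph also reproduces the weakness described above. The page `spam` receives links from both seed pages and exchanges links with `spam_farm`, so it ends up with more trust than the seeds themselves.

```python
def trustrank(out_links, seeds, alpha=0.85, iterations=20):
    """Biased PageRank: trust starts at the seed set and flows along out-links.

    out_links: dict mapping every page to the list of pages it links to.
    seeds: non-empty set of pages judged good by a human oracle.
    alpha and iterations are illustrative assumptions, not values from this paper.
    """
    nodes = list(out_links)
    # Static distribution d: uniform over the good seed pages, zero elsewhere.
    d = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    t = dict(d)
    for _ in range(iterations):
        # The (1 - alpha) restart mass always returns to the seed pages only.
        nxt = {n: (1 - alpha) * d[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if not targets:
                # Assumption: a page with no out-links simply does not pass trust on.
                continue
            # Each out-link receives an equal share of the damped trust of src.
            share = alpha * t[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        t = nxt
    return t


# Hypothetical toy web graph: two good seed pages both link to a spam page,
# which trades links with a spam farm page; "unrelated" is isolated.
web = {
    "good1": ["good2", "spam"],
    "good2": ["good1", "spam"],
    "spam": ["spam_farm"],
    "spam_farm": ["spam"],
    "unrelated": [],
}
scores = trustrank(web, seeds={"good1", "good2"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

Because the two spam pages only link to each other, the trust they receive from the good seeds is never passed back out. This is the situation that plain TrustRank cannot distinguish from legitimate endorsement, and it motivates looking at additional link-structure signals.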
