Link Spam Detection based on DBSpamClust with Fuzzy C-means Clustering

Search engines have become an omnipresent means of accessing the web. Search engine spamming is a technique for deceiving a search engine's ranking function so as to inflate the rank of target pages. Web spammers have exploited the vulnerability of link-based ranking algorithms by creating many artificial references or links in order to acquire higher-than-deserved rankings in search engines' results. Link-based algorithms such as PageRank and HITS use the structure of hyperlinks to rank web content. In this paper, an algorithm called DBSpamClust is proposed for link spam detection. Experiments show that the method can filter out web spam effectively.
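To make the link-spam mechanism concrete, the following is a minimal sketch of PageRank computed by plain power iteration with uniform teleportation. The damping factor of 0.85 and uniform redistribution of rank from dangling pages are assumed conventions. It is applied to a toy graph where a hypothetical link farm of 20 pages all point at a page named `spam`. The graph, page names and parameters are illustrative assumptions only. This is not DBSpamClust or the detection method proposed in this paper. It only shows how artificial inlinks inflate a link-based score.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a dict {page: [outlinks]}.

    Assumed conventions: uniform teleportation, and pages with no
    outlinks (dangling pages) spread their rank uniformly over all pages.
    """
    # Collect every page that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        # Each page splits its damped rank evenly among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical "honest" graph: the page "spam" has no inlinks yet.
honest = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
    "spam": ["a"],
}

# The same graph plus a hypothetical link farm of 20 pages pointing at "spam".
farmed = dict(honest)
for i in range(20):
    farmed[f"farm{i}"] = ["spam"]

before = pagerank(honest)
after = pagerank(farmed)
print(round(before["spam"], 3), round(after["spam"], 3))  # prints "0.03 0.108"
```

Without inlinks, `spam` receives only the teleportation share (0.15/5). Once the farm is added, every farm page forwards its damped rank to it, so its score more than triples even though no genuine page endorses it. This is the kind of artificial link structure that link spam detection must recognise.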
