WESPACT: Detection of web spamdexing with decision trees in GA perspective

The Internet today is huge, dynamic, self-organized, and strongly interlinked, and web spam can significantly degrade the quality of search engine results. This paper is motivated by viewing web spam as a cancer of the Internet; a remedy can then be derived by formulating detection algorithms based on a genetic algorithm (GA) that operates on content and link attributes. The web mining tools GATree [15] and PermutMatrix [14] have been used to simulate the experiments, and a Java program has been developed to analyze and report spamdexing instances. The paper proposes WESPACT, an algorithm to detect web spam, and experiments indicate that it performs well.
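The abstract does not spell out WESPACT's individual steps. To illustrate the general idea it builds on, evolving a decision tree over page attributes with a genetic algorithm (the approach GATree takes), the following is a minimal Python sketch. It is not WESPACT or GATree itself: the fixed depth-2 tree encoding, the three feature names (`keyword_density`, `outlink_ratio`, `reciprocal_link_ratio`), the synthetic labelling rule, and all GA parameters are hypothetical assumptions chosen only for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical content/link attributes; not the actual WESPACT feature set.
FEATURES = ["keyword_density", "outlink_ratio", "reciprocal_link_ratio"]


@dataclass
class Tree:
    """Fixed-shape depth-2 decision tree (an assumed encoding).
    genes[i] = (feature_index, threshold) for node 0 (root), 1 (left), 2 (right);
    leaves holds 4 class labels (0 = normal, 1 = spam)."""
    genes: list
    leaves: list

    def predict(self, x):
        f, t = self.genes[0]
        node = 1 if x[f] <= t else 2
        f, t = self.genes[node]
        branch = 0 if x[f] <= t else 1
        return self.leaves[2 * (node - 1) + branch]


def random_tree(rng):
    genes = [(rng.randrange(len(FEATURES)), rng.random()) for _ in range(3)]
    leaves = [rng.randint(0, 1) for _ in range(4)]
    return Tree(genes, leaves)


def fitness(tree, data):
    """Fraction of labelled pages the tree classifies correctly."""
    return sum(tree.predict(x) == y for x, y in data) / len(data)


def crossover(a, b, rng):
    """Uniform crossover: each node split and leaf label comes from either parent."""
    genes = [rng.choice(pair) for pair in zip(a.genes, b.genes)]
    leaves = [rng.choice(pair) for pair in zip(a.leaves, b.leaves)]
    return Tree(genes, leaves)


def mutate(tree, rng, rate=0.2):
    """Re-draw a node split or flip a leaf label with probability `rate`."""
    genes = [(rng.randrange(len(FEATURES)), rng.random()) if rng.random() < rate else g
             for g in tree.genes]
    leaves = [1 - leaf if rng.random() < rate else leaf for leaf in tree.leaves]
    return Tree(genes, leaves)


def tournament(pop, scores, rng, k=3):
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def evolve(data, pop_size=40, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(t, data) for t in pop]
        best = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [best]  # elitism: the best tree survives unchanged
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return max(pop, key=lambda t: fitness(t, data))


def synthetic_pages(n, seed=1):
    """Toy data (hypothetical rule): spam when keyword density and outlink ratio are both high."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in FEATURES]
        data.append((x, int(x[0] > 0.6 and x[1] > 0.5)))
    return data


if __name__ == "__main__":
    pages = synthetic_pages(200)
    best = evolve(pages)
    print("training accuracy:", fitness(best, pages))
```

Because the tree shape is fixed, plain training accuracy serves as the fitness function here. A variable-shape encoding, closer to what a GA-based tree inducer works with, would instead need subtree crossover and typically some penalty on tree size to keep evolved trees small.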

[1] S. Sasikala et al. GAB_CLIQDET: A Diagnostics to Web Cancer (Web Link Spam) Based on Genetic Algorithm. 2011.

[2] Albert et al. Emergence of scaling in random networks. 1999, Science.

[3] A. Barabasi et al. Scale-free characteristics of random networks: the topology of the world-wide web. 2000.

[4] Gilles Caraux et al. PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order. 2005, Bioinformatics.

[5] S. K. Jayanthi et al. Clique-Attacks Detection in Web Search Engine for Spamdexing using K-Clique Percolation Technique. 2012.

[6] Brian D. Davison et al. Topical TrustRank: using topicality to combat web spam. 2006, WWW '06.

[7] Yan Zhang et al. Exploring both Content and Link Quality for Anti-Spamming. 2006, The Sixth IEEE International Conference on Computer and Information Technology (CIT'06).

[8] S. K. Jayanthi et al. Perceiving linkspam based on DBSpamClust. 2011, 2011 3rd International Conference on Electronics Computer Technology.

[9] S. Sasikala et al. Link Spam Detection based on DBSpamClust with Fuzzy C-means Clustering. 2011, arXiv.

[10] V. Latora et al. Efficient behavior of small-world networks. 2001, Physical Review Letters.

[11] Nadine Akkari et al. Security Analysis and Delay Evaluation for SIP-Based Mobile Mass Examination System. 2012.

[12] Chunheng Wang et al. Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification. 2007, Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007).

[13] Brian D. Davison et al. Undue influence: eliminating the impact of link plagiarism on web search rankings. 2006, SAC.