Clique-Attacks Detection in Web Search Engine for Spamdexing using K-Clique Percolation Technique

Search engines make information retrieval easier for users. A high-ranking position in search engine query results brings great benefits to a website, so some website owners manipulate the link architecture to improve their ranks. Identifying cliques in the network structure helps address search engine spam, especially link farm spam. This paper proposes a novel strategy for detecting such spam based on the K-Clique Percolation method. Data are collected from websites and classified with the Naive Bayes classification algorithm. The sites flagged as suspicious are then analyzed for clique attacks. Observations and findings regarding the spam are reported. The system appears to perform well in terms of accuracy.
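To make the clique-percolation idea concrete, the following is a minimal sketch of k-clique percolation on a toy link graph; it is an illustration under stated assumptions, not the implementation used in the paper. It assumes hyperlinks are treated as undirected edges, uses brute-force clique enumeration that is only practical for small graphs, and the function name `k_clique_communities` and the `.example` host names are hypothetical.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return k-clique percolation communities of an undirected graph."""
    # Build an undirected adjacency map, ignoring self-links.
    # Assumption: link direction is dropped for this illustration.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by brute force (small graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: two k-cliques are adjacent
    # when they share exactly k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    communities = {}
    for i, c in enumerate(cliques):
        communities.setdefault(find(i), set()).update(c)
    return sorted(communities.values(), key=lambda s: sorted(s))


# Hypothetical toy data: four densely interlinked hosts (a candidate link
# farm), a separate triangle, and two ordinary hosts with sparse links.
edges = [
    ("farm-a.example", "farm-b.example"),
    ("farm-a.example", "farm-c.example"),
    ("farm-a.example", "farm-d.example"),
    ("farm-b.example", "farm-c.example"),
    ("farm-b.example", "farm-d.example"),
    ("farm-c.example", "farm-d.example"),
    ("farm-d.example", "news.example"),
    ("news.example", "blog.example"),
    ("x.example", "y.example"),
    ("y.example", "z.example"),
    ("x.example", "z.example"),
]

communities = k_clique_communities(edges, 3)
for community in communities:
    print(sorted(community))
```

With k = 3, the four interlinked hosts percolate into a single community and the triangle forms a second one, while the sparsely linked hosts belong to no community. In a spam-detection setting, such dense communities among sites already flagged as suspicious would be the candidates for closer clique-attack analysis.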
