A Study on Web Spam Classification and Algorithms

This study classifies Internet spam by its properties, such as content, type, and ranking. It discusses the impact of spam on social networks, email, images, content, and links, and lists techniques for preventing spam in each of these areas. It also reviews the two groups of spam detection techniques and algorithms: content-based methods and link-based methods. Link-based methods are further subdivided into five groups: label propagation, link pruning and reweighting, label refinement, graph regularization, and feature-based methods. The review compares these five groups on factors such as the type of information used, the algorithms, how they work, their complexity, and the mining techniques involved.
