Artificial Neural Networks For Content-based Web Spam Detection

Web spam has become a serious problem for Internet users, causing personal harm and economic losses. Although several approaches have been proposed to detect and avoid it automatically, the speed at which spammers improve their techniques requires classifiers that are more generic, efficient, and highly adaptive. While it is widely accepted in the literature that neural-network-based techniques generalize and adapt well, to the best of our knowledge no prior work has explored such methods for web spam detection. To fill this gap, this paper presents a performance evaluation of different artificial neural network models used to automatically classify and filter real samples of web spam based on their content. The results indicate that some of the evaluated approaches have considerable potential: they are well suited to the problem and clearly outperform state-of-the-art techniques.
