The changing nature of Spam 2.0

Spam 2.0 (or Web 2.0 spam) refers to spam content that is hosted on Web 2.0 applications (blogs, forums, social networks, etc.). It differs from traditional spam in that it targets Web 2.0 applications and spreads through legitimate websites. The main problems with Spam 2.0 are that spam websites gain undeservedly high rankings in search engines, the reputation of legitimate websites is damaged, valuable computing resources are wasted, and users are deceived, which leads to a proliferation of scams, fraud and other security attacks. Protecting the Internet against Spam 2.0 attacks is becoming increasingly important because of the threats they pose to innocent web users. This paper contributes in this direction by investigating the changing nature of Spam 2.0 in order to understand the root cause of the problem. To do so, we set up an online discussion forum as a honeypot to capture spam content. The collected data is analysed to identify trends within the spam corpus, including repetitiveness in the use of email addresses, patterns within email addresses, repetitiveness of forum posts, domains used for spamming, keywords and categories, and the origin of spam traffic. In the future we aim to use these trends to develop a preventive or early-detection system that could predict future spam activity and allow pre-emptive action to be taken against it.
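
As an illustration of the kind of trend analysis described above, the following minimal Python sketch counts repeated email addresses, email domains and repeated post bodies in a small set of forum posts. It is not the paper's implementation: the record layout (the `email` and `body` keys), the function names `email_domain` and `summarise_trends`, and the sample data are all assumptions made for this example.

```python
from collections import Counter


def email_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def summarise_trends(posts, top_n=3):
    """Count repeated email addresses, email domains and post bodies.

    `posts` is an iterable of dicts with the hypothetical keys
    "email" and "body"; a real honeypot log would need its own parser.
    """
    posts = list(posts)
    # Normalise case and whitespace so trivial variations count as repeats.
    emails = Counter(p["email"].strip().lower() for p in posts)
    domains = Counter(email_domain(p["email"]) for p in posts)
    bodies = Counter(" ".join(p["body"].split()).lower() for p in posts)
    return {
        "repeated_emails": [(e, n) for e, n in emails.most_common(top_n) if n > 1],
        "top_domains": domains.most_common(top_n),
        "repeated_posts": [(b, n) for b, n in bodies.most_common(top_n) if n > 1],
    }


# Made-up sample records, for illustration only.
sample_posts = [
    {"email": "a1@example.com", "body": "Buy cheap pills"},
    {"email": "A1@example.com", "body": "buy  cheap pills"},
    {"email": "b2@example.org", "body": "Hello forum"},
]

summary = summarise_trends(sample_posts)
print(summary)
```

Normalising case and whitespace before counting is a design choice of this sketch; it treats near-identical addresses and posts as repeats, which is one plausible reading of "repetitiveness" in the text.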
