Spam Campaign Cluster Detection Using Redirected URLs and Randomized SubDomains

A substantial majority of the email sent every day is spam. Spam emails cause many problems when a recipient acts on the message or clicks a link in the email body. These problems include infecting the user's personal machine with malware, stealing personal information, and capturing credit card information. Because spam emails are generated as part of a very limited number of spam campaigns, it is useful to cluster spam messages into campaigns in order to identify which campaigns are the largest. This enables investigators to focus their attention on the largest and most significant clusters. In this paper, we present a method to cluster spam emails into spam campaigns. In our approach, the redirected URL is chosen as the primary field for cluster formation. Our study shows that a huge number of URLs arriving in spam email eventually point to a much smaller set of redirected URLs. Our multilevel clustering method grouped 90% of our half million spam emails into 4 spam campaigns. In addition to redirected URLs, we also use randomized subdomains, which appear in the URLs given in the email body, for campaign identification. We believe that our model can be applied in real time to quickly detect major campaigns.
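
As a rough illustration of the idea, the following Python sketch groups emails first by redirected URL and, when no redirect was resolved, by the base domain of the given URL so that randomized subdomains collapse into one cluster. It is not the implementation used in the paper: the record layout, the function names `base_domain` and `cluster_campaigns`, the sample URLs, and the "last two host labels" rule are all assumptions made for this example (a real system would resolve redirects itself and would likely use a public suffix list).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Return the last two host labels (assumed stand-in for the registered domain)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def cluster_campaigns(emails):
    """Group (email_id, given_url, redirected_url) records into clusters.

    Records with a resolved redirect are keyed on the redirected URL.
    Records without one fall back to the base domain of the given URL,
    which merges hosts that differ only in a randomized subdomain.
    """
    clusters = defaultdict(list)
    for email_id, given_url, redirected_url in emails:
        if redirected_url:
            key = ("redirect", redirected_url)
        else:
            key = ("subdomain", base_domain(given_url))
        clusters[key].append(email_id)
    return dict(clusters)


# Hypothetical sample data: redirects are assumed to be resolved already.
sample = [
    (1, "http://a.example.com/x", "http://pharma.test/buy"),
    (2, "http://b.other.org/y", "http://pharma.test/buy"),
    (3, "http://q7f3.spamhost.net/", None),
    (4, "http://zz91.spamhost.net/", None),
    (5, "http://lone.example.org/", None),
]

clusters = cluster_campaigns(sample)

# Rank clusters by size so the largest campaigns come first.
ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
for key, ids in ranked:
    print(key, ids)
```

Keying on the redirected URL first reflects the observation that many distinct given URLs lead to a much smaller set of destinations; the subdomain fallback is one possible way to handle the second clustering level.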
