Taster's choice: a comparative analysis of spam feeds

E-mail spam has been the focus of a wide variety of measurement studies, at least in part because of the plethora of spam data sources available to the research community. However, little attention has been paid to whether such data sources suit the kinds of analyses they are used for. Despite the broad range of data available, most studies use a single "spam feed", and there has been little examination of how such feeds may differ in content. In this paper we provide this characterization by comparing the contents of ten distinct contemporaneous feeds of spam-advertised domain names. We document significant variations that depend on how such feeds are collected and show how these variations can lead to different findings.
