Towards the effective temporal association mining of spam blacklists

IP blacklists are a well-regarded anti-spam mechanism that captures global spamming patterns. These properties make such lists a practical ground truth for studying email spam behavior. Observing one blacklist for nearly a year and a half, we collected data on roughly half a billion listing events. In this paper, that data serves two purposes. First, we conduct a measurement study on the dynamics of blacklists and of email spam at large. The size and duration of the data enable scrutiny of long-term trends at scale. Further, these statistics help parameterize our second task: mining blacklist history for temporal association rules. That is, we search for IP addresses with correlated listing histories. Strong correlations would suggest that group members are not independent entities and likely share botnet membership. Unfortunately, we find that statistically significant groupings are rare. This result is reinforced when rules are evaluated on their ability to (1) identify shared botnet members, using ground truth from botnet infiltrations and sinkholes, and (2) predict future blacklisting events. In both cases, performance improvements over a control classifier are nominal. This outcome forces us to re-examine the appropriateness of blacklist data for this task and to suggest refinements to our mining model that may allow it to better capture the dynamics by which botnets operate.
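The rule-mining step can be illustrated with a minimal sketch. It assumes listing events arrive as hypothetical (ip, day) pairs with integer days, and it considers only pairwise rules of the form "if A is listed on day t, then B is listed within window days". Support and confidence are the standard association-rule measures; lift, the ratio of a rule's confidence to B's base listing rate, stands in here for the significance testing and control-classifier comparison described above, whose exact form is not given. All names (day_sets, rule_stats, mine_rules, window, horizon, min_support, min_conf) are illustrative assumptions, not the paper's model.

```python
from collections import defaultdict
from itertools import permutations


def day_sets(events):
    """Map each IP to the set of days (integers) on which it was listed."""
    days = defaultdict(set)
    for ip, day in events:
        days[ip].add(day)
    return dict(days)


def rule_stats(days, a, b, window, horizon):
    """Score the hypothetical rule 'a listed on day t => b listed in [t, t + window]'.

    Returns (support, confidence, lift):
      support    = number of a's listing days followed by a listing of b
      confidence = support / number of a's listing days
      lift       = confidence / base rate, where the base rate is the fraction of
                   days 0..horizon-1 followed by a listing of b; a lift near 1
                   means a's history adds little information about b.
    """
    b_days = days[b]

    def followed(t):
        # True if b is listed on any day t, t+1, ..., t+window.
        return any((t + d) in b_days for d in range(window + 1))

    support = sum(1 for t in days[a] if followed(t))
    confidence = support / len(days[a])
    base = sum(1 for t in range(horizon) if followed(t)) / horizon
    # Lift is undefined when no day in the period precedes a listing of b.
    lift = confidence / base if base > 0 else float("nan")
    return support, confidence, lift


def mine_rules(events, window, horizon, min_support, min_conf):
    """Brute-force search over all ordered IP pairs; O(n^2), so a toy only."""
    days = day_sets(events)
    rules = []
    for a, b in permutations(days, 2):
        support, confidence, lift = rule_stats(days, a, b, window, horizon)
        if support >= min_support and confidence >= min_conf:
            rules.append((a, b, support, confidence, lift))
    return rules


# Toy data (hypothetical): 192.0.2.2 is always listed one day after 192.0.2.1.
events = [
    ("192.0.2.1", 0), ("192.0.2.1", 10), ("192.0.2.1", 20),
    ("192.0.2.2", 1), ("192.0.2.2", 11), ("192.0.2.2", 21),
    ("192.0.2.3", 5), ("192.0.2.3", 30),
]
rules = mine_rules(events, window=2, horizon=40, min_support=2, min_conf=0.5)
print(rules)  # -> [('192.0.2.1', '192.0.2.2', 3, 1.0, 5.0)]
```

On the toy data only the rule 192.0.2.1 → 192.0.2.2 survives, with a lift of 5: a listing of the second address follows the first at five times its base rate. The exhaustive pairwise loop is quadratic in the number of IPs and would not scale to half a billion events; practical miners prune candidate itemsets (Apriori- or FP-growth-style) before scoring them.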
