Automatically Inferring the Evolution of Malicious Activity on the Internet

Internet-based services routinely contend with a range of malicious activity (e.g., spam, scans, botnets) that can arise from virtually any part of the global Internet infrastructure and that can shift over time. In this paper, we develop the first algorithmic techniques to automatically infer, in an online fashion, regions of the Internet with shifting security characteristics. Conceptually, our key idea is to model the malicious activity on the Internet as a decision tree over the IP address space, and to identify the dynamics of the malicious activity by inferring the dynamics of the decision tree. Our evaluations on large corpora of mail data and botnet data indicate that our algorithms are fast, can keep up with Internet-scale traffic data, and can extract changes in sources of malicious activity substantially better (by a factor of 2.5) than approaches based on predetermined levels of aggregation, such as BGP-based network-aware clusters. Our case studies demonstrate our algorithms’ ability to summarize large shifts in malicious activity into a small number of IP regions (a reduction of as much as two orders of magnitude), and thus to help focus limited operator resources. Using our algorithms, we find that some regions of the Internet are prone to much faster changes than others, such as a set of small and medium-sized hosting providers that are of particular interest to mail operators.
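To make the key idea concrete, the following is a minimal sketch of a decision tree over IPv4 prefixes and of a comparison between the trees built from two time windows. It is an illustration only and not the algorithm developed in the paper: it works in batch rather than online, it uses a simple greedy split rule, and every name in it (`build_tree`, `changed_prefixes`, `to_cidr`, `max_depth`, `min_samples`) is a hypothetical choice made for this example.

```python
import ipaddress

def ip_bits(ip, depth):
    """Return the first `depth` bits of an IPv4 address as a bit string."""
    return format(int(ipaddress.IPv4Address(ip)), "032b")[:depth]

def build_tree(samples, max_depth=8, min_samples=2):
    """Greedy prefix tree, returned as {leaf prefix bits: is_malicious}.

    samples: list of (ip_string, is_malicious) pairs from one time window.
    A node is split only while its samples are mixed; ties count as benign.
    """
    rows = [(ip_bits(ip, max_depth), bad) for ip, bad in samples]
    leaves = {}

    def grow(prefix, subset):
        bad = sum(1 for _, b in subset if b)
        label = bad * 2 > len(subset)          # majority vote
        pure = bad in (0, len(subset))
        if pure or len(prefix) == max_depth or len(subset) < min_samples:
            leaves[prefix] = label
            return
        for bit in "01":
            child = [r for r in subset if r[0].startswith(prefix + bit)]
            if child:
                grow(prefix + bit, child)
            else:
                # Assumption: an unseen half inherits its parent's label.
                leaves[prefix + bit] = label

    grow("", rows)
    return leaves

def covering_label(leaves, prefix):
    """Label of the leaf that covers `prefix`, or None if `prefix` is coarser."""
    for k in range(len(prefix) + 1):
        if prefix[:k] in leaves:
            return leaves[prefix[:k]]
    return None

def changed_prefixes(old, new):
    """Prefixes (at the finer of the two granularities) whose label flipped."""
    changed = set()
    for coarse, fine in ((old, new), (new, old)):
        for prefix, label in fine.items():
            other = covering_label(coarse, prefix)
            if other is not None and other != label:
                changed.add(prefix)
    return sorted(changed)

def to_cidr(prefix):
    """Convert a bit-string prefix to CIDR notation."""
    value = int(prefix.ljust(32, "0"), 2)
    return str(ipaddress.IPv4Network((value, len(prefix))))

# Toy data (invented for illustration): 10.0.0.x turns malicious in window 2.
window1 = [("10.0.0.1", False), ("10.0.0.2", False),
           ("200.0.0.1", True), ("200.0.0.2", True)]
window2 = [("10.0.0.1", True), ("10.0.0.2", True),
           ("200.0.0.1", True), ("200.0.0.2", True)]

old_tree = build_tree(window1)
new_tree = build_tree(window2)
shifted = [to_cidr(p) for p in changed_prefixes(old_tree, new_tree)]
print(shifted)
```

In this toy run the change is reported as a single prefix rather than as individual addresses, which is the kind of summarization the abstract refers to. An approach with a fixed aggregation level would instead always report at one predetermined prefix length.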
