Spatio-temporal mining of software adoption & penetration

How does malware propagate? Does it form spikes over time? Does it resemble the propagation pattern of benign files, such as software patches? Does it spread uniformly over countries? How long does it take for a URL that distributes malware to be detected and shut down? In this work, we answer these questions by analyzing patterns from 22 million malicious (and benign) files, found on 1.6 million hosts worldwide during the month of June 2011. We conduct this study using the WINE database available at Symantec Research Labs. Additionally, we explore the research questions raised by sampling on such large databases of executables; the importance of studying the implications of sampling is twofold: First, sampling is a means of reducing the size of the database hence making it more accessible to researchers; second, because every such data collection can be perceived as a sample of the real world. Finally, we discover the SHARKFIN temporal propagation pattern of executable files, the GEOSPLIT pattern in the geographical spread of machines that report executables to Symantec's servers, the Periodic Power Law (PPL) distribution of the life-time of URLs, and we show how to efficiently extrapolate crucial properties of the data from a small sample. To the best of our knowledge, our work represents the largest study of propagation patterns of executables.

[1]  Kenneth Levenberg A METHOD FOR THE SOLUTION OF CERTAIN NON – LINEAR PROBLEMS IN LEAST SQUARES , 1944 .

[2]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[3]  B. Mandelbrot,et al.  Fractals: Form, Chance and Dimension , 1978 .

[4]  I. Good,et al.  Fractals: Form, Chance and Dimension , 1978 .

[5]  R M May,et al.  Coevolution of hosts and parasites , 1982, Parasitology.

[6]  Manfred Schroeder,et al.  Fractals, Chaos, Power Laws: Minutes From an Infinite Paradise , 1992 .

[7]  Azer Bestavros,et al.  Self-similarity in World Wide Web traffic: evidence and possible causes , 1996, SIGMETRICS '96.

[8]  Christos Faloutsos,et al.  Modeling Skewed Distribution Using Multifractals and the '80-20' Law , 1996, VLDB.

[9]  Mark Crovella,et al.  Self-Similarity in World Wide Web Traffic: Evidence and Causes , 1996, SIGMETRICS.

[10]  Richard G. Baraniuk,et al.  A Multifractal Wavelet Model with Application to Network Traffic , 1999, IEEE Trans. Inf. Theory.

[11]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[12]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[13]  Herbert W. Hethcote,et al.  The Mathematics of Infectious Diseases , 2000, SIAM Rev..

[14]  Andrei Z. Broder,et al.  Graph structure in the Web , 2000, Comput. Networks.

[15]  Christos Faloutsos,et al.  Data mining meets performance evaluation: fast algorithms for modeling bursty traffic , 2002, Proceedings 18th International Conference on Data Engineering.

[16]  Vern Paxson,et al.  How to Own the Internet in Your Spare Time , 2002, USENIX Security Symposium.

[17]  Christos Faloutsos,et al.  Capturing the spatio-temporal behavior of real traffic data , 2002, Perform. Evaluation.

[18]  David Moore,et al.  Code-Red: a case study on the spread and victims of an internet worm , 2002, IMW '02.

[19]  Stefan Savage,et al.  Inside the Slammer Worm , 2003, IEEE Secur. Priv..

[20]  Vern Paxson,et al.  The top speed of flash worms , 2004, WORM '04.

[21]  M. Newman Power laws, Pareto distributions and Zipf's law , 2005 .

[22]  Christos Gkantsidis,et al.  Planet scale software updates , 2006, SIGCOMM 2006.

[23]  Christos Faloutsos,et al.  Cascading Behavior in Large Blog Graphs , 2007 .

[24]  Jure Leskovec,et al.  Microscopic evolution of social networks , 2008, KDD.

[25]  Didier Sornette,et al.  Robust dynamic classes revealed by measuring the response function of a social system , 2008, Proceedings of the National Academy of Sciences.

[26]  Vern Paxson,et al.  Measuring Pay-per-Install: The Commoditization of Malware Distribution , 2011, USENIX Security Symposium.

[27]  Tudor Dumitras,et al.  Toward a standard benchmark for computer security research: the worldwide intelligence network environment (WINE) , 2011, BADGERS '11.

[28]  Christos Faloutsos,et al.  Polonium: Tera-Scale Graph Mining and Inference for Malware Detection , 2011 .

[29]  Leyla Bilge,et al.  Before we knew it: an empirical study of zero-day attacks in the real world , 2012, CCS.

[30]  Christos Faloutsos,et al.  Rise and fall patterns of information diffusion: model and implications , 2012, KDD.

[31]  Daniel E. Geer,et al.  Power. Law , 2012, IEEE Secur. Priv..

[32]  Christos Faloutsos,et al.  Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining , 2013, ASONAM 2013.