Scalable attack propagation model and algorithms for honeypot systems

Attack propagation models within honeypot systems aim at providing insights about attack strategies that target multiple honeypots, rather than analyzing attacks on each honeypot separately. Traditional attack propagation models focus on building a single probabilistic model. This modeling approach may be misleading, since it does not take into consideration contextual information such as the country from which the attack is initiated. In addition, with the massive increase in the magnitude of attacks on honeypots, a scalable modeling approach is required. In this work we present a novel attack propagation model that can utilize contextual information about the attacks by training multiple Markov Chain models. Moreover, we add additional layers of analysis: first, we present a likelihood estimation procedure that can identify new and evolving attack patterns; and second, we introduce a method for generating simulated attack sequences that can be used for training or sensitivity analysis. Lastly, we present, in details, a MapReduce design for all suggested algorithms in order to address scalability issues. We evaluate our methods on a massive dataset which includes approximately 170 million attacks on an operational honeypot system. Results indicate that contextual modeling is important for explaining attack propagation that may vary by country. In addition, we show the effectiveness of the suggested method for generating simulated sequences by comparing the attack propagation patterns we learned in the generated dataset and the original one. Finally, we demonstrate the scalability of all of the proposed algorithms on real and synthetic datasets that include over a billion records.

[1]  Marc Dacier,et al.  Empirical analysis and statistical modeling of attack processes based on honeypots , 2007, ArXiv.

[2]  Ramesh R. Sarukkai,et al.  Link prediction and path analysis using Markov chains , 2000, Comput. Networks.

[3]  Hiroshi Fujinoki,et al.  A Survey: Recent Advances and Future Trends in Honeypot Research , 2012 .

[4]  Kunle Olukotun,et al.  Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[5]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[6]  Katerina Goseva-Popstojanova,et al.  Characterization and classification of malicious Web traffic , 2014, Comput. Secur..

[7]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[8]  Katerina Goseva-Popstojanova,et al.  Quantification of Attackers Activities on Servers Running Web 2.0 Applications , 2010, 2010 Ninth IEEE International Symposium on Network Computing and Applications.

[9]  Lance Spitzner,et al.  The Honeynet Project: Trapping the Hackers , 2003, IEEE Secur. Priv..

[10]  David Lou,et al.  FOG: Fragment Optimized Growth Algorithm for the de Novo Generation of Molecules Occupying Druglike Chemical Space , 2009, J. Chem. Inf. Model..

[11]  Charles M. Grinstead,et al.  Introduction to probability , 1999, Statistics for the Behavioural Sciences.

[12]  John N. Tsitsiklis,et al.  Introduction to Probability , 2002 .

[13]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[14]  E. Thompson,et al.  Multilocus Lod Scores in Large Pedigrees: Combination of Exact and Approximate Calculations , 2007, Human Heredity.

[15]  Ming-Yang Su Internet worms identification through serial episodes mining , 2010, ECTI-CON2010: The 2010 ECTI International Confernce on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology.