New Trends in Security Evaluation of Bayesian Network-Based Malware Detection Models

Statistical methods have long been used to detect viral code. This detection method is called spectral analysis because it works with statistical distributions, such as the frequency spectra of bytes, instructions, or system calls. Most statistical classification algorithms can be described as graphical models, namely Bayesian networks. In this paper, we first present an approach to viral detection by spectral analysis based on Bayesian networks, using two basic examples of such learning models: naive Bayes and hidden Markov models. Designing a statistical information retrieval model requires careful and thorough evaluation to demonstrate that new techniques outperform existing ones on representative program collections, and the field has developed into a highly empirical discipline. We then present information-theoretic criteria for characterizing the effectiveness of spectral analysis models, and finally discuss the limits of such models.
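To make the idea of spectral analysis concrete, the following is a minimal sketch of a naive Bayes classifier over byte-frequency spectra. It is not the paper's implementation: the multinomial event model, the Laplace smoothing parameter `alpha`, the class name `ByteNaiveBayes`, and the toy training samples are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class ByteNaiveBayes:
    """Multinomial naive Bayes over byte counts (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha      # Laplace smoothing (assumed, not from the paper)
        self.log_prior = {}     # label -> log P(label)
        self.log_like = {}      # label -> list of 256 values log P(byte | label)

    def fit(self, samples, labels):
        per_class = {}
        docs = Counter(labels)
        # Accumulate the byte spectrum of each class.
        for data, label in zip(samples, labels):
            per_class.setdefault(label, Counter()).update(data)
        for label, counts in per_class.items():
            denom = sum(counts.values()) + self.alpha * 256
            self.log_prior[label] = math.log(docs[label] / len(labels))
            self.log_like[label] = [
                math.log((counts[b] + self.alpha) / denom) for b in range(256)
            ]
        return self

    def score(self, data, label):
        # Log posterior up to a constant: prior plus summed byte log-likelihoods.
        ll = self.log_like[label]
        return self.log_prior[label] + sum(
            n * ll[b] for b, n in Counter(data).items()
        )

    def predict(self, data):
        return max(self.log_prior, key=lambda label: self.score(data, label))


# Toy, hand-made samples (hypothetical; real evaluation needs program collections).
benign = [b"hello world " * 20, b"plain text sample " * 20]
malicious = [bytes([0x90] * 50 + [0xEB, 0xFE] * 10), bytes([0x90] * 30 + [0xCC] * 20)]

model = ByteNaiveBayes().fit(
    benign + malicious, ["benign"] * 2 + ["malicious"] * 2
)
print(model.predict(bytes([0x90] * 40)))
print(model.predict(b"hello text"))
```

The same structure would apply to instruction or system call spectra by replacing the 256-symbol byte alphabet with the relevant symbol set. A hidden Markov model would additionally capture dependencies between successive symbols, which this sketch ignores.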
