Data mining methods for detection of new malicious executables

Malicious executables are a serious security threat today, especially new, previously unseen ones, which often arrive as email attachments. Thousands of these new malicious executables are created every year. Current anti-virus systems attempt to detect them with hand-generated heuristics, an approach that is costly and often ineffective. We present a data mining framework that detects new, previously unseen malicious executables accurately and automatically. The framework automatically found patterns in our data set and used these patterns to detect a set of new malicious binaries. Compared with a traditional signature-based method, our method more than doubles the current detection rates for new malicious executables.
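The contrast between signature matching and learned patterns can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not say which features or learning algorithms the framework uses, so the byte-bigram features, the naive Bayes classifier (`NgramNaiveBayes`), the exact-match lookup standing in for a signature scanner, and the toy byte strings are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2):
    """Return all overlapping byte n-grams of `data` (hypothetical feature choice)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


class NgramNaiveBayes:
    """Toy multinomial naive Bayes over byte n-grams with add-one smoothing."""

    def __init__(self, n: int = 2):
        self.n = n
        self.gram_counts = {}         # label -> Counter of n-gram occurrences
        self.doc_counts = Counter()   # label -> number of training samples
        self.vocab = set()            # every n-gram seen during training

    def fit(self, samples):
        for data, label in samples:
            grams = byte_ngrams(data, self.n)
            self.gram_counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, data: bytes) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.gram_counts.items():
            # Log prior plus smoothed log likelihood of each observed n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab) + 1
            for gram in byte_ngrams(data, self.n):
                score += math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: two "malicious" byte strings sharing a pattern, two benign ones.
training = [
    (b"\x90\x90\xeb\xfe\xcc\xcc", "malicious"),
    (b"\xcc\x90\x90\xeb\xfe\x00", "malicious"),
    (b"hello world", "benign"),
    (b"print hello", "benign"),
]

# Simplified stand-in for a signature scanner: exact match against known samples.
known_signatures = {data for data, label in training if label == "malicious"}

model = NgramNaiveBayes(n=2).fit(training)

# A previously unseen variant that reuses the shared byte pattern.
new_variant = b"\x00\x90\x90\xeb\xfe\x90"
signature_hit = new_variant in known_signatures
learned_label = model.predict(new_variant)
benign_label = model.predict(b"hello print")
```

In this toy setup the exact-match lookup has no entry for the unseen variant, while the learned model can still score it from the byte patterns it shares with the training samples. Real detection rates depend on the features, the algorithm, and the data, none of which this sketch attempts to reproduce.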
