Determining malicious executable distinguishing attributes and low-complexity detection

Detection of rapidly evolving malware requires classification techniques that can effectively and efficiently detect zero-day attacks. Such detection relies on a robust model of benign behavior, and deviations from that model are used to flag malicious behavior. In this paper we propose a low-complexity, host-based technique that uses deviations in static file attributes to detect malicious executables. We first develop simple statistical models of static file attributes from empirical data on thousands of benign executables. Deviations between the attribute models of benign and malicious executables are then quantified using information-theoretic divergence measures based on the Kullback-Leibler divergence. This quantification reveals distinguishing attributes: attributes whose distributions diverge considerably between benign and malicious executables and that can therefore be used for detection. We then use the benign models of these divergent attributes in cross-correlation and log-likelihood frameworks to classify malicious executables. Our results, using over 4,000 malicious file samples, indicate that the proposed detector provides reasonably high detection accuracy while having significantly lower complexity than existing detectors.
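
The pipeline outlined above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Each static attribute is assumed to take values from a small, known discrete alphabet, and each attribute is modeled with a Laplace-smoothed empirical probability mass function. Attributes are scored with one symmetrized KL choice, D(p||q) + D(q||p). A file is classified by its log-likelihood under the benign models, with attributes treated as independent. The attribute names (`num_sections`, `has_debug`), the toy data, the smoothing constant `alpha` and the `threshold` are all hypothetical. Only the log-likelihood framework is shown, not the cross-correlation one.

```python
import math
from collections import Counter


def empirical_pmf(values, alphabet, alpha=1.0):
    """Laplace-smoothed pmf over a fixed, known alphabet.

    Assumes every observed value belongs to `alphabet`. Smoothing (alpha > 0)
    keeps all probabilities strictly positive, so the KL divergences and
    log-likelihoods below stay finite.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}


def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def symmetric_kl(p, q):
    """One possible symmetrization (an assumption here): D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def rank_attributes(benign, malware, alphabets):
    """Sort attributes by benign-vs-malware divergence, largest first."""
    scores = {}
    for attr, alphabet in alphabets.items():
        p = empirical_pmf(benign[attr], alphabet)
        q = empirical_pmf(malware[attr], alphabet)
        scores[attr] = symmetric_kl(p, q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def benign_log_likelihood(sample, benign_models):
    """Sum of log2 P_benign(value) over a file's attributes (independence assumed)."""
    return sum(math.log2(benign_models[attr][value]) for attr, value in sample.items())


def classify(sample, benign_models, threshold):
    """Flag a file as malicious when it is too unlikely under the benign models."""
    if benign_log_likelihood(sample, benign_models) < threshold:
        return "malicious"
    return "benign"


# Hypothetical toy data: two made-up static attributes for 8 benign and 8 malware files.
alphabets = {"num_sections": list(range(1, 9)), "has_debug": [0, 1]}
benign = {"num_sections": [4, 4, 5, 4, 5, 4, 3, 4], "has_debug": [1, 1, 1, 0, 1, 1, 1, 1]}
malware = {"num_sections": [2, 8, 2, 7, 2, 8, 3, 2], "has_debug": [0, 0, 1, 0, 0, 0, 0, 0]}

ranking = rank_attributes(benign, malware, alphabets)
print(ranking)  # has_debug ranks first (symmetric KL of about 2.4 bits)

benign_models = {a: empirical_pmf(benign[a], alphabets[a]) for a in alphabets}
threshold = -4.0  # hypothetical; would be tuned on held-out data
print(classify({"num_sections": 4, "has_debug": 1}, benign_models, threshold))  # benign
print(classify({"num_sections": 2, "has_debug": 0}, benign_models, threshold))  # malicious
```

Smoothing matters in this sketch. Without it, any attribute value never seen in the benign training data would receive probability zero, which makes both the KL divergence and the log-likelihood infinite. In practice, the threshold would be chosen on held-out benign and malicious files rather than fixed by hand.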
