A Chi-Square-Based Decision for Real-Time Malware Detection Using PE-File Features

Real-time malware detection remains an open problem, since most existing approaches to malware categorization focus on improving accuracy rather than detection time. Finding a proper balance between these two characteristics is therefore important, especially in security-sensitive systems. In this paper, we present a fast portable executable (PE) malware detection system based on the analysis of the set of Application Programming Interfaces (APIs) called by a program and of some technical PE features (TPFs). We use an efficient feature selection method that first selects the most relevant APIs and TPFs using the chi-square (KHI2) measure, and then uses the Phi (φ) coefficient to group the selected features into subsets according to their relevance. We evaluated the method with different classifiers trained on different combinations of feature subsets and obtained an accuracy of more than 98%. The system is suitable for real-time detection, since it categorizes a file (Malware or Benign) in 0.09 seconds.
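The two-stage selection described above can be illustrated with a minimal sketch. It assumes each feature is binary (present or absent in a file), so that it forms a 2×2 contingency table against the class label. The function names, the 0.05 significance level, the φ cutoff of 0.5, and the feature counts are all hypothetical choices made for illustration; the abstract does not specify the thresholds the authors used.

```python
import math

# Chi-square critical value for 1 degree of freedom at alpha = 0.05
# (assumed significance level, not taken from the paper).
CRITICAL_005 = 3.841


def chi2_and_phi(a, b, c, d):
    """Return (chi-square, phi) for a 2x2 table.

    a = malware files with the feature, b = benign files with the feature,
    c = malware files without it,      d = benign files without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty: the feature carries no information.
        return 0.0, 0.0
    phi = (a * d - b * c) / math.sqrt(denom)
    chi2 = n * phi * phi  # for a 2x2 table, chi2 = n * phi^2
    return chi2, phi


def select_and_group(counts, strong=0.5):
    """Keep features passing the chi-square test, then split them by |phi|.

    `counts` maps a feature name to its (a, b, c, d) tuple.
    `strong` is a hypothetical cutoff separating the two subsets.
    """
    groups = {"strong": [], "weak": []}
    for name, table in counts.items():
        chi2, phi = chi2_and_phi(*table)
        if chi2 < CRITICAL_005:
            continue  # not significantly associated with the class
        key = "strong" if abs(phi) >= strong else "weak"
        groups[key].append(name)
    return groups


# Illustrative counts only (50 malware and 50 benign files per feature).
example_counts = {
    "CreateRemoteThread": (40, 10, 10, 40),
    "GetTickCount": (30, 20, 20, 30),
    "ExitProcess": (25, 25, 25, 25),
}
groups = select_and_group(example_counts)
```

With these made-up counts, the first feature falls in the strong subset, the second in the weak subset, and the third is discarded because it is equally frequent in both classes. A classifier could then be trained on the strong subset alone or on both subsets, which mirrors the idea of evaluating different combinations of feature subsets.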