Dynamic classification of packing algorithms for inspecting executables using entropy analysis

Packing is widely used to bypass anti-malware systems, and the proportion of packed malware has been growing rapidly, now making up over 80% of malware. Few studies on detecting packing algorithms have been conducted during the last two decades. In this paper, we propose a method to classify the packing algorithms of given packed executables. First, we convert the entropy values of a packed executable loaded in memory into a symbolic representation using SAX (Symbolic Aggregate Approximation), which is known to be effective for converting large data. Because it simplifies complicated patterns, symbolic representation is commonly used in bioinformatics and data mining. Second, we classify the distribution of symbols using supervised learning classifiers, i.e., Naive Bayes and Support Vector Machines. Experiments with a collection of 466 programs and 15 packing algorithms demonstrate that our method can identify the packing algorithm of a given executable with an accuracy of 94.2%, recall of 94.7%, and precision of 92.7%. These results confirm that packing algorithms can be identified using entropy analysis, where entropy is a measure of the uncertainty of a running executable, without prior knowledge of the executable.
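
The first step of the method (entropy values converted to SAX symbols, then summarized as a symbol distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, the number of segments, the four-letter alphabet, and all function names are assumptions chosen for illustration, and the sketch computes entropy over a static byte buffer rather than over the memory of a running executable.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, pstdev


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def entropy_series(image, window=256):
    """Entropy of each non-overlapping window (window size is an assumption)."""
    return [shannon_entropy(image[i:i + window])
            for i in range(0, len(image), window)]


def sax(series, segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length `segments`."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    # Step 1: z-normalize (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Step 2: piecewise aggregate approximation, one mean per segment.
    paa = [mean(z[i * n // segments:(i + 1) * n // segments])
           for i in range(segments)]
    # Step 3: breakpoints that split N(0, 1) into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Step 4: map each segment mean to the symbol of its region.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


def symbol_distribution(word, alphabet="abcd"):
    """Relative frequency of each symbol, usable as a feature vector."""
    return [word.count(s) / len(word) for s in alphabet]


if __name__ == "__main__":
    # Hypothetical buffer: a low-entropy region followed by a high-entropy one.
    image = bytes(1024) + bytes(range(256)) * 4
    word = sax(entropy_series(image), segments=4)
    print(word, symbol_distribution(word))
```

The resulting frequency vector is the kind of input that a supervised classifier such as Naive Bayes or a Support Vector Machine could be trained on, with the packer name as the label.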
