Paper Information - Classification of Malware by Using Structural Entropy on Convolutional Neural Networks

Malicious programs have grown in both number and sophistication. Analyzing the malicious intent of such vast amounts of data requires huge resources, so effective categorization of malware is needed. In this paper, the content of a malicious program is represented as an entropy stream, where each value describes the amount of entropy of a small chunk of code at a specific location in the file. Wavelet transforms are then applied to this entropy signal to describe the variation in the entropic energy. Motivated by the visual similarity between the entropy streams of malicious software belonging to the same family, we propose a file-agnostic deep learning approach for the categorization of malware. Our method exploits the fact that most variants are generated using common obfuscation techniques, and that compression and encryption algorithms retain some properties present in the original code. This allows us to find discriminative patterns that almost all variants in a family share. Our method has been evaluated using the data provided by Microsoft for the BigData Innovators Gathering Anti-Malware Prediction Challenge, and achieved promising results in comparison with the state of the art.
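The entropy-stream representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 256-byte chunk size, the non-overlapping chunking, and the single-level unnormalized Haar step are all assumptions chosen for illustration. The sketch computes the Shannon entropy of each chunk of a byte string and then applies one averaging/differencing pass to the resulting signal.

```python
import math
from collections import Counter


def chunk_entropy(chunk):
    """Shannon entropy of one chunk of bytes, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # Sum p * log2(1/p) over the byte values that occur in the chunk.
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def entropy_stream(data, chunk_size=256):
    """Entropy of consecutive, non-overlapping chunks of `data`.

    chunk_size=256 is an assumed value for this sketch, not a value
    taken from the paper.
    """
    return [
        chunk_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def haar_step(signal):
    """One level of an unnormalized Haar transform (assumed wavelet choice).

    Returns (approximation, detail): pairwise averages and pairwise
    half-differences. An odd-length signal is padded by repeating its
    last value.
    """
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


# Toy input: a constant block (minimum entropy) followed by a block
# containing every byte value exactly once (maximum entropy).
sample = bytes(256) + bytes(range(256))
stream = entropy_stream(sample)
approx, detail = haar_step(stream)
```

For the toy input, the stream holds one low-entropy and one high-entropy value, and the detail coefficient captures the jump between them. On a real file, the entropy stream would be read from the file's bytes, and sharp jumps of this kind typically mark transitions between plain, compressed, and encrypted regions.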
