Collective classification for packed executable identification

Malware is any software designed to harm computers. Commercial anti-virus products rely on signature scanning, a technique that is effective only when the malicious executables have been previously analysed and identified. Malware writers employ several techniques to hide the actual behaviour of their programs. Executable packing consists of encrypting or hiding the real payload of the executable. Generic unpacking techniques do not depend on the packer used, as they execute the binary within an isolated environment (namely, a "sandbox") to gather the real code of the packed executable. However, this approach is slow and, therefore, a filtering step is required to determine whether an executable has been packed. To this end, supervised machine-learning approaches trained with static features extracted from the executables have been proposed. Notwithstanding, supervised learning methods require a large number of packed and non-packed executables to be identified and labelled. In this paper, we propose a new method for packed executable detection that adopts a collective learning approach to reduce the labelling requirements of fully supervised approaches. Our empirical validation shows that the system maintains a high accuracy rate while requiring less labelling effort than supervised learning.
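To illustrate how a collective (transductive) approach can reduce labelling needs, here is a minimal sketch of an iterative k-nearest-neighbour scheme in the spirit of collective classification. It is not the authors' algorithm. It assumes each executable is already summarised as a vector of static features. The feature choice (maximum section entropy, fraction of writable-and-executable sections), the label strings, and the function names `collective_knn`, `knn_vote` and `distance` are all hypothetical. The key idea is that predictions for unlabelled executables are fed back as neighbours for one another, so the unlabelled set itself helps classify its members.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical static features per executable, e.g.
# (max section entropy, fraction of writable+executable sections).
Features = Sequence[float]
Labelled = Tuple[Features, str]


def distance(a: Features, b: Features) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_vote(x: Features, pool: List[Labelled], k: int) -> str:
    """Majority label among the k instances of `pool` closest to x."""
    neighbours = sorted(pool, key=lambda item: distance(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # Equal counts keep first-encountered order, i.e. the nearest neighbour wins ties.
    return votes.most_common(1)[0][0]


def collective_knn(labelled: List[Labelled], unlabelled: List[Features],
                   k: int = 3, max_iter: int = 10) -> List[str]:
    """Transductive k-NN: current guesses for unlabelled items act as neighbours."""
    # Bootstrap: label each unknown executable from the labelled set only.
    preds = [knn_vote(x, labelled, k) for x in unlabelled]
    for _ in range(max_iter):  # cap iterations: convergence is not guaranteed
        new_preds = []
        for i, x in enumerate(unlabelled):
            # Neighbour pool = labelled data + current guesses for the other unknowns.
            pool = labelled + [(unlabelled[j], preds[j])
                               for j in range(len(unlabelled)) if j != i]
            new_preds.append(knn_vote(x, pool, k))
        if new_preds == preds:  # assignment is stable, stop early
            break
        preds = new_preds
    return preds


if __name__ == "__main__":
    labelled = [([7.8, 0.9], "packed"), ([7.5, 0.8], "packed"),
                ([5.1, 0.1], "not_packed"), ([4.8, 0.2], "not_packed")]
    unknown = [[7.6, 0.85], [5.0, 0.15], [7.2, 0.7], [4.9, 0.05]]
    print(collective_knn(labelled, unknown))
    # → ['packed', 'not_packed', 'packed', 'not_packed']
```

With only four labelled samples, every unknown executable is assigned a label. In a real setting the labelled set would be a small fraction of the corpus, and a richer relational model would replace plain Euclidean k-NN.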
