Semi-supervised learning for packed executable detection

The term malware refers to any software with malicious intent. One of the methods malware writers use to hide their creations is executable packing. Packing consists of encrypting or hiding the real code of the executable in such a way that it is decrypted or revealed only at execution time. Widespread solutions to this issue first try to identify the packer used and then apply the corresponding unpacking routine for that packing algorithm. As happens with malware obfuscation in general, this approach fails to detect new and custom packers. Generic unpacking has been proposed to solve this issue. These methods usually execute the binary in a contained environment or sandbox to retrieve the real code of the packed executable. Because these approaches incur a high performance overhead, a filtering step is required to determine whether an executable is packed or not. Supervised machine-learning approaches have been proposed to handle this filtering step. However, supervised learning is far from a complete solution because it requires a large number of packed and non-packed executables to be identified and labelled beforehand. In this paper, we propose a new method for packed executable detection that adopts a well-known semi-supervised learning approach to reduce the labelling requirements of fully supervised approaches. We performed an empirical validation demonstrating that the labelling effort is lower than when supervised learning is used, while the system maintains high accuracy rates.
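To make the idea of semi-supervised filtering concrete, the following is a minimal sketch, not the implementation evaluated in the paper. The abstract does not name the algorithm, so the sketch assumes a graph-based label-spreading scheme in the style of learning with local and global consistency (Zhou et al.): labels from a few known executables are propagated to unlabelled ones through a similarity graph. The function name `spread_labels`, the parameters `alpha` and `sigma`, and the two-dimensional feature vectors are hypothetical and chosen only for illustration.

```python
import math


def spread_labels(features, labels, alpha=0.9, sigma=1.0, iterations=200):
    """Propagate a few known labels to unlabelled samples over a similarity graph.

    features: list of numeric feature vectors (hypothetical per-executable features).
    labels:   1 = packed, 0 = not packed, None = unlabelled.
    Returns a predicted label (0 or 1) for every sample.
    """
    n = len(features)

    # Affinity matrix W with a Gaussian kernel and a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))

    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if deg[i] > 0.0 and deg[j] > 0.0:
                s[i][j] = w[i][j] / math.sqrt(deg[i] * deg[j])

    # Initial label matrix Y: one-hot rows for labelled samples, zeros otherwise.
    y = [[0.0, 0.0] for _ in range(n)]
    for i, label in enumerate(labels):
        if label is not None:
            y[i][label] = 1.0

    # Iterate F <- alpha * S * F + (1 - alpha) * Y.
    f = [row[:] for row in y]
    for _ in range(iterations):
        f = [
            [
                alpha * sum(s[i][j] * f[j][c] for j in range(n))
                + (1.0 - alpha) * y[i][c]
                for c in range(2)
            ]
            for i in range(n)
        ]

    # Each sample takes the class with the larger propagated score.
    return [max(range(2), key=lambda c: f[i][c]) for i in range(n)]


# Toy data: two well-separated groups, with only one labelled sample in each.
toy_features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
                (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
toy_labels = [0, None, None, 1, None, None]
predictions = spread_labels(toy_features, toy_labels)
print(predictions)
```

In this toy setting only two of the six samples carry a label, which mirrors the motivation stated above: the unlabelled executables still contribute to the decision through their similarity to the labelled ones. The quadratic-time loops are kept for readability; a practical filter would need a sparse or vectorised graph construction.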
