Identifying Dormant Functionality in Malware Programs

To handle the growing flood of malware, security vendors and analysts rely on tools that automatically identify and analyze malicious code. Current systems for automated malware analysis typically follow a dynamic approach, executing an unknown program in a controlled environment (sandbox) and recording its runtime behavior. Since dynamic analysis platforms directly run malicious code, they are resilient to popular malware defense techniques such as packing and code obfuscation. Unfortunately, in many cases, only a small subset of all possible malicious behaviors is observed within the short time frame in which a malware sample is executed. To mitigate this issue, previous work introduced techniques such as multi-path or forced execution to increase the coverage of dynamic malware analysis. However, these techniques are potentially expensive, as the number of paths that require analysis can grow exponentially. In this paper, we propose Reanimator, a novel solution to determine the capabilities (malicious functionality) of malware programs. Our solution is based on the insight that we can leverage behavior observed while dynamically executing a specific malware sample to identify similar functionality in other programs. More precisely, when we observe malicious actions during dynamic analysis, we automatically extract and model the parts of the malware binary that are responsible for this behavior. We then leverage these models to check whether similar code is present in other samples. This allows us to statically identify dormant functionality (functionality that is not observed during dynamic analysis) in malicious programs. We evaluate our approach on thousands of real-world malware samples, and we show that our system is successful in identifying additional malicious functionality. As a result, our approach can significantly improve the coverage of malware analysis results.
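
The idea of modeling code that was observed to be responsible for a behavior, and then checking other samples for similar code, can be illustrated with a deliberately simplified sketch. It is not the modeling technique used by Reanimator, which the abstract does not specify. It assumes, purely for illustration, that code is represented as a sequence of instruction mnemonics, that a model is the set of k-grams over that sequence, and that a match is measured as the fraction of model k-grams found in a candidate. The names `kgrams` and `containment` and all sample data are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

KGram = Tuple[str, ...]


def kgrams(mnemonics: Sequence[str], k: int = 3) -> FrozenSet[KGram]:
    """Return the set of k-length windows over a mnemonic sequence."""
    # A sequence shorter than k yields an empty set.
    return frozenset(
        tuple(mnemonics[i:i + k]) for i in range(len(mnemonics) - k + 1)
    )


def containment(model: FrozenSet[KGram], candidate: FrozenSet[KGram]) -> float:
    """Fraction of the model's k-grams that also occur in the candidate."""
    if not model:
        return 0.0
    return len(model & candidate) / len(model)


# Hypothetical code region that produced an observed behavior in sample A.
observed_slice = ["push", "mov", "call", "test", "jz", "push", "call", "ret"]
model = kgrams(observed_slice)

# Hypothetical sample B: the same region is present but was never executed.
other_sample = ["xor", "push", "mov", "call", "test", "jz",
                "push", "call", "ret", "nop"]

# Hypothetical unrelated code.
unrelated = ["add", "sub", "mul", "div", "ret"]

score_other = containment(model, kgrams(other_sample))
score_unrelated = containment(model, kgrams(unrelated))

print(f"sample B: {score_other:.2f}, unrelated: {score_unrelated:.2f}")
```

A high containment score for a sample in which the behavior was never observed at runtime is what would flag the functionality as dormant in this toy setting. A realistic model would need to tolerate recompilation and obfuscation, which plain mnemonic k-grams do not.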
