Mining malware secrets

Malware analysts are asked not only to write signatures but also to generate indicators of compromise, disrupt botnets, attribute attacks to actors, and understand an adversary's intent. This requires extracting a variety of secrets, also known as threat intelligence, from malware. After studying a few samples from a malware family and locating where its secrets are embedded, analysts create rules that can automatically extract threat intelligence from future variants. Today, rules for extracting secrets from malware are written as regular expressions over bytecode, for example in YARA. Such rules are easily invalidated by polymorphic variants or by evolved versions of the malware, so keeping them up to date is a maintenance burden for analysts. Instead of bytecode, we present the use of code semantics to write rules that extract malware secrets. The semantics of code captures the effect of instructions on registers and memory. Rules written over the structure of the symbolic contents of registers and memory, rather than over bytecode, are more resilient to code transformations and evolutionary changes, and are therefore less brittle and easier to maintain.
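
The contrast between byte-level and semantic rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two variants, the addresses 0x401000 and 0x402000, the XOR key 0x5A, the toy instruction format, and the helper names byte_extract, symbolic_stores and semantic_extract are all hypothetical. The bytes are hand-assembled 32-bit x86, and the instruction lists are lifted by hand; a real pipeline would need a disassembler and a proper symbolic interpreter.

```python
import re

# Two hypothetical variants of one routine that XOR-decodes a config word.
# "bytes" is hand-assembled 32-bit x86; "code" is a hand-lifted toy IR.
VARIANT_A = {
    "bytes": bytes.fromhex("a100104000" "355a000000" "a300204000"),
    "code": [
        ("load", "eax", 0x401000),   # mov eax, [0x401000]
        ("xor", "eax", 0x5A),        # xor eax, 0x5a
        ("store", 0x402000, "eax"),  # mov [0x402000], eax
    ],
}
VARIANT_B = {  # same effect, different registers, plus a nop and an extra mov
    "bytes": bytes.fromhex("8b0d00104000" "90" "ba5a000000" "31d1" "890d00204000"),
    "code": [
        ("load", "ecx", 0x401000),   # mov ecx, [0x401000]
        ("nop",),                    # nop
        ("mov", "edx", 0x5A),        # mov edx, 0x5a
        ("xor", "ecx", "edx"),       # xor ecx, edx
        ("store", 0x402000, "ecx"),  # mov [0x402000], ecx
    ],
}

# Byte-level rule written against variant A only.
BYTE_RULE = re.compile(rb"\xa1(.{4})\x35(.{4})\xa3", re.DOTALL)


def byte_extract(blob):
    """Return (config_address, xor_key) if the byte pattern matches, else None."""
    m = BYTE_RULE.search(blob)
    if m is None:
        return None
    return (int.from_bytes(m.group(1), "little"),
            int.from_bytes(m.group(2), "little"))


def symbolic_stores(code):
    """Interpret the toy IR symbolically; return (address, expression) per store."""
    regs, stores = {}, []

    def val(operand):
        # Integers are constants; registers map to their current expression.
        if isinstance(operand, int):
            return ("const", operand)
        return regs.get(operand, ("reg", operand))

    for ins in code:
        op = ins[0]
        if op == "load":
            regs[ins[1]] = ("load", ins[2])
        elif op == "mov":
            regs[ins[1]] = val(ins[2])
        elif op == "xor":
            regs[ins[1]] = ("xor", val(ins[1]), val(ins[2]))
        elif op == "store":
            stores.append((ins[1], val(ins[2])))
        # "nop" has no effect on registers or memory.
    return stores


def semantic_extract(code):
    """Semantic rule: find a store whose value has the shape xor(load(a), const(k))."""
    for _addr, value in symbolic_stores(code):
        if value[0] != "xor":
            continue
        left, right = value[1], value[2]
        for x, y in ((left, right), (right, left)):
            if x[0] == "load" and y[0] == "const":
                return x[1], y[1]
    return None


for name, variant in (("A", VARIANT_A), ("B", VARIANT_B)):
    print(name, byte_extract(variant["bytes"]), semantic_extract(variant["code"]))
```

In this sketch the byte-level rule recovers the address and key from variant A but returns None for variant B, because the register choice and inserted instructions change the encoding. The semantic rule recovers the same pair from both, since the value finally written to memory has the same symbolic structure in each.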
