Optimizing symbolic execution for malware behavior classification

Abstract: Software correctness, reliability, and security are increasingly analyzed using tools that combine various formal and heuristic approaches. Such analysis is often expensive in terms of time, which is the price paid for high-quality results. In this experience report, we explore the tuning and optimization of the tools underlying binary malware detection and classification. We identify heuristics and SMT solver tactics for the effective symbolic execution of binary files. We combine these with effective heuristics for constructing behavioral signatures of programs, which can be used by a supervised-learning multi-class malware classifier. Further, a set of experiments following a full-factorial design allowed us to identify correlations between the heuristics and the overall performance of the classifier.
