Predicting the Resilience of Obfuscated Code Against Symbolic Execution Attacks via Machine Learning

Software obfuscation transforms code so that it is more difficult to reverse engineer. However, it is known that, given enough resources, an attacker will eventually succeed in reverse engineering an obfuscated program. Therefore, an open challenge for software obfuscation is estimating how long an obfuscated program can withstand a given reverse engineering attack. This paper proposes a general framework for choosing the software features most relevant to estimating the effort of automated attacks. Our framework uses these features to build regression models that predict the resilience of different software protection transformations against automated attacks. To evaluate the effectiveness of our approach, we instantiate it in a case study on predicting the time needed to deobfuscate a set of C programs using an attack based on symbolic execution. Because training regression models requires a large set of programs as input, we implemented a code generator that can produce large numbers of arbitrarily complex random C functions. Our results show that features such as the number of community structures in the graph representation of symbolic path constraints are far more relevant for predicting deobfuscation time than other features generally used to measure the potency of control-flow obfuscation (e.g., cyclomatic complexity). Our best model predicts the time, in seconds, taken by symbolic-execution-based deobfuscation attacks with over 90% accuracy for 80% of the programs in our dataset, which also includes several realistic hash functions.
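The abstract does not state which selection criterion the framework applies. As a minimal, hypothetical illustration of keeping only the features that best explain attack effort, the Python sketch below ranks features by the absolute Pearson correlation between each feature and the measured deobfuscation time. The feature names (`communities`, `cyclomatic`, `loc`), the toy data, and the helpers `pearson` and `rank_features` are assumptions, not the paper's method; note that a univariate filter like this ignores redundancy between features.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(samples, times, top_k):
    """Return the top_k feature names ordered by |correlation| with `times`.

    samples: one dict per program mapping feature name -> numeric value
    times:   measured deobfuscation time (seconds) for each program
    """
    names = sorted(samples[0])
    score = {name: abs(pearson([s[name] for s in samples], times)) for name in names}
    return sorted(names, key=lambda name: score[name], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy data with hypothetical feature names.
    programs = [
        {"communities": 1, "cyclomatic": 5, "loc": 10},
        {"communities": 2, "cyclomatic": 3, "loc": 10},
        {"communities": 3, "cyclomatic": 5, "loc": 10},
        {"communities": 4, "cyclomatic": 3, "loc": 10},
    ]
    seconds = [2.0, 4.0, 6.0, 8.0]
    print(rank_features(programs, seconds, top_k=2))  # ['communities', 'cyclomatic']
```

The retained features would then serve as the inputs of whichever regression model is trained.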
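The abstract only says that the generator produces arbitrarily complex random C functions. The Python sketch below is a hypothetical illustration of that idea, not the authors' tool: `generate_function`, `random_block`, the operator set, and the nested if/else scheme are all assumptions. A seed makes each generated function reproducible, and `depth` controls how many branch points (and hence symbolic paths) the function contains.

```python
import random

# Hypothetical sketch, NOT the paper's generator. Unsigned arithmetic is
# emitted so that overflow in the generated C code wraps instead of being
# undefined behavior.
OPERATORS = ["+", "-", "*", "^", "|", "&"]


def random_expr(rng, var):
    """Random binary expression combining `var` with a small constant."""
    return f"({var} {rng.choice(OPERATORS)} {rng.randint(1, 255)}u)"


def random_block(rng, depth, indent):
    """Emit one assignment, then (if depth > 0) an if/else whose branches
    recurse with depth - 1, giving 2**depth - 1 branch points in total."""
    pad = "    " * indent
    lines = [f"{pad}state = {random_expr(rng, 'state')};"]
    if depth > 0:
        lines.append(f"{pad}if (state > {rng.randint(0, 65535)}u) {{")
        lines += random_block(rng, depth - 1, indent + 1)
        lines.append(f"{pad}}} else {{")
        lines += random_block(rng, depth - 1, indent + 1)
        lines.append(f"{pad}}}")
    return lines


def generate_function(seed, depth=2, name="rand_fn"):
    """Return the C source of one random function; same seed, same output."""
    rng = random.Random(seed)
    return "\n".join([
        f"unsigned int {name}(unsigned int input) {{",
        "    unsigned int state = input;",
        *random_block(rng, depth, 1),
        "    return state;",
        "}",
    ])


if __name__ == "__main__":
    print(generate_function(seed=42))
```

Looping over seeds would yield as many distinct functions as needed for training.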
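To make the community-structure feature concrete, the sketch below assumes each path constraint has been reduced to the set of symbolic variable names it mentions, and builds a variable incidence graph with an edge between every pair of variables that share a constraint. Communities are then found by a naive greedy modularity maximization in the spirit of Clauset, Newman and Moore; this algorithm is swapped in for illustration and is not necessarily the one used in the paper. The helpers `variable_graph`, `modularity`, and `count_communities` are hypothetical, and the quadratic search is only suitable for small graphs.

```python
from itertools import combinations


def variable_graph(constraints):
    """Variable incidence graph: one node per symbolic variable and an edge
    between every pair of variables that appear in the same constraint."""
    nodes, edges = set(), set()
    for variables in constraints:
        nodes.update(variables)
        for u, v in combinations(sorted(variables), 2):
            edges.add((u, v))
    return nodes, sorted(edges)  # sorted for a deterministic iteration order


def modularity(partition, edges, degree, m):
    """Newman modularity Q of a partition given as a list of node sets."""
    q = 0.0
    for community in partition:
        inner = sum(1 for u, v in edges if u in community and v in community)
        total_degree = sum(degree[n] for n in community)
        q += inner / m - (total_degree / (2 * m)) ** 2
    return q


def count_communities(constraints):
    """Greedily merge the pair of adjacent communities that most increases Q,
    stopping when no merge improves it; return the number of communities."""
    nodes, edges = variable_graph(constraints)
    if not edges:
        return len(nodes)  # no edges: every variable is its own community
    m = len(edges)
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    partition = [{n} for n in sorted(nodes)]
    current_q = modularity(partition, edges, degree, m)
    while True:
        best_q, best_partition = current_q, None
        for i, j in combinations(range(len(partition)), 2):
            a, b = partition[i], partition[j]
            # Only consider merging communities joined by at least one edge.
            if not any((u in a and v in b) or (u in b and v in a) for u, v in edges):
                continue
            candidate = [c for k, c in enumerate(partition) if k not in (i, j)]
            candidate.append(a | b)
            q = modularity(candidate, edges, degree, m)
            if q > best_q + 1e-12:
                best_q, best_partition = q, candidate
        if best_partition is None:
            return len(partition)
        partition, current_q = best_partition, best_q


if __name__ == "__main__":
    # Two groups of tightly coupled variables joined by a single constraint.
    path_constraints = [{"a", "b", "c"}, {"a", "c"}, {"x", "y", "z"}, {"c", "x"}]
    print(count_communities(path_constraints))  # 2
```

For realistic constraint graphs, a dedicated community-detection library (for example, a Louvain implementation) would scale far better than this exhaustive pairwise search.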
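One plausible reading of "over 90% accuracy for 80% of the programs" is that the relative error of the predicted attack time is at most 10% for at least 80% of the programs; this interpretation is an assumption, since the abstract does not define the metric. Under that assumption, the score can be computed as in the hypothetical helper `fraction_within` below; the measured and predicted times are toy numbers, not results from the paper.

```python
def fraction_within(predicted, actual, tolerance=0.10):
    """Fraction of programs whose predicted deobfuscation time lies within
    `tolerance` relative error of the measured time (times must be > 0)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


if __name__ == "__main__":
    measured = [10.0, 20.0, 40.0, 80.0, 100.0]   # toy numbers, seconds
    predicted = [10.5, 19.0, 50.0, 79.0, 95.0]
    print(fraction_within(predicted, measured))  # 0.8
```

In this toy example, only the third prediction (50 s against a measured 40 s) misses the 10% band, so four out of five programs are counted.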
