Khaos: The Impact of Inter-procedural Code Obfuscation on Binary Diffing Techniques

Software obfuscation can protect embedded device software by obfuscating its third-party code, which prevents binary diffing techniques from locating vulnerable code. Binary diffing has developed rapidly, however, and by extracting features from within each function it now matches and identifies functions with increasing accuracy. As a result, existing software obfuscation techniques, which mainly focus on intra-procedural code obfuscation, are no longer effective. In this paper, we propose Khaos, a new inter-procedural code obfuscation mechanism that uses compilation optimizations to move code across functions. We propose two obfuscation primitives: fission, which separates a function, and fusion, which aggregates functions. A prototype of Khaos is implemented on the LLVM compiler and evaluated on a large number of real-world programs, including SPEC CPU 2006 & 2017, CoreUtils, and JavaScript engines. Experimental results show that Khaos outperforms existing code obfuscations and significantly reduces the accuracy rates of five state-of-the-art binary diffing techniques (less than 19%) with lower runtime overhead (less than 7%).
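To make the two primitives concrete, the following is a minimal source-level sketch of what separating and aggregating functions could look like. It is an illustration only, not the Khaos implementation: Khaos operates inside the LLVM compiler, whereas this example is hand-written Python. All function names (`checksum`, `clamp`, `checksum_step`, `fused`) and the selector argument used to dispatch inside the fused function are assumptions made for the example.

```python
# Illustrative only: a hand-written analogue of "fission" and "fusion".
# Every name below is hypothetical; the real mechanism works on compiler IR.

def checksum(data):
    """Original function: a simple rolling checksum over bytes."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def clamp(value, low, high):
    """Original, unrelated function: limit value to [low, high]."""
    return max(low, min(value, high))


# Fission: part of checksum's body is moved into a separate function,
# so no single function contains the whole original logic.
def checksum_step(total, byte):
    return (total * 31 + byte) % 65521


def checksum_fissioned(data):
    total = 0
    for byte in data:
        total = checksum_step(total, byte)
    return total


# Fusion: two unrelated functions are aggregated into one body.
# The selector argument is an assumption of this sketch.
def fused(selector, a, b=0, c=0):
    if selector == 0:
        # Former body of checksum(a)
        total = 0
        for byte in a:
            total = (total * 31 + byte) % 65521
        return total
    # Former body of clamp(a, b, c)
    return max(b, min(a, c))


if __name__ == "__main__":
    sample = b"khaos"
    print(checksum(sample) == checksum_fissioned(sample) == fused(0, sample))
    print(clamp(15, 0, 10) == fused(1, 15, 0, 10))
```

In both cases the program's behavior is unchanged, but the function boundaries that a function-level matcher relies on no longer line up with those of the original code.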
