Binary Debloating for Security via Demand Driven Loading

Modern software systems rely heavily on C/C++ libraries. Because of the weak memory model of C/C++, these libraries may contain vulnerabilities that expose applications to attack. For example, glibc contains a very large number of return-oriented programming (ROP) gadgets that can be stitched together into semantically valid but malicious Turing-complete programs. Despite significant advances in attack detection and mitigation, a full defense is unrealistic against an ever-growing set of ways to generate such malicious programs. In this work, we create a defense mechanism that debloats libraries, reducing the number of dynamically linked functions so that the possibilities for constructing malicious programs diminish significantly. The key idea is to locate each library call site within an application and, for each one, to load only the set of library functions that will be used at that call site. This demand-driven loading relies on an input-aware oracle that predicts a near-exact set of library functions needed at a given call site during execution. The predicted functions are loaded just in time, and the complete call chain (of function bodies) inside the library is purged once the library call returns to the application. We present a decision-tree based predictor, which acts as the oracle, and an optimized runtime system, which works directly with library binaries such as GNU libc and libstdc++. We show that, on average, the proposed scheme cuts the exposed code surface of libraries by 97.2%, reduces the ROP gadgets present in linked libraries by 97.9%, achieves a prediction accuracy of at least 97% in most cases, and adds a small runtime overhead of 18% across all libraries (16% for glibc, 2% for others) on all SPEC 2006 benchmarks, suggesting that the scheme is practical.
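
The load, call, and purge cycle described above can be illustrated with a minimal Python sketch. It is not the paper's runtime, which operates on library binaries; it only models which functions are resident at each point. All names here (`predict_functions`, `DemandLoader`, the `has_conversion` feature, and the function sets returned for `printf`) are hypothetical, and the hand-written rule merely stands in for a trained decision-tree oracle.

```python
from contextlib import contextmanager


def predict_functions(call_site: str, features: dict) -> frozenset:
    """Toy stand-in for the input-aware oracle (hypothetical rule, not a trained tree).

    Given a call site and features of its inputs, return the set of library
    functions expected to be needed for this one call.
    """
    if call_site == "printf":
        # Assumed example: a format string with conversions needs one more helper.
        if features.get("has_conversion", False):
            return frozenset({"printf", "vfprintf", "write"})
        return frozenset({"printf", "write"})
    # Unknown call site: fall back to the entry function only.
    return frozenset({call_site})


class DemandLoader:
    """Models the set of library functions currently resident (exposed)."""

    def __init__(self):
        self.resident = set()

    @contextmanager
    def library_call(self, call_site: str, features: dict):
        predicted = predict_functions(call_site, features)
        self.resident |= predicted  # load the predicted functions just in time
        try:
            yield predicted
        finally:
            self.resident -= predicted  # purge the call chain on return


loader = DemandLoader()
with loader.library_call("printf", {"has_conversion": True}):
    during = set(loader.resident)  # functions exposed while the call runs
after = set(loader.resident)       # functions exposed after the call returns
```

In this model the exposed surface is non-empty only for the duration of a library call. The sketch deliberately ignores mispredictions and nested library calls, which a real runtime would have to handle.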
