One Engine To Serve 'em All: Inferring Taint Rules Without Architectural Semantics

Dynamic binary taint analysis has wide applications in the security analysis of commercial-off-the-shelf (COTS) binaries. One of the key challenges in dynamic binary analysis is to specify the taint rules that capture how taint information propagates for each instruction on an architecture. Most of the existing solutions specify taint rules using a deductive approach by summarizing the rules manually after analyzing the instruction semantics. Intuitively, taint propagation reflects on how an instruction input affects its output, and thus can be observed from instruction executions. In this work, we propose an inductive method for taint propagation and develop a universal taint tracking engine that is architecture-agnostic. Our taint engine, TAINTINDUCE, can learn taint rules with minimal architectural knowledge by observing the execution behavior of instructions. To measure its correctness and guide taint rule generation, we define the precise notion of soundness for bit-level taint tracking in this novel setup. In our evaluation, we show that TAINTINDUCE automatically learns rules for 4 widely used architectures: x86, x64, AArch64, and MIPS-I. It can detect vulnerabilities for 24 CVEs in 15 applications on both Linux and Windows over millions of instructions and is comparable with other mature existing tools (TEMU [51], libdft [32], Triton [42]). TAINTINDUCE can be used as a stand-alone taint engine or be used to complement existing taint engines for unhandled instructions. Further, it can be used as a cross-referencing tool to uncover bugs in taint engines, emulation implementations and ISA documentations.

[1]  Niranjan Hasabnis,et al.  Extracting instruction semantics via symbolic execution of code generators , 2016, SIGSOFT FSE.

[2]  Barton P. Miller,et al.  Binary-code obfuscations in prevalent packer tools , 2013, CSUR.

[3]  Heng Yin,et al.  Make it work, make it right, make it fast: building a platform-neutral whole-system dynamic binary analysis platform , 2014, ISSTA 2014.

[4]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[5]  Angelos D. Keromytis,et al.  libdft: practical dynamic data flow tracking for commodity systems , 2012, VEE '12.

[6]  R. Sekar,et al.  On the Limits of Information Flow Techniques for Malware Analysis and Containment , 2008, DIMVA.

[7]  Alexander Aiken,et al.  Stratified synthesis: automatically learning the x86-64 instruction set , 2016, PLDI.

[8]  Dawn Xiaodong Song,et al.  TaintEraser: protecting sensitive data leaks using application-level taint tracking , 2011, OPSR.

[9]  Andrew C. Myers,et al.  Language-based information-flow security , 2003, IEEE J. Sel. Areas Commun..

[10]  Herbert Bos,et al.  VUzzer: Application-aware Evolutionary Fuzzing , 2017, NDSS.

[11]  Patrice Godefroid Higher-order test generation , 2011, PLDI '11.

[12]  Peter J. Denning,et al.  Certification of programs for secure information flow , 1977, CACM.

[13]  Alessandro Orso,et al.  Dytan: a generic dynamic taint analysis framework , 2007, ISSTA '07.

[14]  Heng Yin,et al.  Panorama: capturing system-wide information flow for malware detection and analysis , 2007, CCS '07.

[15]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[16]  Zhenkai Liang,et al.  Towards Generating High Coverage Vulnerability-Based Signatures with Protocol-Level Constraint-Guided Exploration , 2009, RAID.

[17]  Heng Yin TEMU: Binary Code Analysis via Whole-System Layered Annotative Execution , 2010 .

[18]  J. Meseguer,et al.  Security Policies and Security Models , 1982, 1982 IEEE Symposium on Security and Privacy.

[19]  David A. Wagner,et al.  This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Detecting Format String Vulnerabilities with Type Qualifiers , 2001 .

[20]  Guofei Gu,et al.  TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[21]  R. Sekar,et al.  Efficient fine-grained binary instrumentationwith applications to taint-tracking , 2008, CGO '08.

[22]  David Brumley,et al.  BAP: A Binary Analysis Platform , 2011, CAV.

[23]  Herbert Bos,et al.  Pointless tainting?: evaluating the practicality of pointer tainting , 2009, EuroSys '09.

[24]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[25]  Tal Garfinkel,et al.  Understanding data lifetime via whole system simulation , 2004 .

[26]  Qin Zhao,et al.  Transparent dynamic instrumentation , 2012, VEE '12.

[27]  Stephen McCamant,et al.  Loop-extended symbolic execution on binary programs , 2009, ISSTA.

[28]  Manu Sridharan,et al.  TAJ: effective taint analysis of web applications , 2009, PLDI '09.

[29]  Christopher Krügel,et al.  Cross Site Scripting Prevention with Dynamic Data Tainting and Static Analysis , 2007, NDSS.

[30]  R. Sekar An Efficient Black-box Technique for Defeating Web Application Attacks , 2009, NDSS.

[31]  Tiziano Villa,et al.  Complexity of two-level logic minimization , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[32]  Zhenkai Liang,et al.  BitBlaze: A New Approach to Computer Security via Binary Analysis , 2008, ICISS.

[33]  Stephen McCamant,et al.  Quantitative information flow as network flow capacity , 2008, PLDI '08.

[34]  Christopher Krügel,et al.  Prospex: Protocol Specification Extraction , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[35]  Stephen McCamant,et al.  DTA++: Dynamic Taint Analysis with Targeted Control-Flow Propagation , 2011, NDSS.

[36]  David Brumley,et al.  All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask) , 2010, 2010 IEEE Symposium on Security and Privacy.

[37]  Robert K. Brayton,et al.  ESPRESSO-SIGNATURE: A New Exact Minimizer for Logic Functions , 1993, 30th ACM/IEEE Design Automation Conference.

[38]  Wei Xu,et al.  Taint-Enhanced Policy Enforcement: A Practical Approach to Defeat a Wide Range of Attacks , 2006, USENIX Security Symposium.

[39]  Andreas Zeller,et al.  Detecting information flow by mutating input data , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[40]  Benjamin C. Pierce,et al.  Explicit Secrecy: A Policy for Taint Tracking , 2016, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[41]  Julian Schütte,et al.  AppCaulk: Data Leak Prevention by Injecting Targeted Taint Tracking into Android Apps , 2014, 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications.

[42]  Brendan Dolan-Gavitt,et al.  Repeatable Reverse Engineering with PANDA , 2015, PPREW@ACSAC.

[43]  Niranjan Hasabnis,et al.  Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers , 2016, ASPLOS.

[44]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[45]  Nripendra N. Biswas,et al.  Minimization of Boolean Functions , 1971, IEEE Transactions on Computers.

[46]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[47]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[48]  Ankur Taly,et al.  Automated synthesis of symbolic instruction encodings from I/O samples , 2012, PLDI.