Low-overhead multi-language dynamic taint analysis on managed runtimes through speculative optimization

Dynamic taint analysis (DTA) is a popular program analysis technique with applications in diverse fields such as software vulnerability detection and reverse engineering. It consists of marking sensitive data as tainted and tracking its propagation at runtime. While DTA has been implemented on top of many different analysis platforms, these implementations generally incur significant slowdown from taint propagation. Because a purely dynamic analysis cannot predict which instructions will operate on tainted values at runtime, programs have to be fully instrumented for taint propagation even if they never actually observe tainted values. We propose leveraging speculative optimizations to reduce the impact of this instrumentation on the peak performance of programs executed on a managed runtime capable of dynamic compilation. In this paper, we investigate how such optimizations can reduce the peak performance impact of taint propagation, and we explain how a managed runtime can implement DTA so that it is amenable to them. We implemented our ideas in TruffleTaint, a DTA platform that supports both dynamic languages like JavaScript and languages like C and C++, which are typically compiled statically. We evaluated TruffleTaint on several benchmarks from the popular Computer Language Benchmarks Game and SPECint 2017 benchmark suites. Our evaluation shows that TruffleTaint is often able to avoid slowdown entirely when programs do not operate on tainted data, and that when they do, it exhibits an average slowdown of ∼2.10x and a maximum slowdown of ∼5.52x, which is comparable to state-of-the-art taint analysis platforms optimized for performance.
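To make the idea of speculating on the absence of taint more concrete, the following is a minimal Python sketch, not TruffleTaint's actual implementation. It assumes a self-specializing AST-interpreter node, in the spirit of Truffle, for a binary addition. The names `Tainted`, `AddNode`, `assume_untainted`, and `deopt_count` are hypothetical and chosen only for illustration. The node starts out assuming that its operands are never tainted and performs a plain addition. The first tainted operand invalidates that assumption (standing in for deoptimization), and from then on the node always uses the generic path, which unwraps values and propagates the union of the operands' taint labels.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper pairing a value with its set of taint labels."""
    value: Any
    labels: frozenset


def unwrap(v: Any) -> Any:
    """Return the underlying value, stripping any taint wrapper."""
    return v.value if isinstance(v, Tainted) else v


def labels_of(v: Any) -> frozenset:
    """Return the taint labels of a value (empty set if untainted)."""
    return v.labels if isinstance(v, Tainted) else frozenset()


class AddNode:
    """Hypothetical self-specializing '+' node.

    In a real JIT-compiling runtime, the speculative state would compile to
    plain arithmetic plus a cheap guard; here, Python only models the control
    structure, not the compiled code.
    """

    def __init__(self) -> None:
        self.assume_untainted = True  # speculation: no tainted operands seen yet
        self.deopt_count = 0          # how often the speculation was invalidated

    def execute(self, a: Any, b: Any) -> Any:
        if self.assume_untainted:
            if not isinstance(a, Tainted) and not isinstance(b, Tainted):
                return a + b  # fast path: no taint bookkeeping at all
            # Speculation failed: invalidate it permanently ("deoptimize").
            self.assume_untainted = False
            self.deopt_count += 1
        return self._generic(a, b)

    def _generic(self, a: Any, b: Any) -> Any:
        """Slow path: compute on raw values and propagate the union of labels."""
        result = unwrap(a) + unwrap(b)
        labels = labels_of(a) | labels_of(b)
        return Tainted(result, labels) if labels else result
```

This sketch makes the speculation one-way: once a node has seen taint, it never returns to the fast path. This mirrors the common pattern of invalidating a speculative assumption instead of re-checking it repeatedly, although a real runtime may use a different re-specialization policy.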
