SEA: String Executability Analysis by Abstract Interpretation

Dynamic languages often employ reflection primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard, if not impossible, because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyse, are themselves dynamically mutating objects. We introduce SEA, an abstract interpreter for automatic, sound string executability analysis of dynamic languages employing bounded (i.e., finitely nested) reflection and dynamic code generation. Strings are statically approximated in an abstract domain of finite state automata, with basic operations implemented as symbolic transducers. SEA combines standard program analysis with string executability analysis. Analysing a call to reflection triggers a call to the same abstract interpreter on code synthesised directly from the result of the static string executability analysis at that program point. The use of regular languages for approximating dynamically generated code structures allows SEA to soundly approximate safety properties of self-modifying programs while remaining efficient. Soundness here means that the semantics of the code synthesised by the analyser to resolve reflection over-approximates the semantics of the code dynamically built at run-time by the program at that point.
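
The recursive structure described in the abstract can be illustrated with a small sketch. It is not SEA's implementation: finite sets of strings stand in for the finite state automata domain, the toy language, the names `parse`, `analyse`, `join`, `concat` and the bound `MAX_DEPTH` are all hypothetical, and two simplifying assumptions are made (unset variables read as the empty string, and evaluating a non-executable string is a no-op).

```python
import re
from itertools import product

MAX_DEPTH = 3               # assumed bound on nested reflection
UNSET = frozenset({""})     # assumption: an unset variable reads as ""

ASSIGN = re.compile(r"^(\w+)=(.+)$")
EVAL = re.compile(r"^eval\((\w+)\)$")
ATOM = re.compile(r"^(?:'([^']*)'|(\w+))$")

def parse(code):
    """Parse toy code; return a statement list, or None if not executable."""
    prog = []
    for stmt in filter(None, (s.strip() for s in code.split(";"))):
        m = EVAL.match(stmt)
        if m:
            prog.append(("eval", m.group(1)))
            continue
        m = ASSIGN.match(stmt)
        if not m:
            return None
        parts = []
        for atom in m.group(2).split("+"):
            a = ATOM.match(atom.strip())
            if not a:
                return None
            if a.group(1) is not None:
                parts.append(("lit", a.group(1)))
            else:
                parts.append(("var", a.group(2)))
        prog.append(("assign", m.group(1), parts))
    return prog

def join(a, b):
    """Least upper bound of two abstract stores (pointwise set union)."""
    return {k: a.get(k, UNSET) | b.get(k, UNSET) for k in a.keys() | b.keys()}

def concat(parts, store):
    """Abstract concatenation: every combination of the operands' strings."""
    langs = [frozenset({v}) if kind == "lit" else store.get(v, UNSET)
             for kind, v in parts]
    return frozenset("".join(ws) for ws in product(*langs))

def analyse(prog, store=None, depth=0):
    """Abstractly execute prog; eval re-enters the analyser on parsed strings."""
    store = dict(store or {})
    for stmt in prog:
        if stmt[0] == "assign":
            store[stmt[1]] = concat(stmt[2], store)
        elif stmt[0] == "choice":    # nondeterministic branch
            store = join(analyse(stmt[1], store, depth),
                         analyse(stmt[2], store, depth))
        elif stmt[0] == "eval":
            if depth >= MAX_DEPTH:
                raise RecursionError("reflection nested deeper than the bound")
            outcomes = []
            for code in store.get(stmt[1], UNSET):
                synthesised = parse(code)    # None means not executable
                outcomes.append(analyse(synthesised or [], store, depth + 1))
            merged = outcomes[0]
            for other in outcomes[1:]:
                merged = join(merged, other)
            store = merged
    return store

# Usage: the evaluated string depends on which branch was taken.
program = [
    ("choice",
     [("assign", "op", [("lit", "a='1'")])],
     [("assign", "op", [("lit", "b='2'")])]),
    ("assign", "code", [("var", "op"), ("lit", ";c=a+b")]),
    ("eval", "code"),
]
result = analyse(program)
```

In this sketch the abstract value of `code` holds two candidate programs, each is parsed and analysed by the same `analyse` function, and the resulting stores are joined, so `result["c"]` covers the value produced along either branch. A real analyser would need a domain that can represent infinite languages, which is where automata and transducers come in.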
