Static Program Analysis for String Manipulation Languages

In recent years, dynamic languages such as JavaScript and Python have been increasingly used across a wide range of fields and applications. Their tricky and often misunderstood behaviors make static analysis of these languages a hard challenge. Strings play a central role in any dynamic-language program: they can be implicitly converted to values of other types, turned into executable code by string-to-code primitives (such as eval), or used to access object properties. Unfortunately, existing string analyses for dynamic languages still lack precision and do not account for some important string features. Moreover, string obfuscation is very popular in malicious dynamic-language code, for example to hide code inside strings that are then dynamically transformed into executable code. In this scenario, more precise string analyses become a necessity. This paper addresses static string analysis by abstract interpretation and proposes a new semantics for string analysis, taking a first step toward handling the string features of dynamic languages.
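To make the precision concern concrete, the sketch below uses Python purely for illustration. It implements two simple, standard string abstract domains: a prefix domain and a bounded string-set domain. It then analyzes an obfuscated fragment that assembles either "eval" or "exec" on two branches that the analysis must join. The domains, the class and method names (`Prefix`, `StringSet`, `may_be`) and the bound `K` are illustrative assumptions. They are not the semantics proposed in this paper.

```python
from dataclasses import dataclass
from os.path import commonprefix
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Prefix:
    """Prefix domain (illustrative): denotes every string starting with `prefix`."""
    prefix: str

    def join(self, other: "Prefix") -> "Prefix":
        # Least upper bound: longest common prefix of the two abstract values.
        return Prefix(commonprefix([self.prefix, other.prefix]))

    def concat(self, other: "Prefix") -> "Prefix":
        # self already denotes prefix . Sigma*, so appending anything stays inside it.
        return self

    def may_be(self, s: str) -> bool:
        # Membership test in the concretization of this abstract value.
        return s.startswith(self.prefix)


K = 3  # hypothetical bound on how many distinct strings are tracked


@dataclass(frozen=True)
class StringSet:
    """Bounded string-set domain (illustrative): a finite set, or TOP (None) when too large."""
    values: Optional[FrozenSet[str]]

    def join(self, other: "StringSet") -> "StringSet":
        if self.values is None or other.values is None:
            return StringSet(None)
        union = self.values | other.values
        # Give up (TOP) once more than K strings would be tracked.
        return StringSet(union if len(union) <= K else None)

    def concat(self, other: "StringSet") -> "StringSet":
        if self.values is None or other.values is None:
            return StringSet(None)
        result = frozenset(a + b for a in self.values for b in other.values)
        return StringSet(result if len(result) <= K else None)

    def may_be(self, s: str) -> bool:
        return self.values is None or s in self.values


# Branch 1 builds f = "ev" + "al"; branch 2 builds f = "ex" + "ec"; the analysis joins them.
p = Prefix("ev").concat(Prefix("al")).join(Prefix("ex").concat(Prefix("ec")))
s = StringSet(frozenset({"ev"})).concat(StringSet(frozenset({"al"}))).join(
    StringSet(frozenset({"ex"})).concat(StringSet(frozenset({"ec"}))))

print(p)                                          # Prefix(prefix='e')
print(s.values == frozenset({"eval", "exec"}))    # True
print(p.may_be("escape"), s.may_be("escape"))     # True False
```

With the prefix domain, the join keeps only the common prefix "e". The analysis therefore cannot tell which string-to-code primitive the program may build. The bounded set keeps both candidates exactly, but only until more than `K` strings reach the same program point. After that it also collapses to TOP.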
