Twinning automata and regular expressions for string static analysis

In this paper we formalize and prove the soundness of Tarsis, a new abstract domain based on abstract interpretation theory that approximates string values through finite state automata. The main novelty of Tarsis is that it works over an alphabet of strings instead of single characters. On the one hand, this approach requires a more complex and refined definition of the widening operator and of the abstract semantics of string operators. On the other hand, it is able to obtain strictly more precise results than state-of-the-art approaches. We implemented a prototype of Tarsis and applied it to case studies taken from some of the most popular Java libraries that manipulate string values. The experimental results confirm that Tarsis obtains strictly more precise results than existing analyses.
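The key idea above is an automaton whose alphabet is made of strings rather than single characters. The minimal sketch below illustrates only that idea. It assumes a plain nondeterministic automaton in which each transition carries a whole string label. The class `StringLabeledAutomaton` and its methods are hypothetical names chosen for this example. The sketch checks membership only and does not reproduce the Tarsis widening operator or its abstract string operators.

```python
from dataclasses import dataclass, field


@dataclass
class StringLabeledAutomaton:
    """Hypothetical illustration (not the Tarsis implementation): an NFA whose
    transition labels are whole strings instead of single characters."""
    initial: int
    finals: set[int]
    # transitions[state] holds (label, target) pairs leaving that state
    transitions: dict[int, list[tuple[str, int]]] = field(default_factory=dict)

    def add(self, source: int, label: str, target: int) -> None:
        """Add a transition from `source` to `target` labeled by `label`."""
        self.transitions.setdefault(source, []).append((label, target))

    def accepts(self, word: str) -> bool:
        """True if `word` can be split into the labels of a path that runs
        from the initial state to a final state."""
        # A configuration is (state, number of characters of `word` consumed).
        stack = [(self.initial, 0)]
        seen = set()  # avoids revisiting configurations (e.g. on empty labels)
        while stack:
            state, pos = stack.pop()
            if (state, pos) in seen:
                continue
            seen.add((state, pos))
            if pos == len(word) and state in self.finals:
                return True
            for label, target in self.transitions.get(state, []):
                # Follow the edge only if its whole label matches at `pos`.
                if word.startswith(label, pos):
                    stack.append((target, pos + len(label)))
        return False


# Approximates the set {"http://", "https://"}.
url = StringLabeledAutomaton(initial=0, finals={3})
url.add(0, "http", 1)
url.add(1, "s", 2)
url.add(1, "://", 3)
url.add(2, "://", 3)

# A loop on the multi-character symbol "ab" approximates ("ab")*.
rep = StringLabeledAutomaton(initial=0, finals={0})
rep.add(0, "ab", 0)

print(url.accepts("https://"), url.accepts("ftp://"), rep.accepts("abab"))
```

In a character-level automaton, the label `://` would need three transitions and two intermediate states. Here it is a single edge. Likewise, the loop on `ab` treats the two-character string as one symbol, so `aba` is rejected without any extra states.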
