Precise Analysis of String Expressions

We perform static analysis of Java programs to answer a simple question: which values may occur as results of string expressions? The answers are summarized for each expression by a regular language that is guaranteed to contain all possible values. We present several applications of this analysis, including statically checking the syntax of dynamically generated expressions, such as SQL queries. Our analysis constructs flow graphs from class files and generates a context-free grammar with a nonterminal for each string expression. The language of this grammar is then widened into a regular language through a variant of an algorithm previously used for speech recognition. The collection of resulting regular languages is compactly represented as a special kind of multi-level automaton from which individual answers may be extracted. If a program error is detected, examples of invalid strings are automatically produced. We present extensive benchmarks demonstrating that the analysis is efficient and produces results of useful precision.
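
To make the central question concrete, the sketch below shows the kind of string expression such an analysis would summarize, together with a regular language that contains every value the expression can take. It is an illustration only, not the analysis described above: the class `StringExprSketch`, the methods `buildQuery` and `covered`, and the query text are all hypothetical, and the regular expression is written by hand, whereas the analysis would derive such an approximation automatically from the program's flow graph.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names come from the paper.
final class StringExprSketch {

    // A string expression whose value depends on a loop. The set of
    // possible results is infinite, so it must be summarized by a language.
    static String buildQuery(int extraColumns) {
        String query = "SELECT id";
        for (int i = 0; i < extraColumns; i++) {
            query = query + ", col";
        }
        return query + " FROM items";
    }

    // Hand-written regular language containing every value that
    // buildQuery can return (assumption: stands in for the analysis result).
    static final Pattern APPROXIMATION =
            Pattern.compile("SELECT id(, col)* FROM items");

    // True if the given string lies inside the approximating language.
    static boolean covered(String value) {
        return APPROXIMATION.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Sample a few runtime values and check them against the language.
        for (int n = 0; n < 4; n++) {
            String q = buildQuery(n);
            System.out.println(q + " -> " + covered(q));
        }
    }
}
```

Because the language is guaranteed to contain all possible values, a syntax check can be run against the language instead of against individual runs: if every string in it is well formed, no execution can produce a malformed query.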
