Static analysis of dynamic scripting languages

Scripting languages, such as PHP, are among the most widely used and fastest-growing programming languages, particularly for web applications. Static analysis is an important tool for detecting security flaws, finding bugs, and improving the compilation of programs. However, static analysis of scripting languages is difficult because of features found in languages such as PHP. These features include run-time code generation, dynamic weak typing, dynamic aliasing, implicit object and array creation, and overloading of simple operators. As a result, we find that even simple techniques such as SSA form and def-use chains are not straightforward to apply, and that a single unconstrained variable can ruin our analysis. In this paper we describe a static analyser for PHP and show how classical static analysis techniques can be extended to analyse PHP. In particular, our analysis combines alias analysis, type inference and constant propagation for PHP, computing results that are essential for other analyses and optimizations. We find that this combination of techniques allows our static analysis to generate meaningful and useful results.

1. Motivation

In recent years the importance of dynamic scripting languages, such as PHP, Python, Ruby and JavaScript, has grown as they are used for an increasing share of software development. Scripting languages provide high-level language features, a fast compile-modify-test environment for rapid prototyping, strong integration with database and web development systems, and extensive standard libraries. PHP powers many of the most popular web applications, such as Facebook, Wikipedia and Yahoo. In general, there is a trend towards writing an increasing portion of an application in a scripting language rather than in a traditional programming language, not least to avoid the complexity of crossing between