Andromeda: Accurate and Scalable Security Analysis of Web Applications

Security auditing of industry-scale software systems mandates automation. Static taint analysis enables deep and exhaustive tracking of suspicious data flows for detection of potential leakage and integrity violations, such as cross-site scripting (XSS), SQL injection (SQLi) and log forging. Research in this area has taken two directions: program slicing and type systems. Both of these approaches suffer from a high rate of false findings, which limits the usability of analysis tools based on these techniques. Attempts to reduce the number of false findings have resulted in analyses that are either (i) unsound, suffering from the dual problem of false negatives, or (ii) too expensive due to their high precision, thereby failing to scale to real-world applications. In this paper, we investigate a novel approach for enabling precise yet scalable static taint analysis. The key observation informing our approach is that taint analysis is a demand-driven problem, which enables lazy computation of vulnerable information flows, instead of eagerly computing a complete data-flow solution, which is the reason for the traditional dichotomy between scalability and precision. We have implemented our approach in Andromeda, an analysis tool that computes data-flow propagations on demand, in an efficient and accurate manner, and additionally features incremental analysis capabilities. Andromeda is currently in use in a commercial product. It supports applications written in Java, .NET and JavaScript. Our extensive evaluation of Andromeda on a suite of 16 production-level benchmarks shows Andromeda to achieve high accuracy and compare favorably to a state-of-the-art tool that trades soundness for precision.

[1]  Yasuhiko Minamide,et al.  Static approximation of dynamically generated Web pages , 2005, WWW '05.

[2]  Shay Artzi,et al.  F4F: taint analysis of framework-based web applications , 2011, OOPSLA '11.

[3]  Andrew P. Black ECOOP 2005 - Object-Oriented Programming, 19th European Conference, Glasgow, UK, July 25-29, 2005, Proceedings , 2005, ECOOP.

[4]  Atanas Rountev,et al.  Demand-driven context-sensitive alias analysis for Java , 2011, ISSTA '11.

[5]  Lars Ole Andersen,et al.  Program Analysis and Specialization for the C Programming Language , 2005 .

[6]  Olivier Tardieu,et al.  Demand-driven pointer analysis , 2001, PLDI '01.

[7]  Marco Pistoia,et al.  Saving the world wide web from vulnerable JavaScript , 2011, ISSTA '11.

[8]  J. Meseguer,et al.  Security Policies and Security Models , 1982, 1982 IEEE Symposium on Security and Privacy.

[9]  David A. Wagner,et al.  This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Detecting Format String Vulnerabilities with Type Qualifiers , 2001 .

[10]  David Grove,et al.  Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis , 1995, ECOOP.

[11]  Claudio Gutierrez,et al.  Survey of graph database models , 2008, CSUR.

[12]  Patrick Cousot,et al.  Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints , 1977, POPL.

[13]  Monica S. Lam,et al.  Cloning-based context-sensitive pointer alias analysis using binary decision diagrams , 2004, PLDI '04.

[14]  Geoffrey Smith,et al.  A Sound Type System for Secure Flow Analysis , 1996, J. Comput. Secur..

[15]  Dorothy E. Denning,et al.  A lattice model of secure information flow , 1976, CACM.

[16]  A. Deutsch,et al.  A storeless model of aliasing and its abstractions using finite representations of right-regular equivalence relations , 1992, Proceedings of the 1992 International Conference on Computer Languages.

[17]  Manu Sridharan,et al.  Thin slicing , 2007, PLDI '07.

[18]  Zhendong Su,et al.  Sound and precise analysis of web applications for injection vulnerabilities , 2007, PLDI '07.

[19]  C. R. Ramakrishnan,et al.  Incremental Evaluation of Tabled Logic Programs , 2003, ICLP.

[20]  Stephen McCamant,et al.  Quantitative information flow as network flow capacity , 2008, PLDI '08.

[21]  Wen-mei W. Hwu,et al.  Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation , 2000, PLDI '00.

[22]  Marco Pistoia,et al.  Path- and index-sensitive string analysis based on monadic second-order logic , 2011, ISSTA '11.

[23]  Gregor Snelting,et al.  Efficient path conditions in dependence graphs for software safety analysis , 2006, TSEM.

[24]  Manu Sridharan,et al.  TAJ: effective taint analysis of web applications , 2009, PLDI '09.

[25]  Andrew C. Myers,et al.  JFlow: practical mostly-static information flow control , 1999, POPL '99.

[26]  Zhendong Su,et al.  Static detection of cross-site scripting vulnerabilities , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[27]  Marco Pistoia,et al.  Interprocedural Analysis for Privileged Code Placement and Tainted Variable Detection , 2005, ECOOP.

[28]  Andrew C. Myers,et al.  A decentralized model for information flow control , 1997, SOSP.

[29]  James Newsom,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software, Network and Distributed System Security Symposium Conference Proceedings : 2005 , 2005 .

[30]  Dawson R. Engler,et al.  Using programmer-written compiler extensions to catch security holes , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[31]  Peter J. Denning,et al.  Certification of programs for secure information flow , 1977, CACM.

[32]  Thomas W. Reps,et al.  Precise interprocedural dataflow analysis via graph reachability , 1995, POPL '95.

[33]  Frank Tip,et al.  Efficiently Refactoring Java Applications to Use Generic Libraries , 2005, ECOOP.

[34]  Calvin Lin,et al.  Efficient and extensible security enforcement using dynamic data flow analysis , 2008, CCS.

[35]  Ondrej Lhoták,et al.  Context-Sensitive Points-to Analysis: Is It Worth It? , 2006, CC.

[36]  David F. Bacon,et al.  Fast static analysis of C++ virtual function calls , 1996, OOPSLA '96.

[37]  Manu Sridharan,et al.  Refinement-based context-sensitive points-to analysis for Java , 2006, PLDI '06.

[38]  Derrick G. Kourie,et al.  Server-centric Web frameworks: An overview , 2008, CSUR.

[39]  Andrew C. Myers,et al.  Language-based information-flow security , 2003, IEEE J. Sel. Areas Commun..

[40]  Benjamin Livshits,et al.  Finding Security Vulnerabilities in Java Applications with Static Analysis , 2005, USENIX Security Symposium.

[41]  Xin Zheng,et al.  Demand-driven alias analysis for C , 2008, POPL '08.

[42]  Gregor Snelting,et al.  Information Flow Control for Java Based on Path Conditions in Dependence Graphs , 2006, ISSSE.