PhASAR: An Inter-procedural Static Analysis Framework for C/C++

Static program analysis is used to automatically determine program properties or to detect bugs and security vulnerabilities in programs. It can be used as a stand-alone tool or as an intermediate step that aids compiler optimization. Developing precise, inter-procedural static analyses, however, is a challenging task because of their algorithmic complexity, the implementation effort involved, and the threat of state explosion, which leads to unsatisfactory performance. Software written in C and C++ is notoriously hard to analyze because of the deliberately unsafe type system, unrestricted use of pointers, and (for C++) virtual dispatch. In this work, we describe the design and implementation of PhASAR, an LLVM-based static analysis framework for C/C++ code. PhASAR allows data-flow problems to be solved in a fully automated manner. It provides class hierarchy, call-graph, points-to, and data-flow information, so analysis developers only need to specify a definition of the data-flow problem. PhASAR thus hides the complexity of static analysis behind a high-level API, making static program analysis more accessible and easier to use. PhASAR is available as an open-source project. We evaluate PhASAR’s scalability during whole-program analysis. Analyzing 12 real-world programs with a taint analysis written in PhASAR, we found that PhASAR’s abstractions and their implementations provide a whole-program analysis that scales well to real-world programs. Furthermore, we examine the details of analysis runs, discuss our experience in developing static analyses for C/C++, and present possible future improvements. Data or code related to this paper is available at: [34].
