Data Structure Analysis: A Fast and Scalable Context-Sensitive Heap Analysis

This paper describes a scalable heap analysis algorithm, Data Structure Analysis, designed to enable analyses and transformations of programs at the level of entire logical data structures. Data Structure Analysis attempts to identify disjoint instances of logical program data structures and their internal and external connectivity properties (without trying to categorize their “shape”). To achieve this, Data Structure Analysis is fully context-sensitive (in the sense that it names memory objects by entire acyclic call paths), is field-sensitive, builds an explicit model of the heap, and is robust enough to handle the full generality of C. Despite these aggressive features, the algorithm is both extremely fast (requiring 2-7 seconds for C programs in the range of 100K lines of code) and scalable in practice. It has three features we believe are novel: (a) it incrementally builds a precise program call graph during the analysis; (b) it distinguishes complete and incomplete information in a manner that simplifies analysis of libraries or other portions of programs; and (c) it uses speculative field-sensitivity in type-unsafe programs in order to preserve efficiency and scalability. Finally, it shows that the key to achieving scalability in a fully context-sensitive algorithm is the use of a unification-based approach, a combination that has been used before but whose importance has not been clearly articulated.
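To make the term "unification-based" concrete, the following is a minimal sketch of the general idea in the style of Steensgaard's points-to analysis, not the Data Structure Analysis algorithm itself. It is flow-insensitive, context-insensitive, and field-insensitive, and every name in it (`Node`, `find`, `unify`, `pointee`, `address_of`, `copy`) is hypothetical and chosen only for illustration. Each abstract memory node has at most one outgoing points-to edge, so whenever two nodes must point to the same thing their targets are merged with union-find.

```python
class Node:
    """An abstract memory location (hypothetical helper, not from the paper)."""
    def __init__(self, name):
        self.name = name
        self.parent = self    # union-find parent; a root is its own parent
        self.pointee = None   # at most one outgoing points-to edge


def find(n):
    """Return the representative of n's equivalence class (path halving)."""
    while n.parent is not n:
        n.parent = n.parent.parent
        n = n.parent
    return n


def unify(a, b):
    """Merge the classes of a and b, then recursively merge their targets."""
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    if a.pointee is None:
        a.pointee = b.pointee
    elif b.pointee is not None:
        unify(a.pointee, b.pointee)
    return find(a)


def pointee(n):
    """Return the node n points to, creating a placeholder if none exists."""
    n = find(n)
    if n.pointee is None:
        n.pointee = Node("*" + n.name)
    return find(n.pointee)


def address_of(p, x):
    """Model the statement `p = &x`."""
    unify(pointee(p), x)


def copy(p, q):
    """Model the statement `p = q`."""
    unify(pointee(p), pointee(q))


# Tiny program: p = &a; q = &b; r = &c; p = q;
a, b, c = Node("a"), Node("b"), Node("c")
p, q, r = Node("p"), Node("q"), Node("r")
address_of(p, a)
address_of(q, b)
address_of(r, c)
copy(p, q)

same_ab = find(a) is find(b)   # a and b were merged by `p = q`
same_ac = find(a) is find(c)   # c stays in its own class
```

In this sketch the assignment `p = q` forces the targets of `p` and `q` into one class, which loses some precision but keeps each node's out-degree bounded. That bounded size is the property the abstract credits with making a fully context-sensitive analysis scale.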
