Some directed graph algorithms and their application to pointer analysis

This thesis focuses on improving the execution time and precision of scalable pointer analysis. Such an analysis statically determines the targets of all pointer variables in a program. We formulate the analysis as a directed graph problem, where the solution can be obtained by a computation similar, in many ways, to transitive closure. As with transitive closure, identifying strongly connected components and transitive edges offers significant gains. However, our problem differs in that the computation can add new edges to the graph and, hence, dynamic algorithms are needed to identify these structures efficiently. For this reason, pointer analysis has often been likened to the dynamic transitive closure problem. Two new algorithms for dynamically maintaining the topological order of a directed graph are presented. The first is a unit change algorithm, meaning the solution must be recomputed immediately following an edge insertion. Although its worst-case time bound is marginally inferior to that of a previous solution, it is far simpler to implement and has fewer restrictions. For these reasons, we find it to be faster in practice and provide an experimental study over random graphs to support this. The second is a batch algorithm, meaning the solution can be updated after several insertions, and it is the first truly dynamic solution to obtain an optimal time bound of O(v+e+b) over a batch b of edge insertions. Again, we provide an experimental study over random graphs, comparing this against the standard approach to topological sort. Furthermore, we demonstrate how both algorithms can be extended to the problem of dynamically detecting strongly connected components (i.e. cycles), thus achieving the first solutions which do not need to traverse the entire graph for half of all edge insertions. Several other new techniques for improving pointer analysis are also presented. These include difference propagation, which avoids redundant work by tracking changes in the points-to sets, and a novel approach to field-sensitive analysis of C. Finally, we present a detailed study of numerous solving algorithms, evaluating our techniques and algorithms against previous work. Our benchmark suite consists of many common C programs ranging in size from 15,000 to 200,000 lines of code.
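
To make the idea of a unit change algorithm concrete, the following is a minimal Python sketch of maintaining a topological order under single edge insertions. It is an illustration only and should not be read as the thesis's exact algorithm or as achieving its stated bounds. The class name `DynamicTopoOrder`, the method `add_edge`, and the array `ord` are hypothetical names chosen here. The sketch assumes that when an inserted edge x -> y violates the current order, only the nodes positioned between y and x need to be searched and reordered, and that an insertion which would close a cycle is simply rejected.

```python
class DynamicTopoOrder:
    """Illustrative sketch: keep ord[v] a valid topological position for every node v."""

    def __init__(self, n):
        self.succ = [set() for _ in range(n)]  # outgoing edges
        self.pred = [set() for _ in range(n)]  # incoming edges
        self.ord = list(range(n))              # ord[v] = position of v in the order

    def add_edge(self, x, y):
        """Insert x -> y. Return False, leaving the graph unchanged, if a cycle would form."""
        if x == y:
            return False                       # a self-loop is a cycle
        if y in self.succ[x]:
            return True                        # edge already present
        lb, ub = self.ord[y], self.ord[x]
        if lb < ub:
            # The order is violated; only positions in [lb, ub] can be affected.
            fwd = self._forward(y, ub)
            if fwd is None:
                return False                   # y reaches x, so x -> y closes a cycle
            bwd = self._backward(x, lb)
            self._reorder(bwd, fwd)
        self.succ[x].add(y)
        self.pred[y].add(x)
        return True

    def _forward(self, start, ub):
        """Nodes reachable from start with position below ub; None if position ub is hit."""
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for w in self.succ[n]:
                if self.ord[w] == ub:
                    return None
                if self.ord[w] < ub and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    def _backward(self, start, lb):
        """Nodes that reach start and have position above lb."""
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for w in self.pred[n]:
                if self.ord[w] > lb and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    def _reorder(self, bwd, fwd):
        """Reuse the same positions, placing every node in bwd before every node in fwd."""
        key = self.ord.__getitem__
        nodes = sorted(bwd, key=key) + sorted(fwd, key=key)
        slots = sorted(self.ord[v] for v in nodes)
        for v, s in zip(nodes, slots):
            self.ord[v] = s
```

An insertion that already respects the current order costs constant time beyond the set updates, which is the intuition behind not needing to traverse the whole graph on every insertion. Rejecting the cycle-forming edge is a simplification made for this sketch; the thesis instead extends its algorithms to detect strongly connected components.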
