Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

Flow- and context-sensitive points-to analysis is difficult to scale. For top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG), which views points-to relations as memory updates and generalizes them using counts of indirection levels, leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and the control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining increases the opportunities for these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: the real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is why GPGs are very small even for main procedures that contain the effect of the entire program, and it allows our implementation to scale to C programs of 158 kLoC.
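To make the idea of memory updates with indirection-level counts concrete, the following minimal Python sketch encodes a pointer assignment as an update between two named variables, each annotated with an indirection level, and shows how substituting one update into another can lower the level (the effect attributed to strength reduction above). The class `Update`, the function `compose`, the level convention (0 for `&v`, 1 for `v`, 2 for `*v`), and the validity condition are illustrative assumptions, not definitions taken from the text. The sketch also ignores control flow, so it assumes that no other update to the intermediate variable intervenes between the two statements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """A pointer assignment viewed as a memory update (hypothetical encoding).

    Assumed indirection levels: 0 stands for `&v`, 1 for `v`, 2 for `*v`.
    """
    dst: str
    dst_lev: int
    src: str
    src_lev: int


def compose(consumer: Update, producer: Update) -> Optional[Update]:
    """Substitute the producer's right-hand side into the consumer's source.

    Returns None when the consumer does not read the variable the producer
    defines, or when the substitution would not lower the indirection level.
    """
    if consumer.src != producer.dst:
        return None
    j, k, l = consumer.src_lev, producer.dst_lev, producer.src_lev
    # Assumed condition: the producer's source is no deeper than its target
    # (l <= k), and the consumer reads at least as deep as the producer
    # writes (k <= j).
    if not (l <= k <= j):
        return None
    return Update(consumer.dst, consumer.dst_lev, producer.src, l + j - k)


y_gets_addr_a = Update("y", 1, "a", 0)   # y = &a
x_gets_deref_y = Update("x", 1, "y", 2)  # x = *y
x_gets_y = Update("x", 1, "y", 1)        # x = y

print(compose(x_gets_deref_y, y_gets_addr_a))  # models x = a
print(compose(x_gets_y, y_gets_addr_a))        # models x = &a
```

In both compositions the result no longer mentions `y`, and the pointee of `a` never has to be known: the update is expressed purely through names and level counts, which is what lets the unknown pointees stay implicit.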
