Refinement-Based Program Analysis Tools

by Manu Sridharan

Doctor of Philosophy in Computer Science, University of California, Berkeley

Professor Rastislav Bodik, Chair

Program analysis tools are starting to change how software is developed. Verifiers can now eliminate certain complex bugs in large code bases, and automatic refactoring tools can greatly simplify code cleanup. Nevertheless, writing robust large-scale software remains a challenge, as greater use of component frameworks complicates debugging and program understanding. Developers need more powerful programming tools to combat this complexity and produce reliable code.

This dissertation presents two techniques for more powerful debugging and program understanding tools based on refinement. In general, refinement-based techniques aim to discover interesting properties of a large program by only reasoning about the most important parts of the program (typically a small amount of code) precisely, abstracting away the behavior of much of the program. Our key contribution is the first framework for effective refinement-based handling of object-oriented data structures; pervasive use of such data structures thwarts the effectiveness of most existing analyses and tools.

Our two refinement-based techniques significantly advance the state-of-the-art in program analyses and tools for object-oriented languages. The first technique is a refinement-based points-to analysis that can compute precise answers in interactive
