Modular data-flow analysis of statically typed object-oriented programming languages

Data-flow analysis of object-oriented programming languages such as C++ and Java is needed for many important applications: aggressive code optimization, side-effect analysis, program specialization, program slicing, and data-flow-based testing. However, such analysis is difficult because of the large number of heap-allocated objects whose fields point to other heap-allocated objects (recursive structures), dynamic dispatch, frequent method invocations, a large number of methods, many invocation contexts per method, and exceptions. In this thesis we present a new technique, Relevant Context Inference (RCI), for modular, flow- and context-sensitive data-flow analysis of statically typed object-oriented programming languages such as C++ and Java. RCI is designed to overcome these difficulties and has several long-sought-after characteristics: (1) it can analyze a program while keeping only part of it in memory at a time, with a constant bound on the number of times a procedure needs to be in memory; (2) it can analyze incomplete programs such as libraries; and (3) it can analyze programs that have exceptions. We have built a prototype of RCI for points-to analysis of C++ programs, and the empirical results obtained with it and presented in this thesis show that RCI is effective in practice. We also present several new complexity characterizations of points-to analysis in the presence of object-oriented language constructs, namely exceptions and dynamic dispatch. These results identify the features that make the analysis difficult and indicate the approximations that any efficient algorithm has to make. Finally, we present a new approach to data-flow-based testing of object-oriented libraries using RCI, and show how the information computed by RCI can be used to generate relevant test cases.
