Aggregate structure identification and its application to program analysis

In this paper, we describe an efficient algorithm for lazily decomposing aggregates such as records and arrays into simpler components based on the access patterns specific to a given program. This process allows us both to identify implicit aggregate structure not evident from declarative information in the program, and to simplify the representation of declared aggregates when references are made only to a subset of their components. We show that the structure identification process can be exploited to yield the following principal results:

- A fast type analysis algorithm applicable to program maintenance applications such as date usage inference for the "Year 2000" problem.
- An efficient algorithm for atomization of aggregates.

Given a program, an aggregate atomization decomposes all of the data that can be manipulated by the program into a set of disjoint atoms such that each data reference can be modeled as one or more references to atoms without loss of semantic information. Aggregate atomization can be used to adapt program analyses and representations designed for scalar data to aggregate data. In particular, atomization can be used to build more precise versions of program representations such as static single assignment (SSA) form or program dependence graphs (PDGs). Such representations can in turn yield more accurate results for problems such as program slicing.

Our techniques are especially useful in weakly typed languages such as Cobol (where a variable need not be declared as an aggregate to store an aggregate value) and in languages where references to statically defined subranges of data such as arrays or strings are allowed.
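The notion of atomization can be illustrated with a small sketch. This is not the paper's algorithm: the paper describes an efficient, lazy decomposition, whereas this Python sketch uses a naive fixed-point loop. It assumes a hypothetical encoding in which every variable is a flat array of bytes, each data reference is a (variable, offset, length) triple, and each assignment copies one range onto another range of the same length (roughly as a Cobol MOVE between byte ranges might). The names `atomize` and `atoms_of` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of a data reference: (variable name, byte offset, length).
Ref = Tuple[str, int, int]
# Hypothetical encoding of a copy between equal-length ranges:
# (source variable, source offset, target variable, target offset, length).
Move = Tuple[str, int, str, int, int]
Atoms = Dict[str, List[Tuple[int, int]]]


def atomize(sizes: Dict[str, int], refs: List[Ref], moves: List[Move]) -> Atoms:
    """Split each variable into disjoint atoms [lo, hi) (naive fixed-point sketch)."""
    # Every variable starts with breakpoints at its two ends.
    cuts: Dict[str, Set[int]] = {v: {0, n} for v, n in sizes.items()}

    # Each referenced or copied range contributes its own endpoints.
    for v, off, length in refs:
        cuts[v].update((off, off + length))
    for src, s_off, dst, d_off, length in moves:
        cuts[src].update((s_off, s_off + length))
        cuts[dst].update((d_off, d_off + length))

    # A copy forces both sides to share internal structure: mirror any
    # breakpoint strictly inside one range into the other until nothing changes.
    changed = True
    while changed:
        changed = False
        for src, s_off, dst, d_off, length in moves:
            for a, a_off, b, b_off in ((src, s_off, dst, d_off), (dst, d_off, src, s_off)):
                for c in list(cuts[a]):
                    if a_off < c < a_off + length:
                        mirrored = b_off + (c - a_off)
                        if mirrored not in cuts[b]:
                            cuts[b].add(mirrored)
                            changed = True

    # Consecutive breakpoints delimit the atoms of each variable.
    atoms: Atoms = {}
    for v, cs in cuts.items():
        ordered = sorted(cs)
        atoms[v] = list(zip(ordered, ordered[1:]))
    return atoms


def atoms_of(atoms: Atoms, ref: Ref) -> List[Tuple[int, int]]:
    """Return the atoms that together cover a reference."""
    v, off, length = ref
    return [(lo, hi) for lo, hi in atoms[v] if off <= lo and hi <= off + length]
```

For example, if a 6-byte variable `D` is copied into bytes 4 to 9 of a 10-byte record `R`, and the program reads only bytes 4 and 5 of `R`, the sketch splits `D` into atoms [0, 2) and [2, 6), exposing structure that the declaration of `D` never states. Because breakpoints are only ever added and each variable has finitely many offsets, the loop terminates, although it is far less efficient than the approach the paper describes.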
