Commutativity analysis: a new analysis technique for parallelizing compilers

This article presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e., generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation system that uses commutativity analysis as its primary analysis technique. We have used this system to automatically parallelize three complete scientific computations: the Barnes-Hut N-body solver, the Water liquid simulation code, and the String seismic simulation code. This article presents performance results for the generated parallel code running on the Stanford DASH machine. These results provide encouraging evidence that commutativity analysis can serve as the basis for a successful parallelizing compiler.
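To make the central idea concrete, the sketch below (which is not drawn from the article) shows the kind of operations commutativity analysis looks for: the only writes each operation performs on its receiver object are commutative, associative additions, so any execution order yields the same final object state. The Body class, the accumulate() method, the toy force law, and the per-object lock standing in for compiler-inserted synchronization are all hypothetical illustrations, not the system described in the article.

```cpp
// Minimal sketch (hypothetical names): a toy N-body force accumulation in
// which all updates to a Body commute, so the interaction loop could be
// executed in parallel once that property has been verified.
#include <mutex>
#include <vector>

struct Body {
    double fx = 0.0, fy = 0.0;   // accumulated force components
    double x = 0.0, y = 0.0;     // position (only read during accumulation)
    std::mutex m;                // per-object lock, standing in for the
                                 // synchronization a compiler would insert

    // The only writes this operation performs are commutative additions to
    // fx and fy, so two accumulate() calls on the same Body produce the same
    // final state regardless of the order in which they execute.
    void accumulate(double dfx, double dfy) {
        std::lock_guard<std::mutex> guard(m);
        fx += dfx;
        fy += dfy;
    }
};

// Serial form of the computation: every pair of bodies interacts once.
// Because every accumulate() call commutes with every other, a compiler that
// proves this property may run the iterations concurrently (for example, one
// task per value of i) without changing the final forces.
void computeForces(std::vector<Body>& bodies) {
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        for (std::size_t j = i + 1; j < bodies.size(); ++j) {
            double dx = bodies[j].x - bodies[i].x;
            double dy = bodies[j].y - bodies[i].y;
            double d2 = dx * dx + dy * dy + 1e-9;  // softening avoids divide-by-zero
            double f  = 1.0 / d2;                  // toy force law
            bodies[i].accumulate(f * dx, f * dy);
            bodies[j].accumulate(-f * dx, -f * dy);
        }
    }
}
```

In this sketch the final forces are independent of operation order up to floating-point rounding, since the additions are reordered; establishing that reordering operations leaves every object in the same final state is precisely the property a commutativity-based compiler must verify before it generates parallel code.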
