An Empirical Study on DOACROSS Loops

Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In this study, we selected 6 large application programs and used an execution-driven simulation technique from MaxPar 5] to identify and to measure the eeectiveness of concurrent DOACROSS loops execution. It was found that executing DOACROSS loops serially can signiicantly degrade the performance for some of the programs. We also measured and studied the characteristics of those cross-iteration dependences in DOACROSS loops and measured the capability of a state-of-the-art parallelizing compiler, KAP, in identifying and eliminating cross-iteration dependences.

[1]  James R. Larus,et al.  Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst..

[2]  Pen-Chung Yew,et al.  The impact of synchronization and granularity on parallel systems , 1990, ISCA '90.

[3]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[4]  Manoj Kumar,et al.  Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.

[5]  David A. Padua,et al.  High-Speed Multiprocessors and Compilation Techniques , 1980, IEEE Transactions on Computers.

[6]  Michael Wolfe,et al.  The KAP/205 : An Advanced Source-to-Source Vectorizer for the Cyber 205 Supercomputer , 1986, ICPP.

[7]  Ron Cytron,et al.  Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.

[8]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[9]  David A. Padua,et al.  Compiler Algorithms for Synchronization , 1987, IEEE Transactions on Computers.

[10]  Charles L. Seitz,et al.  Multicomputers: message-passing concurrent computers , 1988, Computer.

[11]  Yoichi Muraoka,et al.  On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup , 1972, IEEE Transactions on Computers.

[12]  Zhiyuan Li,et al.  Compiler algorithms for event variable synchronization , 1991, ICS '91.

[13]  Nian-Feng Tzeng,et al.  Distributing Hot-Spot Addressing in Large-Scale Multiprocessors , 1987, IEEE Transactions on Computers.

[14]  Alexandru Nicolau,et al.  Using an oracle to measure potential parallelism in single instruction stream programs , 1981, MICRO 14.

[15]  David Alejandro Padua Haiek Multiprocessors: discussion of some theoretical and practical problems , 1980 .

[16]  Geoffrey C. Fox,et al.  The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..

[17]  Pen-Chung Yew,et al.  Cedar architecture and its software , 1989, [1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track.

[18]  Kevin P. McAuliffe,et al.  The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture , 1985, ICPP.