Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs

The speedups of the Perfect Benchmarks codes that result from automatic parallelization are reported. The performance gains attributable to individual restructuring techniques have also been measured. Specific reasons for the successes and failures of the transformations are discussed, and potential improvements that result in measurably better program performance are analyzed. The most important findings are that available restructurers often yield insignificant performance gains in real programs and that only a few restructuring techniques contribute to these gains. However, it can be shown that there is potential for advancing compiler technology so that many of the most important loops in these programs can be parallelized.
