Expert Programmer versus Parallelizing Compiler: A Comparative Study of Two Approaches for Distributed Shared Memory

This article critically examines current parallel programming practice and optimizing compiler development. The general strategies employed by compiler and programmer to optimize a Fortran program are described and then illustrated for a specific case by applying them to a well-known scientific program, TRED2, with the KSR-1 as the target architecture. The resulting versions of the program are measured extensively and compared with a version produced by a commercial optimizing compiler, KAP. The compiler strategy significantly outperforms KAP and does not fall far short of the performance achieved by the programmer. Following the experimental section, each approach is critiqued by the other. Perceived flaws, advantages, and common ground are outlined, with an eye to improving both schemes.
