Compile-time synchronization optimizations for software DSMs

Software distributed-shared-memory (DSM) systems are an attractive target for parallelizing compilers because of their flexibility. However, studies show that synchronization and load imbalance are significant sources of overhead. The authors investigate the impact of compilation techniques for eliminating synchronization overhead in software DSMs, developing new algorithms to handle situations found in practice. They evaluate the contributions of synchronization elimination algorithms based on 1) dependence analysis, 2) communication analysis, 3) exploiting coherence protocols in software DSMs, and 4) aggressive expansion of parallel SPMD regions. They also find that suppressing expensive parallelism is useful for one application. Experiments indicate that these techniques eliminate almost all parallel task invocations and reduce the number of barriers executed by 66% on average. On a 16-processor IBM SP-2, speedups improve by 35% on average and are tripled for some applications.
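As a rough illustration of the idea behind communication-analysis-based barrier elimination, the sketch below checks whether a barrier between two adjacent parallel loops is actually required. It is not the authors' algorithm: the block partitioning, the loop shapes, and the names `owner` and `barrier_needed` are assumptions made for this example. The barrier is treated as removable when no processor in the second loop reads an element that a different processor wrote in the first loop.

```python
def owner(i, n, procs):
    """Processor that executes iteration i under an assumed block partition."""
    block = -(-n // procs)  # ceiling division: iterations per processor
    return i // block


def barrier_needed(n, procs, offset):
    """Return True if the barrier between the two loops must be kept.

    Hypothetical loop pair (illustration only):
        Loop 1: for i in range(n): A[i] = ...
        Loop 2: for i in range(n): B[i] = A[i + offset]   (in-bounds reads only)

    The barrier is needed only if some iteration of loop 2 reads an element
    of A that was written by a different processor in loop 1.
    """
    for i in range(n):
        j = i + offset
        if 0 <= j < n and owner(j, n, procs) != owner(i, n, procs):
            return True
    return False


if __name__ == "__main__":
    # Same element read as written: all data stays local, barrier removable.
    print(barrier_needed(16, 4, 0))
    # Neighbor element read: block boundaries cross processors, barrier kept.
    print(barrier_needed(16, 4, 1))
```

A real compiler would derive this answer symbolically from array subscripts and the data partition instead of enumerating iterations; the enumeration here only keeps the example short.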
