Compilation and Runtime-Optimizations for Software Distributed Shared Memory

We present two novel optimizations for compiling High Performance Fortran (HPF) to page-based software distributed shared memory (SDSM) systems. The first, compiler-managed restricted consistency, uses compiler-derived knowledge to delay the application of memory consistency operations to data that is provably not shared in the current synchronization interval, thus reducing false sharing. The second, compiler-managed shared buffers, eliminates fragmentation when combined with the first. Together, the two techniques permit compiler-generated code to apply multi-dimensional computation partitioning and wavefront parallelism efficiently on SDSM systems.
