LCM: memory system support for parallel language implementation

Higher-level parallel programming languages can be difficult to implement efficiently on parallel machines. This paper shows how a flexible, compiler-controlled memory system can help achieve good performance for language constructs that previously appeared too costly to be practical. Our compiler-controlled memory system is called Loosely Coherent Memory (LCM). It is an example of a larger class of Reconcilable Shared Memory (RSM) systems, which generalize the replication and merge policies of cache-coherent shared memory. RSM protocols differ in two respects: the action a processor takes in response to a request for a location, and the way a processor reconciles multiple outstanding copies of a location. LCM allows memory to become temporarily inconsistent in order to implement the semantics of C** parallel functions efficiently. RSM gives a compiler control over memory-system policies, which it can use to implement a language's semantics, improve performance, or detect errors. We illustrate the first two of these uses with LCM and our compiler for the data-parallel language C**.
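
The two policy points that distinguish RSM protocols, how a request for a location is answered and how outstanding copies are reconciled, can be pictured with a toy software model. The sketch below is only an illustration under stated assumptions: the class `ReconcilableMemory`, its methods, and the `sum_of_deltas` merge function are hypothetical names introduced here, not part of LCM, RSM, or the C** compiler, and the paper's actual protocol is not described in this abstract.

```python
from typing import Callable, Dict, List


class ReconcilableMemory:
    """Toy model (hypothetical): a shared store plus per-task private copies."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self.shared: Dict[str, int] = dict(initial)
        self.private: Dict[int, Dict[str, int]] = {}

    def read(self, task: int, addr: str) -> int:
        # Request policy (assumed): a task sees its own copy if it has one,
        # otherwise the last reconciled shared value.
        return self.private.get(task, {}).get(addr, self.shared[addr])

    def write(self, task: int, addr: str, value: int) -> None:
        # Writes go to the task's private copy, so memory is
        # temporarily inconsistent across tasks.
        self.private.setdefault(task, {})[addr] = value

    def reconcile(self, merge: Callable[[int, List[int]], int]) -> None:
        # Reconcile policy: combine all outstanding copies of each location
        # into one shared value using a caller-supplied merge function.
        pending: Dict[str, List[int]] = {}
        for task in sorted(self.private):
            for addr, value in self.private[task].items():
                pending.setdefault(addr, []).append(value)
        for addr, values in pending.items():
            self.shared[addr] = merge(self.shared[addr], values)
        self.private.clear()


def sum_of_deltas(old: int, copies: List[int]) -> int:
    # One possible merge policy (an assumption): apply every task's change.
    return old + sum(value - old for value in copies)


def run_demo() -> Dict[str, int]:
    mem = ReconcilableMemory({"x": 10})
    for task in range(3):
        # Each task reads the pre-call value 10 and adds task + 1 privately.
        mem.write(task, "x", mem.read(task, "x") + task + 1)
    before = mem.shared["x"]  # still the old value; copies not yet merged
    mem.reconcile(sum_of_deltas)
    return {"before": before, "after": mem.shared["x"]}
```

Passing the merge function as a parameter mirrors, in miniature, the idea that a compiler could select the reconciliation policy; a different function (for example, one that reports conflicting writes) would turn the same mechanism toward error detection.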