Decoupled pre-fetching for distributed shared memory

Distributed shared memory is an architectural technique for providing a global view of memory in a distributed-store parallel machine: mechanisms are introduced which copy remote areas of memory when they are required. One of the major problems of such a system is the performance penalty incurred while waiting for areas of memory to be copied. This can be ameliorated to a certain extent by using user annotations, compile-time analysis or run-time prediction to aid pre-fetching of data. This paper proposes a decoupled run-time technique for pre-fetching in a distributed shared memory environment, applicable where static analysis is difficult and the access patterns are sufficiently irregular that run-time prediction may fail. The proposal takes the form of a dual-processor structure in which one processor performs a partial evaluation of the program, and thereby anticipates data fetches before they are required by a second processor, which performs the full evaluation.
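The dual-processor idea can be illustrated with a toy timing model. The sketch below is not the paper's mechanism, which the abstract does not detail; it only assumes that an "access" processor evaluates the address-generating part of a loop and issues fetches ahead of an "execute" processor. The names `addresses`, `run_blocking` and `run_decoupled`, the address formula, and the cycle costs `LATENCY`, `COMPUTE` and `ADDRESS` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

LATENCY = 20  # assumed cycles to copy one remote item into local memory
COMPUTE = 4   # assumed cycles of work per item on the execute processor
ADDRESS = 1   # assumed cycles for the access processor to compute one address


def addresses(index_table: Sequence[int], size: int) -> List[int]:
    """Address slice of the program: data-dependent, hence hard to predict."""
    return [(v * 7 + k) % size for k, v in enumerate(index_table)]


def run_blocking(remote: Sequence[int], index_table: Sequence[int]) -> Tuple[int, int]:
    """Single processor: stalls for the full latency on every first touch."""
    clock, total = 0, 0
    local: Set[int] = set()
    for addr in addresses(index_table, len(remote)):
        if addr not in local:
            clock += LATENCY  # wait for the remote copy
            local.add(addr)
        total += remote[addr]
        clock += COMPUTE
    return total, clock


def run_decoupled(remote: Sequence[int], index_table: Sequence[int]) -> Tuple[int, int]:
    """Access processor runs the address slice ahead and issues the fetches;
    the execute processor stalls only if an item has not arrived yet."""
    arrival: Dict[int, int] = {}
    access_clock = 0
    schedule = addresses(index_table, len(remote))
    for addr in schedule:
        access_clock += ADDRESS
        if addr not in arrival:
            arrival[addr] = access_clock + LATENCY  # fetch issued early

    clock, total = 0, 0
    for addr in schedule:
        clock = max(clock, arrival[addr])  # stall only until the data is here
        total += remote[addr]
        clock += COMPUTE
    return total, clock


if __name__ == "__main__":
    remote = list(range(100))
    table = [3, 1, 4, 1, 5, 9, 2, 6]
    print(run_blocking(remote, table))
    print(run_decoupled(remote, table))
```

In this model both runs compute the same result, but the decoupled run pays the remote latency roughly once instead of once per item, because the fetches overlap with each other and with the execute processor's work. The model ignores network contention, coherence traffic and any dependence of addresses on values the execute processor produces, all of which would reduce the benefit in a real system.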
