Paper information - Decoupled pre-fetching for distributed shared memory

Decoupled pre-fetching for distributed shared memory

Distributed shared memory is an architectural technique for providing a global view of memory in a distributed-store parallel machine by introducing mechanisms which make copies of remote areas of memory when required. One of the major problems of such a system is the performance penalties incurred due to the need to wait for areas of memory to be copied. This can be ameliorated to a certain extent using user annotations, compile-time analysis or run-time prediction to aid pre-fetching of data. This paper proposes a decoupled run-time technique for pre-fetching in a distributed shared memory environment which is applicable in circumstances where static analysis is difficult and the access patterns are sufficiently irregular that run-time prediction may fail. The proposal is in the form of a dual processor structure where one processor performs a partial evaluation of the program and thereby anticipates the need for data fetches before they are required by a second processor which performs the full evaluation.
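The dual-processor idea in the abstract can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the paper's design: the names `address_stream`, `run`, `REMOTE_LATENCY` and `lookahead` are hypothetical, every compute step is assumed to cost one tick, and every remote copy is assumed to take a fixed number of ticks. The access side evaluates only the address computation and issues fetches early; the execute side performs the full evaluation and stalls only when a copy has not yet arrived.

```python
REMOTE_LATENCY = 10  # assumed ticks needed to copy one remote block


def address_stream(n):
    """Partial evaluation: only the (irregular) address computation of the program."""
    return [(5 * i * i + 3) % 16 for i in range(n)]


def run(remote, n, lookahead):
    """Simulate the execute processor; return (result, ticks spent stalled).

    `lookahead` is how many steps the access processor may run ahead.
    lookahead=0 models fetching on demand, with no decoupling.
    """
    addresses = address_stream(n)
    arrival = {}  # block address -> tick at which its local copy is ready
    issued = 0    # steps already handled by the access processor
    clock = stalls = total = 0
    for k, addr in enumerate(addresses):
        # Access processor: issue fetches for every step up to k + lookahead.
        while issued < n and issued <= k + lookahead:
            a = addresses[issued]
            if a not in arrival:
                arrival[a] = clock + REMOTE_LATENCY
            issued += 1
        # Execute processor: wait only if the copy has not arrived yet.
        if arrival[addr] > clock:
            stalls += arrival[addr] - clock
            clock = arrival[addr]
        total += remote[addr]  # the full evaluation uses the fetched data
        clock += 1
    return total, stalls


remote = {a: a * a for a in range(16)}  # stand-in for remote memory
on_demand = run(remote, 4, lookahead=0)
decoupled = run(remote, 4, lookahead=3)
```

In this model both runs compute the same result, but the decoupled run stalls only for the first block, because the later fetches overlap with that initial wait. Real behaviour would depend on how far the access processor can actually run ahead, which the toy model simply takes as a parameter.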

Ian Watson | Alasdair Rawsthorne
