Automatic scheduling for cache only memory architectures

For parallel and distributed systems to gain wider acceptance than they have to date, they will need to be scalable and affordable, but most importantly, they must be made as easy to program as sequential systems. Ideally, we would like to take programs written in conventional languages and recompile them for parallel architectures, thus freeing the programmer from any effort beyond that needed to program a conventional computer. This in turn implies that the compiler, the hardware, or both must address the fundamental issue of distribution. This problem is twofold: both data and computation must somehow be distributed. This paper attempts to bring data distribution concepts from cache-only memory architectures together with scheduling concepts from multithreaded architectures, in order to arrive at a single unified, simplified, cohesive abstract model of computation. The fusion of data and computation distribution is the central principle guiding the design of SDAARC (Self Distributing Associative Architecture), a new architecture being developed by the authors.
