A universal parallel computer architecture

Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.
