Paper Information - Design and Implementation of a Circular Omega Network in the EM-4

Abstract: This paper presents the design principles and prototype implementation of the interconnection network of the EM-4, a highly parallel computer that will have more than a thousand processing elements. As a first step, a single-chip processing element, the EMC-R, was designed and fabricated in 1989, and an EM-4 prototype system with 80 EMC-Rs has been fully operational since April 1990. The peak performance of this prototype is 1 GIPS. The interconnection network of the EM-4 prototype adopts a circular omega topology. This paper first examines the features of this topology, then proposes a node grouping method, a node addressing method, and a self-routing algorithm for each node. Store-and-forward deadlock prevention mechanisms are then proposed, and automatic load distribution mechanisms attached to the network are presented. Next, the network implementation in the EM-4 prototype system is described. The network consists of connection lines and switching units under distributed control, and delivers 14.63 GB/s. The switching unit was implemented as a unit of the EMC-R and performs packet communication concurrently with and independently of the other functional units on the chip.
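The abstract does not spell out the self-routing algorithm, so the following is only a minimal sketch of the general idea, using the textbook destination-tag scheme for a plain (non-circular) omega network. It is not the EM-4's actual algorithm: it ignores the circular topology, node grouping, deadlock prevention, and load distribution described in the paper. The function name `omega_route` and the parameter `n_bits` are hypothetical and chosen for illustration.

```python
def omega_route(src: int, dst: int, n_bits: int) -> list[int]:
    """Trace a packet through a textbook omega network with 2**n_bits ports.

    Assumption: this is the classic destination-tag scheme, not the
    circular omega routing used in the EM-4.
    Returns the port index before the first stage and after every stage.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = [pos]
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the address left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The 2x2 switch picks its upper (0) or lower (1) output using the
        # destination bit for this stage, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


if __name__ == "__main__":
    # Route from port 5 to port 2 in an 8-port network.
    print(omega_route(5, 2, 3))
```

Each switch decides locally from a single destination bit, so no central controller is needed; this is the property the term "self-routing" refers to.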

Shuichi Sakai | Yuetsu Kodama | Yoshinori Yamaguchi
