Design and Implementation of a Circular Omega Network in the EM-4

Abstract This paper presents the design principles and prototype implementation of the interconnection network for the EM-4, a highly parallel computer that will have more than a thousand processing elements. As a first step, a single-chip processing element, the EMC-R, was designed and fabricated in 1989, and an EM-4 prototype system with 80 EMC-Rs has been fully operational since April 1990. The peak performance of this prototype is 1 GIPS. The interconnection network of the EM-4 prototype adopts a circular omega topology. This paper first examines the features of this topology, then proposes a node grouping method, a node addressing method, and a self-routing algorithm that runs on each node. Store-and-forward deadlock prevention mechanisms are then proposed, and the automatic load distribution mechanisms attached to the network are presented. Finally, the network implementation in the EM-4 prototype system is described. The network consists of connection lines and switching units under distributed control, and it actually delivers 14.63 GB/s. The switching unit was implemented as a unit of the EMC-R and performs packet communication concurrently with, and independently of, the other functional units on the chip.
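The abstract does not spell out the self-routing algorithm, so the following is only a minimal sketch of textbook destination-tag routing on a conventional (non-circular) omega network, given here to illustrate what "self-routing" means in this family of topologies. It is not the EM-4's circular omega algorithm; the function names `shuffle` and `omega_route` and the parameter `n_bits` are hypothetical and chosen for this illustration.

```python
def shuffle(pos, n_bits):
    """Perfect shuffle: rotate the n_bits-wide address left by one bit."""
    mask = (1 << n_bits) - 1
    return ((pos << 1) | (pos >> (n_bits - 1))) & mask


def omega_route(src, dest, n_bits):
    """Return the line positions a packet occupies after each stage.

    Assumes a conventional omega network with 2**n_bits lines and
    n_bits stages of 2x2 switches. At each stage the switch looks only
    at one bit of the destination address (most significant bit first),
    so no global routing table is needed.
    """
    pos = src
    path = [pos]
    for stage in range(n_bits):
        pos = shuffle(pos, n_bits)                  # inter-stage wiring
        bit = (dest >> (n_bits - 1 - stage)) & 1    # destination bit for this stage
        pos = (pos & ~1) | bit                      # 0 = upper output, 1 = lower output
        path.append(pos)
    return path


if __name__ == "__main__":
    # Route a packet from line 5 to line 2 in an 8-line network.
    print(omega_route(5, 2, 3))
```

Because every stage overwrites one address bit with the corresponding destination bit, the packet reaches `dest` after `n_bits` stages regardless of where it started; each switch decides locally, which is the property the term self-routing refers to.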
