A new-generation parallel computer and its performance evaluation

Abstract An innovative design is proposed for an MIMD distributed shared-memory (DSM) parallel computer capable of achieving gracious (i.e., sustained) performance with technology expected to become feasible and viable in less than a decade. This New Millennium Computing Point Design was chosen by NSF, DARPA, and NASA as having the potential to deliver 100 TeraFLOPS and 1 PetaFLOPS performance by the years 2005 and 2007, respectively. Its scalability guarantees a lifetime extending well into the next century. Our design takes advantage of free-space optical technologies, combined with simple guided-wave concepts, to produce a one-dimensional (1D) building block (BB) that efficiently implements a large, fully connected system of processors. Enabling large, fully connected systems of electronic processors could be one of the most beneficial impacts of optics on massively parallel processing. A 2D structure is proposed for the complete system, in which the 1D BB is extended into two dimensions. The resulting architecture behaves like a 2D generalized hypercube, a topology characterized by outstanding performance but also by extremely high wiring complexity that prohibits an electronics-only implementation. Using readily available technology, a mesh of clear plastic/glass bars in our design facilitates point-to-point, bit-parallel transmissions that employ wavelength-division multiplexing (WDM) and follow dedicated optical paths. Processors are mounted on cards, with each card containing eight processors interconnected locally via an electronic crossbar. Taking advantage of higher-speed optical technologies, all eight processors on a card share a single communications interface to the optical medium via time-division multiplexing (TDM). A case study targeting 100 TeraFLOPS performance by the year 2005 is investigated in detail; the characteristics of the hardware components chosen for this case study conform to SIA (Semiconductor Industry Association) projections. An impressive property of our system is that its bisection bandwidth matches, within an order of magnitude, the performance of its computation engine. Performance results based on the implementation of several important algorithmic kernels show that our design could have a tremendous, positive impact on massively parallel computing. 2D and 3D implementations of our design could achieve gracious PetaFLOPS performance before the end of the next decade.
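To make the topology concrete, the sketch below models node addressing and minimal routing in a 2D generalized hypercube of the kind the complete system emulates: each node has a dedicated link to every other node in its row and to every other node in its column, so any destination is reachable in at most two hops (one per dimension). This is only an illustration of the standard generalized-hypercube connectivity and routing rule, not the authors' implementation; the class and function names and the 64x64 example size are our own assumptions.

```python
# Minimal model of a 2D generalized hypercube (GHC): a node (r, c) in an
# R x C grid has a dedicated link to every other node in its row and in its
# column. Names and sizes are illustrative, not taken from the paper.

from typing import Iterator, List, Tuple

Node = Tuple[int, int]  # (row, column) address

class GeneralizedHypercube2D:
    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols

    def neighbors(self, node: Node) -> Iterator[Node]:
        """Yield all nodes differing from `node` in exactly one coordinate."""
        r, c = node
        for c2 in range(self.cols):        # full connectivity along the row
            if c2 != c:
                yield (r, c2)
        for r2 in range(self.rows):        # full connectivity along the column
            if r2 != r:
                yield (r2, c)

    def route(self, src: Node, dst: Node) -> List[Node]:
        """Minimal route: correct one coordinate per hop, so path length
        never exceeds 2 (the network diameter)."""
        path = [src]
        r, _ = src
        if src[1] != dst[1]:
            path.append((r, dst[1]))       # first hop: fix the column
        if r != dst[0]:
            path.append((dst[0], dst[1]))  # second hop: fix the row
        return path

if __name__ == "__main__":
    ghc = GeneralizedHypercube2D(64, 64)            # 4096 nodes; size arbitrary
    print(sum(1 for _ in ghc.neighbors((0, 0))))    # 126 links per node
    print(ghc.route((0, 0), (63, 63)))              # [(0, 0), (0, 63), (63, 63)]
```

Note that the per-node degree in an R x C system is R + C - 2 links, which grows with machine size; this is exactly the wiring complexity that, as the abstract states, rules out an electronics-only implementation and motivates the shared optical medium with WDM and TDM.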
