This paper presents the design and implementation of a communication protocol for the IBM Cyclops-64 (C64) supercomputer system that enables reliable data transfer between the two major components of a C64 system: the C64 host system (also called the C64 front-end) and the C64 compute engine (also called the C64 back-end). The building block of the C64 compute engine, the C64 chip, employs a multi-core-on-a-chip architecture. The compute engine of a C64 system comprises a large number of C64 nodes (chips) arranged in a 3D-mesh cellular structure, and it is attached to the host system via Gigabit Ethernet links. The host system can be a Linux cluster that provides a familiar operating environment to end users, plus special services targeted at the C64 compute engine, including system administration, job scheduling, file I/O, and remote memory operations. Early in the design stage, a clear specification of the application requirements was presented to the design team: most of the communication is bulk data transfer with little interactive requirement, used mainly to feed program code and large amounts of user data to the back-end of a C64 machine for processing. Based on these requirements, this paper introduces CDP (Cyclops Datagram Protocol), a simple, reliable, and programmable protocol that we designed for communication between the C64 back-end and the host system. CDP has the following features: (1) it creates a global name space spanning the C64 back-end and the host system; (2) it provides a reliable communication channel between the C64 nodes and the host nodes; (3) it provides a set of standard programming interfaces to system software developers; and (4) it employs a simple design engineered to provide just enough capacity to meet the specific application requirements.
CDP has been fully implemented and tested, and we report experimental results demonstrating that the design meets the application requirements.
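
To make the notion of a reliable channel layered over an unreliable datagram link concrete, the following is a minimal sketch of the generic stop-and-wait technique: each chunk carries a sequence number, the receiver acknowledges the last in-order chunk it has accepted, and the sender retransmits until that acknowledgement arrives. It is not taken from the CDP implementation, whose packet format and retransmission policy are not described here. The names `LossyLink`, `Receiver`, and `send_reliable`, the simulated packet loss, and the retry limit are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LossyLink:
    """Simulated unreliable link (hypothetical): drops the packets whose
    transmission index appears in `drop`."""
    drop: set
    sent: int = 0

    def deliver(self, packet):
        index = self.sent
        self.sent += 1
        return None if index in self.drop else packet


@dataclass
class Receiver:
    expected: int = 0
    data: list = field(default_factory=list)

    def on_packet(self, seq, payload):
        # Accept only the next in-order chunk; a duplicate is not stored
        # again but is still acknowledged so the sender can move on.
        if seq == self.expected:
            self.data.append(payload)
            self.expected += 1
        return self.expected - 1  # ACK: last in-order sequence number


def send_reliable(chunks, link, receiver, max_retries=5):
    """Stop-and-wait: resend each chunk until its ACK comes back.
    Returns the number of data transmissions that were needed."""
    transmissions = 0
    for seq, payload in enumerate(chunks):
        for _ in range(max_retries):
            transmissions += 1
            packet = link.deliver((seq, payload))
            if packet is None:
                continue  # data packet lost: treat as timeout, resend
            ack = link.deliver(receiver.on_packet(*packet))
            if ack == seq:
                break  # chunk acknowledged, go on to the next one
        else:
            raise TimeoutError(f"chunk {seq} was never acknowledged")
    return transmissions
```

In this sketch a lost data packet and a lost acknowledgement are handled the same way by the sender (it retransmits), while the receiver's sequence check keeps a retransmitted chunk from being stored twice. A protocol tuned for bulk transfer would more likely keep several packets in flight at once; stop-and-wait is used here only because it is the shortest correct illustration.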