TCCluster: A Cluster Architecture Utilizing the Processor Host Interface as a Network Interconnect

To date, large computing clusters consisting of several thousand machines have been built by connecting nodes with interconnect technologies such as Ethernet, InfiniBand, or Myrinet. We propose a new architecture, called Tightly Coupled Cluster (TCCluster), that instead uses the processors' native host interface as a direct network interconnect. By virtually integrating the network interface adapter into the processor, this approach offers higher bandwidth and much lower communication latency than the traditional approaches. Our technique is purely software based: it neither modifies the processor nor requires any additional hardware. Instead, we use commodity off-the-shelf AMD processors and exploit their HyperTransport host interface as a cluster interconnect. In this paper, we explain how nodes are addressed in such a cluster, how routing within the system works, and which programming model can be applied. We give a detailed description of the tasks that must be addressed and provide a proof-of-concept implementation. To evaluate the technique, we present a two-node TCCluster prototype, for which BIOS firmware, a custom Linux kernel, and a small message-passing library have been developed. Microbenchmarks show a sustained bandwidth of up to 2500 MB/s for messages as small as 64 bytes and a communication latency of 227 ns between two nodes, outperforming other high-performance networks by an order of magnitude.
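
The core mechanism behind such a message library can be illustrated with a short sketch. In the scheme described above, part of a remote node's memory appears as a window in the local physical address space, so sending a message amounts to an ordinary store to a mapped address that the HyperTransport fabric routes to the neighboring node. The C sketch below is not the authors' code: the window address, its layout, and the names tcc_open and tcc_send are assumptions for illustration, and the real prototype programs the corresponding address ranges in the BIOS firmware and kernel rather than mapping them from user space via /dev/mem.

    /*
     * Minimal sketch of sending a 64-byte message over a TCCluster-style
     * HyperTransport address window. All addresses and the window layout
     * are hypothetical placeholders.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical physical address of the window that the northbridge
     * routes to the neighboring node (set up by the firmware). */
    #define REMOTE_WINDOW_PHYS 0x100000000ULL
    #define WINDOW_SIZE        (1UL << 20)   /* 1 MiB message window */
    #define MSG_SIZE           64            /* one cache line per message */

    static volatile uint8_t *remote;         /* mapped remote window */

    /* Map the remote node's message window into our address space. */
    static int tcc_open(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0)
            return -1;
        remote = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, REMOTE_WINDOW_PHYS);
        close(fd);
        return remote == MAP_FAILED ? -1 : 0;
    }

    /* Send one 64-byte message: an ordinary store to the mapped window,
     * which the HyperTransport fabric forwards to the remote node. */
    static void tcc_send(unsigned slot, const void *msg)
    {
        memcpy((void *)(remote + (uint64_t)slot * MSG_SIZE), msg, MSG_SIZE);
        __sync_synchronize();   /* make the stores visible in order */
    }

Because a send is then a single posted write into the remote window, with no network adapter, driver, or DMA setup in the path, this is also where the sub-microsecond latency reported above plausibly comes from.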
