A System Area Network Characterization In A Commercial Cluster

Abstract

Commercial computer clusters have become an essential element of information system centers around the world, and cluster installations are now pervasive in the commercial marketplace. Clusters are deployed to provide both continuous availability and horizontal growth. Continuous availability is essential in mission-critical applications, where availability twenty-four hours a day, seven days a week is required for business survival. Horizontal growth provides both incremental expansion and increases in maximum capacity and performance. To continue meeting customers' cluster needs in the future, inter-system communication latencies must be reduced. This thesis presents measured data collected from a client/server test environment using a commercially available IBM AS/400 computer. The measured data is then analyzed to identify inter-system communication latency bottlenecks. Finally, a System Area Network is proposed that reduces inter-system communication latency by increasing link bandwidth and reducing transmission backpressure. This System Area Network is a key component that will help improve the benefits of commercial clusters well into the next decade.
