Massively parallel computing using commodity components

The Computational Plant (Cplant) project at Sandia National Laboratories is developing a large-scale, massively parallel computing resource from a cluster of commodity computing and networking components. We are combining the benefits of commodity cluster computing with our expertise in designing, developing, using, and maintaining large-scale, massively parallel processing (MPP) machines. In this paper, we present the design goals of the cluster and an approach to developing a commodity-based computational resource capable of delivering performance comparable to production-level MPP machines. We provide a description of the hardware components of a 96-node Phase I prototype machine and discuss the experiences with the prototype that led to the hardware choices for a 400-node Phase II production machine. We give a detailed description of the management and runtime software components of the cluster and offer computational performance data as well as performance measurements of functions that are critical to the management of large systems. © 2000 Elsevier Science B.V. All rights reserved.
