Paper Information - Configuring Large High-Performance Clusters at Lightspeed: A Case Study

Over a decade ago, the TOP500 list was started as a way to measure supercomputers by their sustained performance on a particular linear algebra benchmark. Such performance was once reserved for exotic machines and extremely well-funded centers and laboratories, but commodity clusters now make it possible for smaller groups to deploy and use high-performance machines in their own laboratories. This paper describes a weekend activity in which two existing 128-node commodity clusters were fused into a single 256-node cluster for the specific purpose of running the benchmark used to rank the machines in the TOP500 supercomputer list. The resulting metacluster sits on the November 2002 list at position 233. A key differentiator for this cluster is that its software was assembled from the NPACI Rocks open-source cluster toolkit as downloaded from the public website. The toolkit allows people who are not cluster experts to deploy and run supercomputer-class machines in a matter of hours instead of weeks or months. With the exception of recompiling the University of Tennessee’s Automatically Tuned Linear Algebra Subroutines (ATLAS) library with a recommended version of the GNU C compiler, this metacluster ran a “stock” Rocks distribution. Successful first-time deployment of the fused cluster was completed in a scant 6 h. Partitioning of the metacluster and restoration of the two 128-node clusters to their original configuration was completed in just over 40 min. This paper describes early (pre-weekend) benchmark activities to empirically determine reasonably good parameters for the High Performance Linpack (HPL) code on both Ethernet and Myrinet interconnects. It also describes the physical layout of the machine, explains the description-based installation methods used in Rocks to re-deploy two independent clusters as a single cluster, and gives the benchmark results that were gathered over the 40-h period allotted for the complete experiment.
In addition, we describe some of the on-line monitoring and measurement techniques that were employed during the experiment. Finally, we point out the issues uncovered with a commodity cluster of this size. The techniques presented in this paper truly bring supercomputers into the hands of the masses of computational scientists.
