A semi-empirical model for maximal LINPACK performance predictions

In general, the maximal LINPACK performance of a large cluster depends on the number of processors, the total memory capacity, the problem size, the block size, the message-passing middleware, and the BLAS (basic linear algebra subprograms) library. Predicting the performance score requires handling all of these factors together. In this paper, we propose a semi-empirical weighting function to improve the performance prediction model for High-Performance Linpack (HPL) on large clusters. To better predict the maximal LINPACK performance, we first divide the performance model into two parts: computational power and message-passing overhead. For the latter, we adopt Xu and Hwang's broadcast model and introduce a weighting function w to account for the remaining effects. The difference between the scores predicted by our semi-empirical model and the measured scores is less than 5%. The clusters used in the study include Myrinet-based, Quadrics-based, and Gigabit Ethernet-based systems with IA64 or IA32 architectures.
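The two-part decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' fitted model: the log-tree broadcast cost, the average panel size, and the use of a constant `w` in place of the paper's weighting function are all assumptions made for illustration, and every function and parameter name is hypothetical. The only fixed fact used is the operation count that HPL credits for an N-by-N solve, 2/3 N^3 + 3/2 N^2.

```python
import math


def hpl_flops(n):
    """Operation count HPL credits for an n-by-n solve: 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def broadcast_time(msg_bytes, procs, latency_s, bandwidth_bps):
    """Assumed log-tree broadcast cost (a stand-in, not Xu and Hwang's fit):
    ceil(log2(procs)) steps, each costing latency + size / bandwidth."""
    if procs <= 1:
        return 0.0
    steps = math.ceil(math.log2(procs))
    return steps * (latency_s + msg_bytes / bandwidth_bps)


def predicted_gflops(n, nb, procs, flops_per_proc,
                     latency_s, bandwidth_bps, w=1.0):
    """Predicted score = flops / (compute time + w * communication time).

    w is a constant placeholder here; in the paper it is a semi-empirical
    weighting function whose form is not reproduced in this sketch.
    """
    # Part 1: computational power, assuming perfect load balance.
    t_comp = hpl_flops(n) / (procs * flops_per_proc)

    # Part 2: message-passing overhead. Assumption: one broadcast per
    # panel step of an average-size panel (n/2 rows by nb columns of
    # 8-byte doubles).
    panel_steps = math.ceil(n / nb)
    panel_bytes = 8.0 * nb * n / 2.0
    t_comm = panel_steps * broadcast_time(
        panel_bytes, procs, latency_s, bandwidth_bps)

    return hpl_flops(n) / (t_comp + w * t_comm) / 1e9
```

With `w=0` the sketch reduces to the aggregate peak rate `procs * flops_per_proc`; increasing `w` lowers the predicted score, which is the role the abstract assigns to the weighting function: absorbing effects not captured by the broadcast model alone.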

[1] Roger W. Hockney, et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2, 1994, Parallel Computing.

[2] Greg Burns, et al. LAM: An Open Cluster Environment for MPI, 2002.

[3] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[4] Thomas L. Sterling, et al. Communication overhead for space science applications on the Beowulf parallel workstation, 1995, Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing.

[5] Fabrizio Petrini, et al. Performance Evaluation of the Quadrics Interconnection Network, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.

[6] Thomas L. Sterling, et al. BEOWULF: A Parallel Workstation for Scientific Computation, 1995, ICPP.

[7] P. Altena, et al. In search of clusters, 2007.

[8] P. Merkey, et al. Beowulf: harnessing the power of parallelism in a pile-of-PCs, 1997, 1997 IEEE Aerospace Conference.

[9] Wenli Zhang, et al. HPL Performance Prevision to Intending System Improvement, 2004, ISPA.

[10] Zhiwei Xu, et al. Modeling communication overhead: MPI and MPL performance on the IBM SP2, 1996, IEEE Parallel Distributed Technol. Syst. Appl.

[11] Peng Wang, et al. LINPACK performance on a geographically distributed Linux cluster, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.

[12] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[13] Thomas L. Sterling, et al. A design study of alternative network topologies for the Beowulf parallel workstation, 1996, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing.

[14] Wentong Cai, et al. Performance Analysis of a Myrinet-Based Cluster, 2003, Cluster Computing.