Performance modeling in action: Performance prediction of a Cray XT4 system during upgrade

We present predictive performance models of two petascale applications, S3D and GTC, from the DOE Office of Science workload. We outline the development of these models and demonstrate their validation on an Opteron/InfiniBand cluster and on the pre-upgrade ORNL Jaguar system (Cray XT3/XT4). Given the high accuracy of the full application models, we predict the performance of the Jaguar system after the upgrade of its nodes and subsequently compare this prediction to the actual performance of the upgraded system. We then use the models to analyze the performance of the system, quantifying bottlenecks and potential optimizations. Finally, we use the models to quantify the benefits of alternative node allocation strategies.
