In this paper we describe an important use of predictive application performance modeling: the validation of measured performance during the installation of a new large-scale system. Using a previously developed and validated performance model for SAGE, a multidimensional (3D), multi-material hydrodynamics code with adaptive mesh refinement, we were able to help guide the stabilization of the first phase of the Los Alamos ASCI Q supercomputer. We review the salient features of an analytical model for this code that has been applied to predict its performance on a large class of tera-scale parallel systems. We describe the methodology applied during system installation and upgrades to establish a baseline for the achievable "real" performance of the system. We also show the effect of certain key subsystems, such as PCI bus speed and multi-rail networks, on overall application performance. Finally, we show that predictive performance models also serve as a powerful system debugging tool.