A Framework for Performance Modeling and Prediction

Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel applications on large HPC systems. Simply running an application on a system and observing wall-clock time reveals nothing about why the application performs as it does, and is in any case impossible on systems that have yet to be built. Here we present a framework for performance modeling and prediction that is faster than cycle-accurate simulation and more informative than simple benchmarking, and we show that it is useful for performance investigations in several dimensions.
