Matching Application Signatures for Performance Predictions Using a Single Execution

Performance predictions for large problem sizes and processor counts, obtained from a limited number of small-scale runs, are useful for a variety of purposes, including scalability projections, and help minimize the time needed to construct training data for performance models. In this paper, we present a prediction framework that matches execution signatures to predict the performance of HPC applications using a single small-scale application execution. Our framework extracts execution signatures of applications and automatically identifies the different application phases. The signatures of these phases are matched against the execution profiles of reference kernels stored in a kernel database. The performance of the reference kernels is then used to predict the performance of the application phases. For phases that do not match significantly, our framework performs static analysis of loops and functions in the application to provide prediction ranges. We demonstrate this integrated set of techniques with three large-scale applications: GTC, a Particle-in-Cell code for turbulence simulation; Sweep3d, a 3D neutron transport application; and SMG2000, a multigrid solver. We show that our prediction ranges are accurate in most cases.
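The matching step described above can be illustrated with a minimal sketch. It assumes that a phase signature and each reference kernel profile are fixed-length numeric feature vectors, and that similarity is measured with cosine similarity against a cutoff. The abstract does not specify the signature contents, the similarity measure, or the threshold, so the names `match_phase` and `kernel_db`, the feature values, and the 0.9 threshold are all hypothetical and are not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_phase(signature, kernel_db, threshold=0.9):
    """Return (kernel_name, score) for the most similar reference kernel.

    kernel_name is None when no kernel reaches the threshold, which is the
    case where a framework would fall back to another technique (the
    abstract mentions static analysis for such phases).
    The threshold value is an assumption made for this sketch.
    """
    best_name, best_score = None, -1.0
    for name, profile in kernel_db.items():
        score = cosine_similarity(signature, profile)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical kernel database: made-up three-feature profiles.
kernel_db = {
    "stencil": [0.7, 0.2, 0.1],
    "fft": [0.3, 0.3, 0.4],
}

# Hypothetical signature of one application phase.
phase_signature = [0.68, 0.22, 0.10]
match, score = match_phase(phase_signature, kernel_db)
```

With these made-up vectors the phase is closest to the `stencil` profile, while a signature that resembles neither profile comes back with `None` as the kernel name and would need a fallback.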
