Extraction of Parallel Application Signatures for Performance Prediction

Predicting the performance of parallel applications is becoming increasingly complex. The best performance predictor is the application itself, but the time required to run it thoroughly is an onerous requirement. We seek to characterize the behavior of message-passing applications on different systems by extracting a signature that allows us to predict on which system the application will perform best. To achieve this goal, we have developed a method called Parallel Application Signatures for Performance Prediction (PAS2P) that strives to describe an application based on its behavior. From the application's message-passing activity, we have been able to identify and extract representative phases, from which we created a Parallel Application Signature that has allowed us to predict the application's performance. We have experimented with different signature-extraction algorithms and observed a reduction in prediction error across different scientific applications on different clusters, predicting execution times with an average accuracy of over 98%.
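The abstract gives no implementation details, but the core idea — identify repeated phases in a message-passing trace, execute each representative phase once on a target system, and extrapolate the full run time from the phase weights — can be sketched as follows. This is a minimal illustration, not the PAS2P implementation: the trace format, function names, and timings are all hypothetical.

```python
from collections import Counter

def extract_phases(trace):
    """Count how often each distinct phase repeats in a trace (its weight).

    Hypothetical: each element of `trace` is already a hashable phase
    identifier (e.g., a tuple summarizing a repeated pattern of
    message-passing events).
    """
    return Counter(trace)

def predict_time(phase_weights, measured_times):
    """Predict total execution time on a target machine by measuring each
    representative phase once (measured_times, in seconds) and scaling by
    how often it occurs in the full run (phase_weights)."""
    return sum(phase_weights[p] * measured_times[p] for p in phase_weights)

# Toy example: phase A repeats three times, phase B twice.
weights = extract_phases(["A", "B", "A", "B", "A"])
times = {"A": 2.0, "B": 1.5}   # per-phase times measured on the target system
print(predict_time(weights, times))  # 3*2.0 + 2*1.5 = 9.0
```

The signature (the set of representative phases plus their weights) is what makes prediction cheap: only the phases are executed on each candidate system, not the whole application.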
