Parallel Application Signature for Performance Analysis and Prediction

Predicting the performance of parallel scientific applications is becoming increasingly complex. Our goal was to characterize the behavior of message-passing applications on different target machines. To that end, we developed a method called Parallel Application Signature for Performance Prediction (PAS2P), which describes an application in terms of its behavior. From the application's message-passing activity, we identified and extracted representative phases, and used them to build a parallel application signature that enabled us to predict the application's performance. We tested the method with several scientific applications on different clusters and predicted execution times with an average accuracy greater than 97 percent.
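
The idea of predicting a full run from a few representative phases can be illustrated with a minimal sketch. It assumes, and the abstract does not confirm, that a signature reduces to a list of phases, each with a repetition count and a time measured on the target machine, and that the prediction is the weighted sum of those times. The names `Phase`, `weight`, `measured_time`, `predict_execution_time`, and `prediction_accuracy` are hypothetical and are not taken from the PAS2P tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Phase:
    """One representative phase of a signature (hypothetical structure)."""
    name: str
    weight: int           # assumed: how many times the phase repeats in the full run
    measured_time: float  # assumed: seconds to execute the phase once on the target machine


def predict_execution_time(phases: Iterable[Phase]) -> float:
    """Assumed model: predicted time is the sum of weight * measured_time over phases."""
    return sum(p.weight * p.measured_time for p in phases)


def prediction_accuracy(predicted: float, actual: float) -> float:
    """Accuracy in percent, defined here as 100 * (1 - relative error)."""
    if actual <= 0:
        raise ValueError("actual execution time must be positive")
    return 100.0 * (1.0 - abs(predicted - actual) / actual)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    signature = [
        Phase("halo_exchange", weight=10, measured_time=0.5),
        Phase("global_reduce", weight=4, measured_time=2.0),
    ]
    predicted = predict_execution_time(signature)
    print(f"predicted execution time: {predicted:.2f} s")
```

Under these assumptions, only the representative phases need to run on the target machine, which is what would make such a prediction cheaper than executing the whole application.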
