Automatic construction of coordinated performance skeletons

Performance prediction is particularly challenging in dynamic and unpredictable environments that are difficult to model, such as execution on shared CPU and network bandwidth resources. Our approach to performance estimation in such scenarios is based on actually executing short-running performance skeletons customized for the target applications. This work focuses on the automatic construction of performance skeletons for parallel MPI programs. We present logicalization, the reduction of a family of traces (one per process) to a single trace, as a key technique for skeleton construction. Communication traces are compressed by identifying the loop structure in them. Experimental results demonstrate that logicalization and compression are accurate and efficient, and automatically constructed performance skeletons effectively predicted application performance in a variety of scenarios involving resource sharing and changes in the execution environment.
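The two trace-processing steps named above can be illustrated with a minimal Python sketch. It assumes a hypothetical trace format in which each event is an (operation, peer rank) pair, and it assumes that logicalization rewrites each peer as an offset relative to the calling rank, so that SPMD processes performing the same pattern yield identical logical traces. The names `logicalize`, `merge_family`, `compress`, and `Loop` are illustrative only. Loop identification here is a simple greedy scan for consecutive repeats, not necessarily the algorithm used in this work.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical trace format: each event is (operation, peer), e.g. ("send", 3).
Event = Tuple[str, int]


@dataclass(frozen=True)
class Loop:
    """A compressed run: `body` repeated `count` times (body may nest Loops)."""
    body: tuple
    count: int


def logicalize(rank: int, size: int, trace: Sequence[Event]) -> List[Event]:
    """Express each peer as an offset from `rank` modulo `size`
    (so a peer of rank - 1 becomes size - 1)."""
    return [(op, (peer - rank) % size) for op, peer in trace]


def merge_family(traces: Sequence[Sequence[Event]]) -> List[Event]:
    """Reduce one trace per rank to a single logical trace, if they all agree."""
    size = len(traces)
    logical = [logicalize(r, size, t) for r, t in enumerate(traces)]
    if any(t != logical[0] for t in logical[1:]):
        raise ValueError("per-rank traces differ after logicalization")
    return logical[0]


def compress(seq: Sequence) -> list:
    """Greedily fold consecutive repeats into (possibly nested) Loop nodes."""
    out, i, n = [], 0, len(seq)
    while i < n:
        best_len, best_count = 0, 1
        for length in range(1, (n - i) // 2 + 1):
            body = seq[i:i + length]
            count = 1
            # Count how many back-to-back copies of `body` start at position i.
            while seq[i + count * length:i + (count + 1) * length] == body:
                count += 1
            # Prefer the repeat covering the most events; ties keep the shorter body.
            if count > 1 and length * count > best_len * best_count:
                best_len, best_count = length, count
        if best_count > 1:
            # Compress the loop body recursively to expose nested loops.
            body = compress(seq[i:i + best_len])
            out.append(Loop(tuple(body), best_count))
            i += best_len * best_count
        else:
            out.append(seq[i])
            i += 1
    return out


if __name__ == "__main__":
    size = 4
    # Each rank sends to its right neighbor and receives from its left, 3 times.
    traces = [[("send", (r + 1) % size), ("recv", (r - 1) % size)] * 3
              for r in range(size)]
    logical = merge_family(traces)
    print(compress(logical))
    # [Loop(body=(('send', 1), ('recv', 3)), count=3)]
```

In this sketch the four physical traces differ only in absolute peer ranks, so they collapse to one logical trace, and the six events fold into a single three-iteration loop. A practical tool would also have to cope with per-rank differences (for example, boundary processes) and with iterations whose events differ slightly; this sketch simply rejects traces that do not match exactly.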

[1] Jaspal Subhlok et al. Replicating memory behavior for performance prediction. 2004.

[2] Quentin F. Stout et al. The Use of the MPI Communication Library in the NAS Parallel Benchmarks. 1999.

[3] Jaspal Subhlok et al. Skeleton based performance prediction on shared networks. IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004.

[4] Maxime Crochemore et al. An Optimal Algorithm for Computing the Repetitions in a Word. Inf. Process. Lett., 1981.

[5] Jaspal Subhlok and Qiang Xu. Automatic Construction of Coordinated Performance Skeletons. 2007.

[6] Darren J. Kerbyson et al. Automatic Identification of Application Communication Patterns via Templates. ISCA PDCS, 2005.

[7] Qiang Xu et al. Automatic clustering of grid nodes. The 6th IEEE/ACM International Workshop on Grid Computing, 2005.

[8] Jaspal Subhlok et al. Automatic construction and evaluation of performance skeletons. 19th IEEE International Parallel and Distributed Processing Symposium, 2005.

[9] Qiang Xu et al. Performance prediction with skeletons. Cluster Computing, 2008.