Integrated Runtime Measurement Summarisation and Selective Event Tracing for Scalable Parallel Execution Performance Diagnosis

Straightforward trace collection and processing become increasingly challenging and ultimately impractical for more complex, long-running, highly parallel applications. Accordingly, the SCALASCA project is extending the KOJAK measurement system for MPI, OpenMP and partitioned global address space (PGAS) parallel applications to incorporate runtime management and summarisation capabilities. This provides a more scalable and effective profile of parallel execution performance, both as an initial overview and as a means of directing instrumentation and event tracing to the key functions and callpaths for comprehensive analysis. The design and restructuring of the revised measurement system are described, highlighting the synergies possible from integrated runtime callpath summarisation and event tracing for scalable parallel execution performance diagnosis. Early results from measurements of 16,384 MPI processes on IBM BlueGene/L already demonstrate considerably improved scalability.
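As a rough illustration of how a runtime callpath summary can steer selective tracing, the Python sketch below aggregates visit counts and inclusive time per callpath from enter/exit events, then picks the callpaths worth tracing. The names `CallpathProfile` and `select_callpaths_to_trace`, and the selection heuristic (a minimum share of total time combined with a cap on visits as a crude proxy for trace volume), are hypothetical assumptions for illustration, not the SCALASCA or KOJAK implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallpathStats:
    """Aggregated metrics for one callpath (hypothetical minimal set)."""
    visits: int = 0
    inclusive_time: float = 0.0


class CallpathProfile:
    """Runtime summary keyed by callpath: the tuple of region names from the root."""

    def __init__(self) -> None:
        self.stats: dict[tuple[str, ...], CallpathStats] = defaultdict(CallpathStats)
        self._stack: list[tuple[str, float]] = []  # open regions with enter timestamps

    def enter(self, region: str, timestamp: float) -> None:
        self._stack.append((region, timestamp))

    def exit(self, region: str, timestamp: float) -> None:
        if not self._stack or self._stack[-1][0] != region:
            raise ValueError(f"unbalanced exit from {region!r}")
        # The callpath includes the region being exited, so build it before popping.
        path = tuple(name for name, _ in self._stack)
        _, start = self._stack.pop()
        entry = self.stats[path]
        entry.visits += 1
        entry.inclusive_time += timestamp - start


def select_callpaths_to_trace(profile: CallpathProfile,
                              min_time_fraction: float,
                              max_visits: int) -> list[tuple[str, ...]]:
    """Hypothetical filter: keep callpaths that account for a significant share of
    total time but are visited rarely enough that tracing them stays affordable."""
    # Total time is the inclusive time of the root callpaths (length 1).
    total = sum(s.inclusive_time for p, s in profile.stats.items() if len(p) == 1)
    if total <= 0:
        return []
    return sorted(
        path for path, s in profile.stats.items()
        if s.inclusive_time / total >= min_time_fraction and s.visits <= max_visits
    )


# Usage: a main region with many short helper calls and one long collective.
prof = CallpathProfile()
prof.enter("main", 0.0)
for i in range(1000):
    t = 1.0 + i * 0.004
    prof.enter("helper", t)
    prof.exit("helper", t + 0.002)
prof.enter("MPI_Allreduce", 5.5)
prof.exit("MPI_Allreduce", 9.5)
prof.exit("main", 10.0)

# prints [('main',), ('main', 'MPI_Allreduce')]
print(select_callpaths_to_trace(prof, min_time_fraction=0.05, max_visits=100))
```

Here `helper` accounts for about 20% of the time but is excluded because its 1000 visits exceed the cap, whereas the single long `MPI_Allreduce` is kept. Keying the summary on the full callpath rather than on the region name alone keeps the same function reached from different call sites distinct, which is what allows a summary to point tracing at particular callpaths rather than at whole functions.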
