Profiling of OpenMP Tasks with Score-P

With the task construct, the OpenMP 3.0 specification introduces an additional level of parallelism that challenges established schemes of performance profiling. First, a thread may execute a sequence of interleaved task fragments, which the profiling system must distinguish properly to enable correct performance analysis. Furthermore, the additional parallelization dimension requires new visualization methods for presenting analysis results. Finally, as a new programming paradigm, tasking introduces paradigm-specific performance issues and creates a need for corresponding optimization strategies. This paper presents solutions to overcome the challenges of profiling applications based on OpenMP tasks. In addition, it describes metrics that may help uncover performance problems related to tasking. We present an implementation of our solution within the Score-P performance measurement system, which we evaluate using the Barcelona OpenMP Task Suite.
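To make the first challenge concrete, the following minimal sketch shows how execution time might be attributed to task instances when one thread interleaves fragments of several tasks. It is an illustration only and not the Score-P implementation: the event kinds (`begin`, `suspend`, `resume`, `end`), the function `per_task_time`, and the sample timestamps are all assumptions made for this example.

```python
from collections import defaultdict


def per_task_time(events):
    """Sum the durations of all fragments that belong to each task instance.

    events: iterable of (timestamp, kind, task_id) for ONE thread, ordered
    by time. The event kinds are hypothetical: "begin" and "resume" mean the
    task starts running, "suspend" and "end" mean it stops running.
    """
    totals = defaultdict(float)
    running = None   # task currently executing on this thread
    started = None   # timestamp at which the current fragment started

    for timestamp, kind, task_id in events:
        if kind in ("begin", "resume"):
            if running is not None:
                raise ValueError(f"task {running!r} still running at {timestamp}")
            running, started = task_id, timestamp
        elif kind in ("suspend", "end"):
            if running != task_id:
                raise ValueError(f"task {task_id!r} is not running at {timestamp}")
            # Charge the finished fragment to the task it belongs to.
            totals[task_id] += timestamp - started
            running = started = None
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    return dict(totals)


# Hypothetical event stream: task A is suspended (for example at a taskwait),
# the thread runs task B to completion, then resumes A.
events = [
    (0.0, "begin", "A"),
    (2.0, "suspend", "A"),
    (2.0, "begin", "B"),
    (5.0, "end", "B"),
    (5.0, "resume", "A"),
    (6.0, "end", "A"),
]

profile = per_task_time(events)
print(profile)
```

A profiler that tracked only a per-thread call stack would charge the time of task B to task A in this stream; keeping a separate accumulator per task instance is one way to avoid that misattribution.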
