A Plugin Architecture for the TAU Performance System

Several robust performance systems have been created for parallel machines, able to observe diverse aspects of application execution on different hardware platforms. All are designed to support measurement methods that are efficient, portable, and scalable, and for these reasons the performance measurement infrastructure is tightly integrated with the application code and the runtime execution environment. As parallel software and systems evolve, especially toward more heterogeneous, asynchronous, and dynamic operation, the requirements for performance observation and awareness are expected to change. For instance, heterogeneous machines introduce new types of performance data to capture and new performance behaviors to characterize. Furthermore, there is growing interest in interacting with the performance infrastructure for in situ analytics and policy-based control. The problem is that an existing performance system architecture may be constrained in its ability to evolve to meet these new requirements. This paper reports our research efforts to address this concern in the context of the TAU Performance System. In particular, we consider the use of a powerful plugin model both to capture existing capabilities in TAU and to extend its functionality in ways not necessarily conceived originally. The TAU plugin architecture supports three plugin paradigms: EVENT, TRIGGER, and AGENT. We demonstrate how each operates under several different scenarios. Results from larger-scale experiments show that efficiency and robustness can be maintained while offering new flexibility and programmability that leverages the power of the core TAU system and enables significant and compelling extensions.
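To illustrate the general shape of the EVENT paradigm described above, the sketch below shows a callback-table design in which a plugin registers handlers that the measurement core invokes at instrumented events. All names here (`plugin_callbacks_t`, `register_plugin`, `dispatch_function_entry`) are illustrative assumptions for exposition, not TAU's actual plugin API.

```c
/* Hypothetical sketch of an EVENT-style plugin interface.
 * Names and signatures are assumptions, not TAU's real API. */
#include <stdio.h>

/* Callback table a plugin fills in: one slot per observable event. */
typedef struct {
    void (*on_function_entry)(const char *name);
    void (*on_function_exit)(const char *name);
} plugin_callbacks_t;

static plugin_callbacks_t registered = {0};

/* Measurement core: a plugin declares interest in events here. */
static void register_plugin(const plugin_callbacks_t *cb) {
    registered = *cb;
}

/* Measurement core: invoked at each instrumented function entry;
 * forwards the event to the registered plugin, if any. */
static void dispatch_function_entry(const char *name) {
    if (registered.on_function_entry)
        registered.on_function_entry(name);
}

/* Example plugin: counts function-entry events. */
static int entry_count = 0;
static void my_entry_cb(const char *name) {
    (void)name;
    entry_count++;
}

int main(void) {
    plugin_callbacks_t cb = {0};
    cb.on_function_entry = my_entry_cb;
    register_plugin(&cb);

    /* Simulate two instrumented function entries. */
    dispatch_function_entry("compute_kernel");
    dispatch_function_entry("exchange_halo");
    printf("%d\n", entry_count); /* prints 2 */
    return 0;
}
```

The key design point is indirection: the core fires events unconditionally, and registered plugins decide what to observe, which keeps the measurement path cheap when no plugin is loaded.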
