Caliper: Performance Introspection for HPC Software Stacks

Many performance engineering tasks, from long-term performance monitoring to post-mortem analysis and online tuning, require efficient runtime methods for introspection and performance data collection. To understand interactions between components in increasingly modular HPC software, performance introspection hooks must be integrated into runtime systems, libraries, and application codes across the software stack. This requires an interoperable, cross-stack, general-purpose approach to performance data collection, which neither application-specific performance measurement nor traditional profile or trace analysis tools provide. With Caliper, we have developed a general abstraction layer to provide performance data collection as a service to applications, runtime systems, libraries, and tools. Individual software components connect to Caliper in independent data producer, data consumer, and measurement control roles, which allows them to share performance data across software stack boundaries. We demonstrate Caliper's performance analysis capabilities with two case studies of production scenarios.
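The separation into data producer, data consumer, and measurement control roles can be illustrated with a small toy model. The sketch below is not Caliper's API: the class `ToyBlackboard`, its methods, and the attribute names `app.phase` and `lib.kernel` are hypothetical and chosen only for illustration. It assumes a shared in-process store in which producers publish nested attribute values, a controller decides when to take a snapshot, and consumers receive each snapshot without knowing who produced the data.

```python
import time
from typing import Callable, Dict, List

Snapshot = Dict[str, object]


class ToyBlackboard:
    """Hypothetical shared performance-data store (illustration only, not Caliper's API)."""

    def __init__(self) -> None:
        self._stacks: Dict[str, List[str]] = {}
        self._consumers: List[Callable[[Snapshot], None]] = []

    # Data producer role: components publish nested attribute values.
    def begin(self, attribute: str, value: str) -> None:
        self._stacks.setdefault(attribute, []).append(value)

    def end(self, attribute: str) -> None:
        self._stacks[attribute].pop()

    # Data consumer role: tools register a callback that receives snapshots.
    def subscribe(self, callback: Callable[[Snapshot], None]) -> None:
        self._consumers.append(callback)

    # Measurement control role: something decides when a snapshot is taken.
    def snapshot(self) -> Snapshot:
        record: Snapshot = {
            name: "/".join(stack)
            for name, stack in self._stacks.items()
            if stack  # skip attributes that are not currently set
        }
        record["timestamp"] = time.perf_counter()
        for consumer in self._consumers:
            consumer(record)
        return record


def run_demo() -> List[Snapshot]:
    board = ToyBlackboard()
    collected: List[Snapshot] = []
    board.subscribe(collected.append)      # a tool acting as consumer

    board.begin("app.phase", "main")       # application acting as producer
    board.begin("lib.kernel", "solve")     # library acting as producer
    board.snapshot()                       # controller triggers a measurement
    board.end("lib.kernel")
    board.snapshot()
    board.end("app.phase")
    return collected


if __name__ == "__main__":
    for rec in run_demo():
        print(rec)
```

In this toy model the application and the library never refer to each other or to the consumer, yet the first snapshot carries attributes from both layers. That is the cross-stack sharing the abstract describes, reduced to its simplest form.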
