Using adaptive runtime filtering to support an event‐based performance analysis

Event‐based performance monitoring and analysis are effective means of tuning parallel applications for optimal resource usage. In this article, we address the data capacity challenge that arises when applying the tracing methodology to large‐scale parallel applications and long execution times. Existing approaches use static, pre‐defined event filters to reduce the performance data to a manageable size. In contrast, we propose self‐guided filters that automatically adapt to an application's runtime behaviour and therefore do not require any prior knowledge or previous application runs. Our contribution consists of four adaptive runtime filters, each of which targets a specific type of data redundancy. The filters focus on detecting identical events in loop iterations, constant events with no variation over time, and very short, highly frequent events that are typically not very meaningful but have a severe impact on the total data volume. We evaluate our prototype implementation with five real‐world applications and achieve a data reduction of two orders of magnitude while increasing execution time by less than 1%. Likewise, we show that the qualitative impact of our filters on performance analysis in state‐of‐the‐art analysis tools can be reduced by adding feedback methods and statistical information to the filtered traces. Copyright © 2017 John Wiley & Sons, Ltd.
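
To make the idea of filtering very short, highly frequent events concrete, the following is a minimal sketch and not the implementation evaluated in the article. It assumes a simple rule: once a code region has been observed at least `min_calls` times with a mean duration no greater than `max_mean_duration`, its later events are dropped from the trace and only aggregate statistics are kept. The class names, both thresholds and the `record` interface are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RegionStats:
    """Aggregate statistics kept for a region even after it is filtered."""
    calls: int = 0
    total_time: float = 0.0
    filtered: bool = False


@dataclass
class ShortCallFilter:
    """Illustrative adaptive filter; thresholds are assumed values."""
    min_calls: int = 100               # observations needed before deciding
    max_mean_duration: float = 1e-6    # seconds; regions at or below this are dropped
    stats: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def record(self, region: str, enter: float, leave: float) -> None:
        s = self.stats.setdefault(region, RegionStats())
        # Statistics are always updated, so filtered regions remain visible
        # in summary form.
        s.calls += 1
        s.total_time += leave - enter
        if s.filtered:
            return  # event is dropped from the trace
        self.trace.append((region, enter, leave))
        # The decision is made at runtime from observed behaviour only.
        if (s.calls >= self.min_calls
                and s.total_time / s.calls <= self.max_mean_duration):
            s.filtered = True


if __name__ == "__main__":
    f = ShortCallFilter()
    for i in range(1000):
        t = i * 1e-5
        f.record("tiny_helper", t, t + 5e-7)   # short, frequent region
    for i in range(3):
        f.record("solver_step", float(i), float(i) + 0.5)  # long region
    print(len(f.trace), f.stats["tiny_helper"].calls)
```

In this sketch the decision needs no prior run of the application, and the retained per-region call counts and total times stand in for the statistical information that the abstract describes adding to filtered traces.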
