Selective runtime monitoring: Non-intrusive elimination of high-frequency functions

High performance computing (HPC) systems are becoming more powerful but also more complex. Supporting environments such as performance analysis tools are essential to help developers utilize the computing resources of such complex systems. One of the most urgent challenges in event-based performance analysis is the enormous amount of collected data. In particular, recording high-frequency, short-running functions such as getter/setter class methods produces enormous amounts of data while at the same time contributing very little to an analysis of the overall application behavior. In this paper, we address the impact of high-frequency function calls and present a method to minimize the number of stored calls to heavily used functions while still keeping outliers that have an impact on the application's behavior. We propose a hierarchical memory buffer that is capable of discarding recorded function calls when their duration is smaller than a predefined lower bound. We demonstrate the capabilities of our method with a prototype implementation based on the Open Trace Format 2, a state-of-the-art open-source event trace library used by the performance analysis tools VAMPIR, SCALASCA, and TAU.
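The duration-based discarding described above can be illustrated with a minimal sketch. This is not the Open Trace Format 2 API or the prototype's actual buffer layout; the class `FilteringTraceBuffer`, its methods, and the tick-based timestamps are hypothetical names chosen for illustration. The sketch assumes strictly nested enter/leave events on a single thread and records nothing but function calls.

```python
from dataclasses import dataclass, field


@dataclass
class FilteringTraceBuffer:
    """Toy event buffer that drops calls shorter than `min_duration` (hypothetical)."""

    min_duration: int                            # lower bound in timestamp ticks (assumed unit)
    events: list = field(default_factory=list)   # kept (kind, name, timestamp) records
    _open: list = field(default_factory=list)    # stack of (buffer index, enter timestamp)

    def enter(self, name, timestamp):
        # Remember where this call's enter record sits so it can be rolled back.
        self._open.append((len(self.events), timestamp))
        self.events.append(("enter", name, timestamp))

    def leave(self, name, timestamp):
        start_index, start_time = self._open.pop()
        if timestamp - start_time < self.min_duration:
            # Short call: roll the buffer back to just before its enter record.
            # Nested calls cannot last longer than their parent, so they were
            # already rolled back when they returned.
            del self.events[start_index:]
        else:
            # Long enough (possibly an outlier of a usually short function): keep it.
            self.events.append(("leave", name, timestamp))


# Usage: a short getter call is dropped, a slow outlier of the same function is kept.
buf = FilteringTraceBuffer(min_duration=10)
buf.enter("main", 0)
buf.enter("get_x", 1)
buf.leave("get_x", 2)     # duration 1  -> discarded
buf.enter("get_x", 5)
buf.leave("get_x", 40)    # duration 35 -> kept
buf.leave("main", 50)
```

Because the decision is made when the call returns, the enter record has to stay in memory until then, which is why the filtering is placed in the memory buffer rather than applied at instrumentation time.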
