Addressing Challenges in Visualizing Huge Call-Path Traces

Analyzing and optimizing long-running applications on large-scale parallel systems is important for avoiding unacceptable inefficiencies. Tracing is one of the most popular techniques for understanding the performance of parallel programs. Because tracing captures data along the time dimension, the size of a trace grows linearly with execution time. As a result, traces of long-running parallel executions may contain gigabytes or even terabytes of data. Presenting huge traces in a scalable fashion and identifying performance bottlenecks hidden in an ocean of data are challenging problems. To pinpoint performance bottlenecks effectively, a performance visualization tool must be responsive and scalable. It must also present both a global view of the performance of all threads and processes in a parallel execution and a local view that shows the full detail of the trace for an individual thread or process. We address these challenges with a client-server architecture for trace visualization in hpctraceviewer, which is part of the HPCToolkit performance tools. This paper demonstrates the utility of our tool for identifying performance bottlenecks in large-scale executions through case studies with two Department of Energy procurement benchmarks: the Algebraic Multigrid (AMG) and Unstructured Mesh Transport (UMT) codes. Finally, our experiments show that the implementation is scalable, rendering views of huge traces stored on remote supercomputers in a few seconds.
