Combing the Communication Hairball: Visualizing Parallel Execution Traces using Logical Time

With the continuous rise in complexity of modern supercomputers, optimizing the performance of large-scale parallel programs is becoming increasingly challenging. Simultaneously, the growth in scale magnifies the impact of even minor inefficiencies: potentially millions of compute hours and megawatts of power can be wasted on avoidable mistakes or suboptimal algorithms. This makes performance analysis and optimization critical elements of the software development process. One of the most common forms of performance analysis is to study execution traces, which record a history of per-process events and interprocess messages in a parallel application. Trace visualizations allow users to browse this event history and search for insights into the observed performance behavior. However, current visualizations are difficult to understand even for small process counts and do not scale gracefully beyond a few hundred processes. Organizing events by physical time leads to a virtually unintelligible conglomerate of interleaved events, and even moderately high process counts overtax the largest displays. As an alternative, we present a new trace visualization approach based on transforming the event history into logical time inferred directly from happened-before relationships. This emphasizes the code's structural behavior, which is much more familiar to the application developer. The original timing data, or other information, is then encoded through color, leading to a more intuitive visualization. Furthermore, we use the discrete nature of logical timelines to cluster processes according to their local behavior, leading to a scalable visualization of even long traces at large process counts. We demonstrate our system using two case studies on large-scale parallel codes.
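The two core ideas, logical time from happened-before relationships and clustering on discrete timelines, can be illustrated with a small sketch. It assumes a toy trace format: a hypothetical `Event` record whose `name` is `"send"`, `"recv"`, or any other local operation, plus a `msg_id` that pairs each send with exactly one receive. The sketch assigns Lamport-style logical steps, so each event lies one step after its local predecessor, and a receive lies at least one step after its matching send. It then groups processes whose sequences of (step, event name) pairs are identical. Both parts are simplifications for illustration, not the authors' algorithm; the exact-match grouping in particular stands in for whatever similarity measure a real clustering would use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    name: str                     # hypothetical: "send", "recv", or a local op
    msg_id: Optional[int] = None  # unique id pairing one send with one recv


def logical_steps(traces: Dict[int, List[Event]]) -> Dict[Tuple[int, int], int]:
    """Map (process, event index) to a Lamport-style logical step."""
    step: Dict[Tuple[int, int], int] = {}
    send_step: Dict[int, int] = {}          # msg_id -> step of its send
    pos = {p: 0 for p in traces}            # next unprocessed event per process
    remaining = sum(len(evs) for evs in traces.values())
    while remaining:
        progressed = False
        for p, evs in traces.items():
            while pos[p] < len(evs):
                i = pos[p]
                ev = evs[i]
                # Happened-before within a process: one step after predecessor.
                local = step[(p, i - 1)] + 1 if i > 0 else 0
                if ev.name == "recv":
                    if ev.msg_id not in send_step:
                        break  # matching send not assigned yet; try later
                    # Happened-before across processes: after the matching send.
                    local = max(local, send_step[ev.msg_id] + 1)
                step[(p, i)] = local
                if ev.name == "send":
                    send_step[ev.msg_id] = local
                pos[p] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("unmatched receive or cyclic message dependency")
    return step


def cluster_by_timeline(traces: Dict[int, List[Event]],
                        step: Dict[Tuple[int, int], int]) -> List[List[int]]:
    """Group processes whose (step, event name) sequences match exactly."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for p, evs in traces.items():
        signature = tuple((step[(p, i)], ev.name) for i, ev in enumerate(evs))
        groups[signature].append(p)
    return sorted(groups.values())


# Usage: ranks 0 and 2 compute and send; ranks 1 and 3 receive and compute.
traces = {
    0: [Event("compute"), Event("send", 1)],
    1: [Event("recv", 1), Event("compute")],
    2: [Event("compute"), Event("send", 2)],
    3: [Event("recv", 2), Event("compute")],
}
steps = logical_steps(traces)
clusters = cluster_by_timeline(traces, steps)
print(clusters)  # [[0, 2], [1, 3]]
```

In this example ranks 0 and 2 share one logical timeline and ranks 1 and 3 share another, so a clustered view could draw two representative rows instead of four. Real traces would also need collectives, nonblocking operations, and an approximate (rather than exact) similarity criterion.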
