Evaluating Trace Aggregation Through Entropy Measures for Optimal Performance Visualization of Large Distributed Systems

Large-scale distributed high-performance applications involve an ever-increasing number of threads to exploit the extreme concurrency of today's systems. Performance analysis through visualization techniques usually suffers severe semantic limitations, due both to the size of parallel applications and to the challenge of visualizing large-scale traces. Most performance visualization tools therefore rely on data aggregation in order to scale. Although this technique is frequently used, to the best of our knowledge there has been no real attempt to evaluate the quality of aggregated data for visualization. This paper presents an approach that fills this gap. We propose to build optimized macroscopic visualizations using measures inherited from information theory, in particular the Kullback-Leibler divergence. These measures are used to estimate the complexity reduction and the information loss incurred by any given data aggregation. We first illustrate the applicability of our approach by exploiting these two measures in the analysis of work-stealing traces using squarified treemaps. We then report the effective scalability of our approach by visualizing known anomalies in a synthetic trace file with the behavior of one million processes, with encouraging results.
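
To make the two measures concrete, the following sketch computes them for a single aggregate, i.e. a group of trace entities (for instance, processes) whose non-negative values (for instance, time spent in a given state) are replaced by their sum v(A). It assumes one common formulation, not necessarily the exact definitions used in this paper: the complexity reduction of an aggregate is its value-weighted Shannon entropy, and the information loss is the Kullback-Leibler divergence, in bits and scaled by v(A), between the entities' actual distribution and the uniform redistribution of v(A) implied by the aggregated view. The function names and the `tradeoff` score with its weight `p` are hypothetical, introduced only for illustration.

```python
import math
from typing import Sequence


def _xlog2x(x: float) -> float:
    # Convention used in entropy computations: 0 * log2(0) = 0.
    return x * math.log2(x) if x > 0 else 0.0


def complexity_reduction(values: Sequence[float]) -> float:
    """Assumed gain of replacing `values` by their sum:
    v(A) * log2 v(A) - sum_x v(x) * log2 v(x), i.e. v(A) times the
    Shannon entropy of the normalized values."""
    total = sum(values)
    return _xlog2x(total) - sum(_xlog2x(v) for v in values)


def information_loss(values: Sequence[float]) -> float:
    """Assumed loss: v(A) times the KL divergence (in bits) between the
    normalized values and the uniform distribution over the aggregate."""
    total = sum(values)
    if total == 0:
        return 0.0
    mean = total / len(values)  # value each entity gets in the aggregated view
    return sum(v * math.log2(v / mean) for v in values if v > 0)


def tradeoff(values: Sequence[float], p: float) -> float:
    """Hypothetical score weighting gain against loss with p in [0, 1];
    higher values favour showing `values` as one aggregated block."""
    return p * complexity_reduction(values) - (1 - p) * information_loss(values)


if __name__ == "__main__":
    for group in ([2, 2], [4, 0], [3, 1]):
        print(group, complexity_reduction(group), information_loss(group),
              tradeoff(group, 0.5))
```

Under these assumptions the two quantities always sum to v(A)·log2|A|: aggregating two equal values ([2, 2]) gives a gain of 4 and no loss, whereas aggregating [4, 0] gives no gain and a loss of 4. Scoring every candidate aggregate of a hierarchy this way is one possible route to choosing the macroscopic view to display.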
