Scalable Compression and Replay of Communication Traces in Massively Parallel Environments

Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and long execution times. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides communication traces that are orders of magnitude smaller, if not near constant-size, regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and beyond. To the best of our knowledge, such a concise representation of MPI traces in a scalable manner combined with deterministic MPI call replay is without precedent.
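The two compression levels named in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes plain run-length encoding within a rank and exact-match merging across ranks, and the event tuples and the function names `compress_intra`, `merge_inter`, and `replay` are hypothetical, chosen only for illustration.

```python
from itertools import groupby

# Toy model (assumption): an event is a hashable tuple such as ("MPI_Send", 1024).

def compress_intra(events):
    """Intra-node step: collapse consecutive identical events into (event, count) pairs."""
    return [(event, sum(1 for _ in run)) for event, run in groupby(events)]

def merge_inter(traces_by_rank):
    """Inter-node step: store each distinct compressed trace once, with the ranks that share it."""
    merged = {}
    for rank, events in sorted(traces_by_rank.items()):
        key = tuple(compress_intra(events))
        merged.setdefault(key, []).append(rank)
    return [(ranks, list(trace)) for trace, ranks in merged.items()]

def replay(compressed):
    """Expand a compressed trace back into the original event sequence (lossless)."""
    return [event for event, count in compressed for _ in range(count)]

# Hypothetical traces: ranks 0 and 1 behave identically, rank 2 differs.
sender = [("MPI_Send", 1024)] * 3 + [("MPI_Barrier", 0)]
receiver = [("MPI_Recv", 1024)] * 3 + [("MPI_Barrier", 0)]
traces = {0: sender, 1: list(sender), 2: receiver}

for ranks, trace in merge_inter(traces):
    print(ranks, trace)
```

In this sketch the merged trace grows with the number of distinct communication patterns rather than with the number of ranks, which is the intuition behind near constant-size traces; matching patterns that differ only in rank-relative parameters would need a richer encoding than shown here.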
