FACT: fast communication trace collection for parallel applications through program slicing

A proper understanding of the communication patterns of parallel applications is important for optimizing application performance and designing better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces must execute the entire parallel application on a full-scale system, which is time-consuming and expensive. In this paper, we propose a novel technique, called Fact, which performs FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to a program slice through static analysis, and to execute the program slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program that are relevant to spatial and volume communication attributes. The approach is based on the observation that most computation and message contents in message-passing parallel applications are independent of these attributes and can therefore be removed from the program for the purpose of communication trace collection. We have implemented Fact and evaluated it with the NPB programs and Sweep3D. The results show that Fact preserves the spatial and volume communication attributes of the original programs and reduces resource consumption by two orders of magnitude in most cases. For example, Fact collects the communication traces of Sweep3D for 512 processes on a 4-node (32-core) platform in just 6.79 seconds, consuming 1.25 GB of memory, while the original program takes 256.63 seconds and consumes 213.83 GB of memory on a 32-node (512-core) platform. Finally, we present an application of Fact.
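To illustrate the slicing idea described above, the sketch below shows a toy MPI ring exchange in C, annotated with the kinds of statements a communication-preserving slice would keep or drop. This is only a hypothetical example under our own assumptions, not the authors' slicing algorithm or output: statements that determine the spatial attribute (which ranks communicate) and the volume attribute (message sizes) are retained, while the computation that fills the message payload is removed because its results never influence those attributes.

/* Illustrative sketch only: a toy MPI ring exchange, annotated to show which
 * statements a communication-preserving slice would keep or remove.
 * This is not the Fact tool's actual output; it only mirrors the idea that
 * spatial (destination/source) and volume (count) attributes are preserved
 * while message contents and unrelated computation are discarded. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* KEPT by the slice: these values determine the volume attribute
     * (message size) and the spatial attribute (who talks to whom). */
    int count = 1024 * (rank % 4 + 1);      /* per-rank message size */
    int dest  = (rank + 1) % size;          /* right neighbor        */
    int src   = (rank - 1 + size) % size;   /* left neighbor         */

    double *sendbuf = malloc(count * sizeof(double));
    double *recvbuf = malloc(count * sizeof(double));

    /* REMOVED by the slice: in the original program an expensive kernel
     * would fill sendbuf here, e.g.
     *     for (int i = 0; i < count; i++) sendbuf[i] = heavy_kernel(i, rank);
     * Since the message contents do not affect sizes or destinations,
     * the slice sends an unfilled buffer instead. */

    /* KEPT by the slice: the communication statement itself, which is what
     * a trace collector instruments. */
    MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, dest, 0,
                 recvbuf, count, MPI_DOUBLE, src,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Executing such a slice on a small system reproduces the trace of sends and receives (sources, destinations, and byte counts) without paying for the original computation, which is the source of the resource savings reported in the abstract.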
