Dynamic statistical profiling of communication activity in distributed applications

Analyzing the communication activity of a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this approach enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be discarded immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications, and experiments on several applications demonstrate the viability of the approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework: the coverage of the sample space provided by purely random sampling is superior to that of counter- and timer-based sampling. Finally, PHOTON's design shows that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
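To make the comparison of sampling techniques concrete, the following is a minimal sketch of counter-based, timer-based, and purely random message selection. It is not PHOTON's implementation. The function names, the four rotating message "kinds", the one-message-per-time-unit arrival pattern, and the 1-in-8 sampling rate are all assumptions chosen for illustration. The workload is deliberately contrived so that its period lines up with the counter and timer intervals, which is one way deterministic sampling can lose coverage.

```python
import random


def counter_sample(n_messages, period):
    """Counter-based: select every `period`-th message."""
    return [i for i in range(n_messages) if (i + 1) % period == 0]


def timer_sample(timestamps, interval):
    """Timer-based: select the first message seen at or after each timer expiry."""
    picked = []
    next_fire = interval
    for i, t in enumerate(timestamps):
        if t >= next_fire:
            picked.append(i)
            next_fire = t + interval
    return picked


def random_sample(n_messages, rate, seed=0):
    """Purely random: select each message independently with probability `rate`."""
    rng = random.Random(seed)
    return [i for i in range(n_messages) if rng.random() < rate]


def coverage(indices, kinds):
    """Return the set of distinct message kinds that the sample observed."""
    return {kinds[i] for i in indices}


# Hypothetical workload: four message kinds in strict rotation,
# one message per time unit.
n = 10_000
kinds = [i % 4 for i in range(n)]
timestamps = [float(i) for i in range(n)]

counter_cov = coverage(counter_sample(n, 8), kinds)
timer_cov = coverage(timer_sample(timestamps, 8.0), kinds)
random_cov = coverage(random_sample(n, 1 / 8), kinds)

print("counter-based kinds seen:", sorted(counter_cov))
print("timer-based kinds seen:  ", sorted(timer_cov))
print("random kinds seen:       ", sorted(random_cov))
```

In this contrived case the counter and timer intervals are multiples of the workload's period, so both deterministic samplers keep landing on the same message kind, while random selection at the same average rate is not locked to the pattern. Real applications are rarely this regular, so the sketch only illustrates the mechanism and does not reproduce the paper's measurements.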

[1]  William Gropp,et al.  From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[2]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, TOCS.

[3]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[4]  Pat Hanrahan,et al.  Rivet: a flexible environment for computer systems visualization , 2000, SIGGRAPH 2000.

[5]  Jeffrey S. Vetter,et al.  A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications , 2001, WOMPAT.

[6]  Susan L. Graham,et al.  Gprof: A call graph execution profiler , 1982, SIGPLAN '82.

[7]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[8]  John Domingue,et al.  Software visualization : programming as a multimedia experience , 1998 .

[9]  Thomas E. Anderson,et al.  Quartz: a tool for tuning parallel program performance , 1990, SIGMETRICS '90.

[10]  Karsten Schwan,et al.  Falcon: On-line Monitoring and Steering of Parallel Programs , 1995 .

[11]  G. A. Geist,et al.  A user's guide to PICL a portable instrumented communication library , 1990 .

[12]  Karsten Schwan,et al.  Falcon: On‐line monitoring for steering parallel programs , 1998 .

[13]  Jeffrey S. Vetter,et al.  Communication characteristics of large-scale scientific applications for contemporary cluster architectures , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[14]  José E. Moreira,et al.  Demonstrating the scalability of a molecular dynamics application on a Petaflop computer , 2001, ICS '01.

[15]  Fabrizio Petrini,et al.  A general predictive performance model for wavefront algorithms on clusters of SMPs , 2000, Proceedings 2000 International Conference on Parallel Processing.

[16]  Jeffrey S. Vetter Performance analysis of distributed applications using automatic classification of communication inefficiencies , 2000, ICS '00.

[17]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst..

[18]  Michael T. Heath,et al.  Parallel performance visualization: from practice to theory , 1995, IEEE Parallel Distributed Technol. Syst. Appl..

[19]  Allen D. Malony,et al.  Portable profiling and tracing for parallel, scientific applications using C++ , 1998, SPDT '98.

[20]  kc claffy,et al.  Application of sampling methodologies to network traffic characterization , 1993, SIGCOMM 1993.

[21]  B. C. Curtis,et al.  Very High Resolution Simulation of Compressible Turbulence on the IBM-SP System , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[22]  D.A. Reed,et al.  Scalable performance analysis: the Pablo performance analysis environment , 1993, Proceedings of Scalable Parallel Libraries Conference.

[23]  Jesús Labarta,et al.  DiP: A Parallel Program Development Environment , 1996, Euro-Par, Vol. II.

[24]  Robert D. Falgout,et al.  Semicoarsening Multigrid on Distributed Memory Machines , 1999, SIAM J. Sci. Comput..

[25]  Michael Voss,et al.  An Integrated Performance Visualizer for MPI/OpenMP Programs , 2001, WOMPAT.

[26]  Allen D. Malony,et al.  Visualizing parallel computer system performance , 1989 .