Continuous performance monitoring for large-scale parallel applications

Traditional performance analysis techniques are applied after a parallel program has completed. In this paper, we describe an online method for continuously monitoring the performance of a parallel program as it executes, specifically the fraction of time spent in various activities. We describe our implementation of both a visualization client and the parallel performance framework that gathers the utilization data. The data gathering uses a scalable, asynchronous reduction with a lossless compressed data format. The overhead of the initial system is low, even when it runs on thousands of processors. The data gathering takes place over an out-of-band communication mechanism and, by leveraging a message-driven runtime system, interleaves itself transparently with the execution of the parallel application.
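As a rough illustration of the kind of reduction summarized above, the following Python sketch combines per-processor utilization profiles pairwise, as a binary combining tree would, and then converts the totals into fractions of available processor time. It is not the implementation described in the paper: the names `Profile`, `merge`, `tree_reduce`, and `fractions` are hypothetical, the reduction runs sequentially in one process rather than asynchronously inside a message-driven runtime, and the compressed data format is omitted. It assumes a non-empty list of profiles that all use the same number of time bins.

```python
from typing import Dict, List

# Hypothetical representation: activity name -> seconds spent in each time bin.
Profile = Dict[str, List[float]]


def merge(a: Profile, b: Profile) -> Profile:
    """Reduction operator: sum matching bins of two profiles."""
    out: Profile = {activity: list(bins) for activity, bins in a.items()}
    for activity, bins in b.items():
        if activity in out:
            out[activity] = [x + y for x, y in zip(out[activity], bins)]
        else:
            out[activity] = list(bins)
    return out


def tree_reduce(profiles: List[Profile]) -> Profile:
    """Combine profiles pairwise, level by level, like a binary reduction tree."""
    level = list(profiles)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired profile moves up to the next level unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def fractions(total: Profile, num_procs: int, bin_seconds: float) -> Profile:
    """Convert summed seconds into the fraction of available processor time per bin."""
    capacity = num_procs * bin_seconds
    return {activity: [t / capacity for t in bins] for activity, bins in total.items()}


# Four simulated processors, two one-second bins each (made-up sample data).
samples: List[Profile] = [
    {"compute": [0.5, 0.75], "idle": [0.5, 0.25]},
    {"compute": [0.5, 0.25], "idle": [0.5, 0.75]},
    {"compute": [1.0, 0.5], "idle": [0.0, 0.5]},
    {"compute": [1.0, 0.5], "idle": [0.0, 0.5]},
]
print(fractions(tree_reduce(samples), num_procs=len(samples), bin_seconds=1.0))
```

Because the merge operator is associative and commutative, partial results can be combined in any order, which is what allows a reduction of this kind to proceed asynchronously.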
