Performance analysis of distributed applications using automatic classification of communication inefficiencies

We present a performance analysis technique that helps users understand the communication behavior of their message-passing applications. Our method automatically classifies individual communication operations and reveals the causes of communication inefficiencies in the application. This classification lets the developer focus quickly on the culprits of truly inefficient behavior rather than manually foraging through massive amounts of performance data. Specifically, we trace the message operations of MPI applications and then classify each communication event using decision tree classification, a supervised learning technique. We train the decision tree on microbenchmarks that demonstrate both efficient and inefficient communication. Because the microbenchmarks run on the target system, the classifier adapts to that system's configuration, so we can simultaneously automate the performance analysis process and improve classification accuracy. Our experiments on four applications demonstrate that our technique can improve the accuracy of performance analysis and dramatically reduce the amount of data that users must examine.
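The train-then-classify workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation. The two features (posting skew between receive and send, and duration relative to an expected baseline), the class names `normal`, `late_sender`, and `late_receiver`, and all numeric values are assumptions made up for illustration. The tree is a small hand-written learner that splits on Gini impurity for brevity, which is not necessarily the criterion used in the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=4):
    """Grow a tree; leaves are class labels, inner nodes are (f, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature has two distinct values left
        return majority
    _, f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def classify(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical microbenchmark results measured on the target system.
# Features: (recv_post_time - send_post_time in microseconds,
#            observed duration / expected duration for the message size)
train_rows = [(2, 1.0), (-3, 1.1), (1, 0.9), (-1, 1.05),
              (-400, 5.0), (-250, 3.5), (-600, 7.2),
              (300, 4.0), (500, 6.1), (220, 3.0)]
train_labels = ["normal"] * 4 + ["late_sender"] * 3 + ["late_receiver"] * 3

tree = build_tree(train_rows, train_labels)

# Hypothetical events extracted from an application trace.
events = [(0.5, 1.02), (-350, 4.4), (410, 5.0), (1.5, 0.97)]
classes = [classify(tree, e) for e in events]

# Report only the inefficient events, so the user sees less data.
inefficient = [(e, c) for e, c in zip(events, classes) if c != "normal"]
summary = Counter(classes)
```

Because the training rows come from microbenchmarks run on the same machine as the application, the learned thresholds reflect that system's latency and bandwidth rather than fixed, hand-picked cutoffs. Filtering out the `normal` events is one simple way to shrink what the user has to inspect.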
